mirror of
https://github.com/hermitcore/libhermit.git
synced 2025-03-09 00:00:03 +01:00
First version of XRay howto
parent
8ae0b42cbb
commit
7bcc2e0126
1 changed files with 87 additions and 0 deletions
87
Profiling-using-XRay.md
Normal file
@ -0,0 +1,87 @@
# Introduction

You can profile your application and parts of the system runtime with the XRay profiler. It hooks into every function call (using GCC's `-finstrument-functions` option) to record execution times and build a call graph.

XRay can generate a plain-text report that lists the most expensive function calls, subject to the filters you configure.
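To make the hooking mechanism a bit more concrete: when code is compiled with `-finstrument-functions`, GCC emits calls to `__cyg_profile_func_enter` and `__cyg_profile_func_exit` around each instrumented function. The following is only a minimal sketch of such hooks, not XRay's actual implementation; the counters `current_depth`, `max_depth` and `calls_seen` are assumptions made up for illustration.

```c
#include <assert.h>

/* Illustrative bookkeeping only; a real profiler records far more per call. */
static int current_depth = 0;        /* current nesting level of calls */
static int max_depth = 0;            /* deepest nesting level seen so far */
static unsigned long calls_seen = 0; /* total number of function entries */

/* The hooks themselves must not be instrumented, or they would recurse. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

/* Called on entry to every instrumented function. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;   /* address of the function being entered */
    (void)call_site; /* address the function was called from */
    calls_seen++;
    current_depth++;
    if (current_depth > max_depth)
        max_depth = current_depth;
}

/* Called just before every instrumented function returns. */
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    assert(current_depth > 0); /* every exit must match an earlier enter */
    current_depth--;
}
```

A profiler built on these hooks would typically also read a timestamp in both functions and store the difference together with `this_fn`, which is the address that later has to be mapped back to a function name.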
# About XRay

XRay can divide profiling into multiple "runs" called frames. In a graphical application a frame could correspond to the rendering of one graphics frame, whereas in a benchmark application it might correspond to an individual benchmark run.

The profiling information is stored in a statically sized ring buffer, so you must decide on the buffer size and the maximum number of frames up front. These values may need some fine-tuning. If in doubt, increase the buffer size.

To resolve function names, XRay needs a linker map file, which it uses to translate addresses into function names.
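The address-to-name step can be sketched as a lookup in a table sorted by start address. This is an illustration under stated assumptions, not XRay's parser: `struct symbol`, `resolve_symbol` and the addresses in `demo_symbols` are hypothetical, and a real map file would first have to be parsed into such a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One function symbol as it might be extracted from a linker map file
 * (hypothetical in-memory layout). */
struct symbol {
    uintptr_t addr;   /* start address of the function */
    const char *name; /* function name */
};

/* Return the name of the symbol with the greatest start address <= addr,
 * or NULL if addr lies below the first symbol. `table` must be sorted by
 * ascending address. Addresses beyond the last function still resolve to
 * the last symbol, because this sketch does not track symbol sizes. */
static const char *resolve_symbol(const struct symbol *table, size_t count,
                                  uintptr_t addr)
{
    size_t lo = 0, hi = count; /* search window [lo, hi) */

    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= addr)
            lo = mid + 1; /* candidate found, look further right */
        else
            hi = mid;     /* symbol starts after addr, look left */
    }
    return lo == 0 ? NULL : table[lo - 1].name;
}

/* Made-up addresses, for demonstration only. */
static const struct symbol demo_symbols[] = {
    { 0x1000, "main" },
    { 0x1040, "do_work" },
    { 0x10c0, "do_even_more_work" },
};
```

With this table, any address from `0x1040` up to `0x10bf` resolves to `do_work`. A function whose name is absent from the map file cannot be resolved by such a lookup.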
# Limitations

If the compiler aggressively (or intentionally) inlines functions, you won't see them in the final report because no enter and exit hooks are inserted. Keep this in mind if a function is missing from the call hierarchy. Furthermore, the names of static functions cannot be resolved because they are not listed in the linker map file.

... and probably more limitations that haven't come up yet :)
# Profile your application

## Building and linking

First you must generate a map file for your application. You can do so by adding `-Wl,-Map=application.map` to your linker command if you're using `gcc` for linking (omit the `-Wl,` if linking directly with `ld`).

The flags needed to enable profiling are defined in `/hermit/Makefile` as `PROFILING_CFLAGS` and `PROFILING_LDFLAGS`. You should propagate them through the build system into your application, but copying them for a quick start shouldn't hurt.

To build everything with profiling enabled, set the variable `PROFILING`, either on the `make` command line or in the environment:

```bash
$ cd HermitCore
$ make clean
$ make PROFILING=yes
$ # more permanent alternative
$ export PROFILING=yes
$ make
```

At the moment you cannot disable tracing for individual parts of the runtime (pthreads, OpenMP). As a workaround, build HermitCore with plain `make`, clean your application, and then invoke `make PROFILING=yes`. This should rebuild and link your application without enabling profiling inside the runtime.
## Code

You have to include the XRay header: `#include <xray.h>`.

Then you must initialize XRay, which also sets its basic configuration:

```c
// Prototype
struct XRayTraceCapture* XRayInit(int stack_size, int buffer_size, int frame_count, const char* mapfilename);

// Example call
struct XRayTraceCapture* trace = XRayInit(
    5,                // max. call depth in report
    4 * 1000 * 1000,  // ring buffer size for profiling information
    10,               // frame count
    "/path/to/your/application.map");
```

To find the hotspots in your code, start with a relatively small call depth (for example 5) and increase it to gain a better understanding of the detailed call hierarchy. The maximum call depth (stack size) is 255. Keep the buffer size as small as possible and increase it on demand.
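Why the buffer size matters can be illustrated with a tiny fixed-capacity ring buffer. This sketch assumes that a full buffer overwrites its oldest entries, which is typical ring-buffer behaviour but not verified here for XRay; `struct trace_entry`, `struct trace_ring`, `trace_ring_push`, `trace_ring_get` and `TRACE_CAPACITY` are hypothetical names and do not reflect XRay's internal layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 4 /* deliberately tiny for the demonstration */

struct trace_entry {
    uintptr_t func_addr; /* address of the traced function */
    uint64_t cycles;     /* measured runtime in CPU cycles */
};

struct trace_ring {
    struct trace_entry entries[TRACE_CAPACITY];
    size_t head;    /* index where the next entry is written */
    size_t count;   /* number of valid entries, at most TRACE_CAPACITY */
    size_t dropped; /* entries lost because the buffer was full */
};

/* Append one entry; once the buffer is full the oldest entry is overwritten. */
static void trace_ring_push(struct trace_ring *ring, uintptr_t func_addr,
                            uint64_t cycles)
{
    assert(ring != NULL);
    if (ring->count == TRACE_CAPACITY)
        ring->dropped++; /* the slot at head still holds the oldest entry */
    else
        ring->count++;
    ring->entries[ring->head].func_addr = func_addr;
    ring->entries[ring->head].cycles = cycles;
    ring->head = (ring->head + 1) % TRACE_CAPACITY;
}

/* Return the i-th oldest entry that is still stored (0 = oldest). */
static const struct trace_entry *trace_ring_get(const struct trace_ring *ring,
                                                size_t i)
{
    size_t oldest;

    assert(ring != NULL && i < ring->count);
    oldest = (ring->head + TRACE_CAPACITY - ring->count) % TRACE_CAPACITY;
    return &ring->entries[(oldest + i) % TRACE_CAPACITY];
}
```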
Now you can wrap parts of your code into frames:

```c
XRayStartFrame(trace);
do_work();
XRayEndFrame(trace);

XRayStartFrame(trace);
do_even_more_work();
XRayEndFrame(trace);
```

And finally generate the report:

```c
// Prototype
void XRaySaveReport(struct XRayTraceCapture* capture, const char* filename, float percent_cutoff, int cycle_cutoff);

XRaySaveReport(trace,
    "/path/to/your/report/application.xray", // report file
    10.0f,   // only report functions with a higher relative runtime [%]
    2000);   // only report functions with a higher absolute runtime [cycles]
XRayShutdown(trace);
```

Here you can filter the output further. For a function call to be added to the report, its runtime relative to the whole application has to be higher than `percent_cutoff` and its absolute runtime must be greater than `cycle_cutoff` CPU cycles.
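The combined effect of the two cutoffs can be written down as a small predicate. `passes_report_filter` is a hypothetical helper for illustration, not part of the XRay API, and the strict comparisons are an assumption based on the wording "higher than" and "greater than".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a call would be listed in the report.
 * call_cycles:    absolute runtime of the call in CPU cycles
 * total_cycles:   runtime of the whole application in CPU cycles
 * percent_cutoff: minimum relative runtime in percent (exclusive)
 * cycle_cutoff:   minimum absolute runtime in cycles (exclusive) */
static bool passes_report_filter(uint64_t call_cycles, uint64_t total_cycles,
                                 float percent_cutoff, uint64_t cycle_cutoff)
{
    double percent;

    assert(total_cycles > 0); /* avoid division by zero */
    percent = 100.0 * (double)call_cycles / (double)total_cycles;

    /* Both conditions must hold for the call to be reported. */
    return percent > (double)percent_cutoff && call_cycles > cycle_cutoff;
}
```

With the values from the call above (`10.0f` and `2000`), a call that takes 3000 of 20000 total cycles (15 %) would be reported, whereas a call that takes 1500 of 10000 cycles (also 15 %) would not, because it stays below the absolute cutoff.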
# Example

See [hermit/usr/openmpbench/syncbench.c](https://github.com/RWTH-OS/HermitCore/blob/dae722ea8cab9d14ba3e8e4700310b5dcf20d8ef/hermit/usr/openmpbench/syncbench.c).