mirror of https://github.com/hermitcore/libhermit.git synced 2025-03-09 00:00:03 +01:00
libhermit/usr/xray/README.md
2017-04-03 18:14:56 +02:00


# Profiling with XRay
## Introduction
You can profile your application and parts of the system runtime using the XRay
profiler. It hooks into every function entry and exit (using GCC's
`-finstrument-functions` option) to record execution times and build a call
graph. It can then generate a text report that lists the most expensive function
calls, subject to the filters you configure.
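
With `-finstrument-functions`, GCC emits a call to `__cyg_profile_func_enter` at the start of every instrumented function and a call to `__cyg_profile_func_exit` just before it returns. A profiler such as XRay provides these two hooks. The sketch below is not XRay's implementation: it only counts calls and tracks the call depth, to show what the hooks are. The counters `g_calls`, `g_depth` and `g_max_depth` are hypothetical. The hooks themselves must be marked `no_instrument_function`, otherwise they would be instrumented and recurse into themselves.

```c
#include <stddef.h>

/* Hypothetical counters, for illustration only (not part of XRay). */
static size_t g_calls = 0;   /* total number of function entries seen */
static int g_depth = 0;      /* current call depth */
static int g_max_depth = 0;  /* deepest call depth seen so far */

/* Called by GCC-generated code on entry to every instrumented function.
 * this_fn is the address of the entered function, call_site the address
 * it was called from. The attribute keeps the hook itself uninstrumented. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    g_calls++;
    g_depth++;
    if (g_depth > g_max_depth)
        g_max_depth = g_depth;
}

/* Called by GCC-generated code just before an instrumented function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    g_depth--;
}
```

Linking these hooks into a program whose sources are compiled with `gcc -finstrument-functions` routes every instrumented call through them. A real profiler would additionally record `this_fn` and a timestamp for each event.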
## About XRay
XRay can divide the profiling into multiple "runs" called frames. In a graphical
application this could correspond to the rendering of a graphics frame, whereas
in a benchmark application a frame might correspond to each individual benchmark
run.
The profiling information is stored in a statically sized ring buffer, so you
must choose the buffer size and the maximum number of frames in advance. These
values may need some fine-tuning; if in doubt, increase the buffer size.
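
The buffer size matters because of the ring: once it is full, new events overwrite the oldest ones, so a buffer that is too small silently loses the beginning of the trace. The following sketch uses a hypothetical `trace_event` record and a deliberately tiny capacity to show the wrap-around. XRay's actual record layout and sizing are not described in this README.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record (not XRay's real layout). */
struct trace_event {
    uintptr_t fn;   /* address of the entered/exited function */
    uint64_t tsc;   /* cycle counter at the time of the event */
    int is_exit;    /* 0 = enter, 1 = exit */
};

#define RING_CAPACITY 4  /* deliberately tiny to demonstrate wrap-around */

struct ring {
    struct trace_event buf[RING_CAPACITY];
    size_t head;     /* index of the next slot to write */
    size_t written;  /* total number of events ever written */
};

/* Append an event; when the ring is full, the oldest event is overwritten. */
void ring_push(struct ring *r, struct trace_event ev)
{
    r->buf[r->head] = ev;
    r->head = (r->head + 1) % RING_CAPACITY;
    r->written++;
}

/* Number of events still available (never more than the capacity). */
size_t ring_size(const struct ring *r)
{
    return r->written < RING_CAPACITY ? r->written : RING_CAPACITY;
}

/* Number of events lost because they were overwritten. */
size_t ring_dropped(const struct ring *r)
{
    return r->written - ring_size(r);
}

/* The i-th oldest surviving event (0 = oldest). */
struct trace_event ring_get(const struct ring *r, size_t i)
{
    size_t oldest = (r->head + RING_CAPACITY - ring_size(r)) % RING_CAPACITY;
    return r->buf[(oldest + i) % RING_CAPACITY];
}
```

Pushing six events into this four-slot ring keeps only the last four and reports two as dropped. This is why a truncated report is a hint to increase the buffer size.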
To resolve function names, XRay needs a linker map file, which it uses to map
recorded addresses back to function names.
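
Resolving an address amounts to finding the symbol with the largest start address that is less than or equal to it. The sketch below assumes that the map file has already been parsed into an array of (address, name) pairs sorted by address. The `symbol` struct and the `resolve` function are hypothetical, and parsing the map file format itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry, e.g. parsed from a linker map file. */
struct symbol {
    uintptr_t addr;    /* start address of the function */
    const char *name;  /* function name */
};

/* Return the name of the entry with the largest start address <= addr
 * (binary search; syms must be sorted by addr). Returns NULL if addr lies
 * before the first symbol. Without size information, addresses past the
 * last symbol resolve to the last symbol. */
const char *resolve(const struct symbol *syms, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;  /* search window [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of entries whose start address is <= addr */
    return lo == 0 ? NULL : syms[lo - 1].name;
}
```

Because the lookup only needs start addresses, any function whose name is absent from the map file cannot be resolved. This is the reason for the static-function limitation described below.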
## Limitations
If the compiler inlines functions, whether aggressively or intentionally, you
won't see them in the final report because no enter and exit hooks are inserted
for them. Keep this in mind if a function is missing from the call hierarchy.
Furthermore, the names of static functions cannot be resolved because they are
not listed in the linker map file.
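
If a function you expect is missing, assuming GCC or a compatible compiler, you can forbid inlining of that one function with `__attribute__((noinline))` and give it external linkage (no `static`) so that its name appears in the map file. Conversely, `__attribute__((no_instrument_function))` excludes a function from instrumentation, which may help keep tiny, very hot helpers out of the trace. The functions `checksum` and `clamp` below are hypothetical examples.

```c
/* Hypothetical example functions illustrating two GCC attributes. */

/* Non-static (external linkage) so its name is listed in the linker map
 * file, and noinline so the compiler keeps it as a separate function. */
__attribute__((noinline))
int checksum(const unsigned char *data, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Excluded from instrumentation: no enter/exit hooks are emitted for it,
 * so it never appears in the report. */
__attribute__((no_instrument_function))
int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}
```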
## Profile your application
To generate linker map files and inject enter and exit hooks, you have to tell
CMake that you want your application to be profiled:
```bash
$ cd build
$ cmake .. -DPROFILING=true
```
If you want to profile HermitCore internals or one of the example applications,
have a look at `CMakeLists.txt` in the root of the repository. Every target that
is built by `build_external(target_name ...)` can be profiled like this:
```bash
$ cd build
$ cmake .. -DPROFILE_APPS='openmpbench;tests'
```
### Code
You have to include the XRay header: `#include <xray.h>`.
Then initialize XRay, passing in the basic configuration:
```c
// Signature
struct XRayTraceCapture* XRayInit(int stack_size,
                                  int buffer_size,
                                  int frame_count,
                                  const char* mapfilename);

// Example call
struct XRayTraceCapture* trace = XRayInit(
    5,                                // max. call depth in report
    4 * 1000 * 1000,                  // ring buffer size for profiling information
    10,                               // frame count
    "/path/to/your/application.map"); // linker map file
```
To find the hotspots in your code, start with a relatively small call depth
(e.g. 5) and increase it to get a more detailed view of the call hierarchy. The
maximum call depth (`stack_size`) is 255. Keep the buffer size as small as
possible and increase it only when needed.
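
One way to picture the call-depth limit is a counter that is incremented on every function entry and decremented on every exit, with only calls at or above the limit being reported. The following is a sketch under that assumption, not XRay's code. `depth_filter` and its functions are hypothetical, and clamping the requested depth to the documented maximum of 255 is a choice made here for illustration.

```c
#include <stdbool.h>

#define XRAY_MAX_DEPTH 255  /* documented maximum call depth / stack size */

/* Hypothetical depth tracker: reports only calls at depth <= limit. */
struct depth_filter {
    int limit;  /* requested call depth, clamped to [1, XRAY_MAX_DEPTH] */
    int depth;  /* current call depth */
};

void depth_filter_init(struct depth_filter *f, int requested)
{
    if (requested < 1)
        requested = 1;
    if (requested > XRAY_MAX_DEPTH)
        requested = XRAY_MAX_DEPTH;
    f->limit = requested;
    f->depth = 0;
}

/* Call on function entry; returns true if this call should be reported. */
bool depth_filter_enter(struct depth_filter *f)
{
    f->depth++;
    return f->depth <= f->limit;
}

/* Call on function exit; returns true if the matching entry was reported. */
bool depth_filter_exit(struct depth_filter *f)
{
    bool reported = f->depth <= f->limit;
    f->depth--;
    return reported;
}
```

With a limit of 2, a call chain `a -> b -> c` reports `a` and `b` but not `c`. Raising the limit step by step reveals deeper levels of the hierarchy.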
Now you can wrap parts of your code in frames:
```c
XRayStartFrame(trace);
do_work();
XRayEndFrame(trace);
XRayStartFrame(trace);
do_even_more_work();
XRayEndFrame(trace);
```
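
Each `XRayStartFrame`/`XRayEndFrame` pair delimits one frame, and `frame_count` bounds how many frames are kept. As a rough model only (XRay's internal bookkeeping is not documented here), a frame can be viewed as a start and end position in the event buffer. This sketch assumes that frame slots are reused round-robin once `frame_count` frames have been recorded. All names are hypothetical, and `FRAME_COUNT` stands in for the `frame_count` argument.

```c
#include <stddef.h>

#define FRAME_COUNT 3  /* stands in for XRayInit's frame_count */

/* Hypothetical frame record: a span of event positions. */
struct frame {
    size_t start;  /* event position when the frame was started */
    size_t end;    /* event position when the frame was ended */
};

struct frame_table {
    struct frame frames[FRAME_COUNT];
    size_t next;   /* slot the next frame will use */
    size_t total;  /* number of frames started so far */
};

/* Begin a frame at the current event position. Once FRAME_COUNT frames
 * exist, the oldest slot is reused. Returns the slot index. */
size_t frame_start(struct frame_table *t, size_t event_pos)
{
    size_t slot = t->next;
    t->frames[slot].start = event_pos;
    t->frames[slot].end = event_pos;
    t->next = (t->next + 1) % FRAME_COUNT;
    t->total++;
    return slot;
}

/* Close the frame in the given slot at the current event position. */
void frame_end(struct frame_table *t, size_t slot, size_t event_pos)
{
    t->frames[slot].end = event_pos;
}

/* Number of events recorded within a frame. */
size_t frame_events(const struct frame_table *t, size_t slot)
{
    return t->frames[slot].end - t->frames[slot].start;
}
```

Under this model, a fourth frame with `FRAME_COUNT` set to 3 reuses the first slot. Choosing `frame_count` at least as large as the number of frames you want to analyse avoids losing earlier ones.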
Finally, generate the report and shut XRay down:
```c
// Signature
void XRaySaveReport(struct XRayTraceCapture* capture,
                    const char* filename,
                    float percent_cutoff,
                    int cycle_cutoff);

// Example call
XRaySaveReport(trace,
               "/path/to/your/report/application.xray", // report file
               10.0f,  // only list functions with a higher relative runtime [%]
               2000);  // only list functions with a higher absolute runtime [cycles]
XRayShutdown(trace);
```
These parameters filter the output further. For a function call to be added to
the report, its relative runtime (with respect to the whole application) has to
be higher than `percent_cutoff` and its absolute runtime must be greater than
`cycle_cutoff` CPU cycles.
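
Both cutoffs must be exceeded for a call to be listed. A minimal sketch of that rule is shown below. It assumes that a function's cycle count and the total cycle count of the run are known. `passes_cutoffs` is a hypothetical helper, not part of XRay.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if a function call would be listed in the report: its share of
 * the total runtime must exceed percent_cutoff AND its absolute runtime must
 * exceed cycle_cutoff cycles. */
bool passes_cutoffs(uint64_t func_cycles, uint64_t total_cycles,
                    float percent_cutoff, int cycle_cutoff)
{
    if (total_cycles == 0)
        return false;
    double percent = 100.0 * (double)func_cycles / (double)total_cycles;
    bool above_cycles = cycle_cutoff < 0 || func_cycles > (uint64_t)cycle_cutoff;
    return percent > percent_cutoff && above_cycles;
}
```

With the example values `10.0f` and `2000`, a call that takes 1500 of 10000 total cycles is dropped despite its 15 % share. A call that takes 50000 of 1000000 cycles is also dropped because its share is only 5 %.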
## Example
See [usr/openmpbench/syncbench.c](https://github.com/RWTH-OS/HermitCore/blob/master/usr/openmpbench/syncbench.c).
## Analysis
After tracing your code, you may want to analyse the report. While the XRay
report is already human-readable, it's hard to get an overview of the whole
trace from it. Therefore, the XRay report can be converted to a format that
[kCacheGrind](https://kcachegrind.github.io) can read. The tool needed for the
conversion is in `usr/xray/tools`.
```bash
$ ./conv2kcg.py libgomp_trace.xray
INFO:Parsing Header is done. Found 1 frames
INFO:Found frame 'PARALLEL' data
INFO:Frame 'PARALLEL' complete
INFO:Report file 'libgomp_trace.xray' parsed completely.
INFO:Create callgrind file for frame 'PARALLEL'
INFO:Writing to: libgomp_trace_PARALLEL.callgrind
```
This creates the file `libgomp_trace_PARALLEL.callgrind`, which can be opened
with kCacheGrind (in the Open dialog, set the filter to 'All Files').