# Profiling with XRay

## Introduction

You can profile your application and parts of the system runtime using the XRay
profiler. It hooks into every function call (using GCC's
`-finstrument-functions` option) to record the execution time and build a call
graph.

It generates a plain-text report that lists the most expensive function calls,
subject to the configured filters.

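The `-finstrument-functions` mechanism can be illustrated on its own. GCC emits a call to `__cyg_profile_func_enter` on entry to every instrumented function and a call to `__cyg_profile_func_exit` just before it returns. The minimal sketch below only counts calls and tracks the nesting depth. XRay's hooks also take timestamps and write them to its trace buffer. The counter variables are hypothetical. XRay presumably ships its own definitions of these hooks, so this sketch is meant for a standalone experiment and should not be linked alongside XRay.

```c
/* Hypothetical bookkeeping for this sketch only; XRay's own hooks record
 * timestamps into its ring buffer instead. */
static unsigned long g_enter_calls; /* number of function entries seen */
static unsigned long g_exit_calls;  /* number of function exits seen */
static int g_depth;                 /* current call nesting depth */
static int g_max_depth;             /* deepest nesting observed so far */

/* Called by GCC-generated code on entry to every instrumented function.
 * no_instrument_function keeps the hook itself from being instrumented,
 * which would otherwise make it call itself recursively. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;   /* address of the function being entered */
    (void)call_site; /* address of the call site */
    g_enter_calls++;
    if (++g_depth > g_max_depth)
        g_max_depth = g_depth;
}

/* Called by GCC-generated code just before an instrumented function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    g_exit_calls++;
    g_depth--;
}
```

When you compile a small test program together with this file using `gcc -finstrument-functions`, every call to an instrumented function updates the counters.
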
## About XRay

XRay can divide the profiling into multiple "runs" called frames. In a graphical
application a frame could correspond to the rendering of one graphics frame,
whereas in a benchmark application a frame might correspond to each individual
benchmark run.

The profiling information is saved in a statically sized ring buffer, so you
must decide on the buffer size and the maximum number of frames in advance.
These values might need some fine-tuning. If in doubt, increase the buffer size.

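A generic fixed-capacity ring buffer shows why its size matters. This is an illustration under assumptions and does not reflect XRay's actual data layout. The `struct event` fields, `RING_CAPACITY`, and the overwrite-oldest policy are all hypothetical. Once a buffer like this is full, each new event displaces the oldest one, so an undersized buffer loses the early part of a trace. The sketch counts those losses in `dropped`.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 4 /* hypothetical, tiny on purpose */

struct event {
    uintptr_t fn;        /* address of the traced function */
    uint64_t  timestamp; /* e.g. a cycle counter value */
};

struct ring {
    struct event slots[RING_CAPACITY];
    size_t head;    /* index of the next slot to write */
    size_t count;   /* number of valid events, at most RING_CAPACITY */
    size_t dropped; /* events overwritten because the buffer was full */
};

/* Appends an event; when the buffer is full, the oldest event is overwritten. */
static void ring_push(struct ring *r, uintptr_t fn, uint64_t ts)
{
    if (r->count == RING_CAPACITY)
        r->dropped++; /* the slot at head still holds the oldest event */
    else
        r->count++;
    r->slots[r->head].fn = fn;
    r->slots[r->head].timestamp = ts;
    r->head = (r->head + 1) % RING_CAPACITY;
}

/* Returns the i-th oldest event still held (0 = oldest), for i < count. */
static const struct event *ring_get(const struct ring *r, size_t i)
{
    size_t oldest = (r->head + RING_CAPACITY - r->count) % RING_CAPACITY;
    return &r->slots[(oldest + i) % RING_CAPACITY];
}
```

Pushing six events into this four-slot buffer leaves events 3 to 6 in place and sets `dropped` to 2.
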
To resolve function names, XRay needs a linker map file, which it uses to map
the recorded function addresses to their names.

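Resolving an address through a map file amounts to finding the symbol with the largest start address that is not greater than the address. The minimal sketch below assumes the map file has already been parsed into a table sorted by address. The `struct symbol` type and the `resolve` function are hypothetical and are not part of XRay.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry extracted from a linker map file: a function's start address and
 * name. Parsing the map file itself is left out of this sketch. */
struct symbol {
    uintptr_t   addr;
    const char *name;
};

/* Returns the name of the symbol with the largest start address <= addr,
 * i.e. the function that presumably contains addr, or NULL if addr lies
 * before the first symbol. syms must be sorted by ascending address. */
static const char *resolve(const struct symbol *syms, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n; /* binary search over the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1; /* syms[mid] starts at or before addr: look right */
        else
            hi = mid;     /* syms[mid] starts after addr: look left */
    }
    /* lo now counts the symbols starting at or before addr */
    return lo == 0 ? NULL : syms[lo - 1].name;
}
```

The table holds only start addresses. As a result, an address past the end of the last function still resolves to that function. In this sketch, an address inside a static function (which is missing from the map file) would also be attributed to the preceding listed symbol.
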
## Limitations

If the compiler inlines functions, whether aggressively or intentionally, you
won't see them in the final report, since no enter and exit hooks are inserted
for them. Keep this in mind if a function is missing from the call hierarchy.
Furthermore, the names of static functions cannot be resolved because they are
not listed in the linker map file.

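If inlining or static linkage is the reason a function is missing, one workaround is to keep the function out of line and give it external linkage. The sketch below uses GCC's `noinline` attribute. `sum_range` is a hypothetical example function.

```c
/* Hypothetical example function that should appear in the XRay report. */

/* noinline: GCC will not consider this function for inlining.
 * No `static`: the function has external linkage, so the linker map file can
 * list its name. */
__attribute__((noinline))
long sum_range(long from, long to)
{
    long sum = 0;
    for (long i = from; i <= to; i++)
        sum += i;
    return sum;
}
```

GCC's `-fno-inline` option disables inlining for the whole build instead. The cost is that you then measure code that differs more from the optimized binary.
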
## Profile your application

To generate linker map files and inject the enter and exit hooks, tell CMake
that you want your application to be profiled:

```bash
$ cd build
$ cmake .. -DPROFILING=true
```

If you want to profile HermitCore internals or one of the example applications,
have a look at `CMakeLists.txt` in the root of the repository. Every target that
is built by `build_external(target_name ...)` can be profiled like this:

```bash
$ cd build
$ cmake .. -DPROFILE_APPS='openmpbench;tests'
```

### Code

Include the XRay header: `#include <xray.h>`.

Then initialize XRay, which also sets the profiling parameters:

```c
// Prototype (declared in xray.h):
struct XRayTraceCapture* XRayInit(int stack_size,
                                  int buffer_size,
                                  int frame_count,
                                  const char* mapfilename);

// Usage:
struct XRayTraceCapture* trace = XRayInit(
        5,                 // stack_size: max. call depth in report
        4 * 1000 * 1000,   // buffer_size: ring buffer size for profiling information
        10,                // frame_count: max. number of frames
        "/path/to/your/application.map");  // mapfilename: linker map file
```

To find the hotspots in your code, you might want to start with a relatively
small call depth (e.g. 5) and increase it to gain a better understanding of the
detailed call hierarchy. The maximum call depth (`stack_size`) is 255. Keep the
buffer size as small as possible and increase it on demand.

Now you can wrap parts of your code in frames:

```c
XRayStartFrame(trace);
do_work();
XRayEndFrame(trace);

XRayStartFrame(trace);
do_even_more_work();
XRayEndFrame(trace);
```

Finally, generate the report and shut XRay down:

```c
// Prototype (declared in xray.h):
void XRaySaveReport(struct XRayTraceCapture* capture,
                    const char* filename,
                    float percent_cutoff,
                    int cycle_cutoff);

// Usage:
XRaySaveReport(trace,
               "/path/to/your/report/application.xray", // report file
               10.0f, // percent_cutoff: only output funcs with a higher runtime [%]
               2000); // cycle_cutoff: only output funcs with a higher runtime [cycles]
XRayShutdown(trace);
```

The last two arguments let you filter the output further. For a function call
to be added to the report, its relative runtime (with respect to the whole
application) has to be higher than `percent_cutoff` and its absolute runtime
must be greater than `cycle_cutoff` CPU cycles.

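The two cutoffs combine with a logical AND. The predicate below spells out the rule as this section describes it. `passes_cutoff` and its cycle-count parameters are hypothetical and not part of the XRay API. The sketch assumes `cycle_cutoff` is non-negative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a function with fn_cycles of runtime, out of total_cycles
 * for the whole application, passes both report filters. Hypothetical helper,
 * not part of the XRay API; cycle_cutoff is assumed to be non-negative. */
static bool passes_cutoff(uint64_t fn_cycles, uint64_t total_cycles,
                          float percent_cutoff, int cycle_cutoff)
{
    if (total_cycles == 0)
        return false; /* no runtime recorded, nothing to report */
    double percent = 100.0 * (double)fn_cycles / (double)total_cycles;
    /* both conditions must hold, and both comparisons are strict */
    return percent > percent_cutoff && fn_cycles > (uint64_t)cycle_cutoff;
}
```

Take the values from the `XRaySaveReport` call above (10% and 2000 cycles) and an application total of 100000 cycles. A function with 15000 cycles is reported. A function with 5000 cycles (5%) is not. Both comparisons are strict, so a function at exactly 10% is filtered out.
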
## Example

See [usr/openmpbench/syncbench.c](https://github.com/RWTH-OS/HermitCore/blob/master/usr/openmpbench/syncbench.c).

## Analysis

After tracing your code, you may want to analyse the report. While the XRay
report is human-readable, it's hard to get an overview of the whole trace from
it. Therefore, the report can be converted to a format that
[KCachegrind](https://kcachegrind.github.io) can read. The conversion tool is
located in `usr/xray/tools`:

```bash
$ ./conv2kcg.py libgomp_trace.xray
INFO:Parsing Header is done. Found 1 frames
INFO:Found frame 'PARALLEL' data
INFO:Frame 'PARALLEL' complete
INFO:Report file 'libgomp_trace.xray' parsed completely.
INFO:Create callgrind file for frame 'PARALLEL'
INFO:Writing to: libgomp_trace_PARALLEL.callgrind
```

This creates the file `libgomp_trace_PARALLEL.callgrind`, which can be opened
with KCachegrind (in the Open dialog, set the filter to 'All Files').

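A callgrind file is plain text. It starts with a header such as `# callgrind format` and `events: Cycles`. After that, each `fn=` line starts a function record followed by its self cost. Each `cfn=`/`calls=` pair describes a call, and the cost line that follows it gives the inclusive cost of that call. The sketch below formats one such record. It is not taken from `conv2kcg.py`, whose exact output may differ. The `call_edge` type and the `format_callgrind_fn` function are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct call_edge {
    const char   *callee;           /* name of the called function */
    unsigned long calls;            /* number of calls along this edge */
    uint64_t      inclusive_cycles; /* cycles in the callee, incl. its callees */
};

/* Writes one function record in callgrind format to buf: the function's self
 * cost, then one cfn=/calls= block per callee. Position 0 is used because
 * this sketch has no line information. Returns the number of characters
 * written, or -1 if buf is too small. */
static int format_callgrind_fn(char *buf, size_t size, const char *fn,
                               uint64_t self_cycles,
                               const struct call_edge *edges, size_t n)
{
    int w = snprintf(buf, size, "fn=%s\n0 %llu\n", fn,
                     (unsigned long long)self_cycles);
    if (w < 0 || (size_t)w >= size)
        return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + used, size - used, "cfn=%s\ncalls=%lu 0\n0 %llu\n",
                     edges[i].callee, edges[i].calls,
                     (unsigned long long)edges[i].inclusive_cycles);
        if (w < 0 || (size_t)w >= size - used)
            return -1; /* output was truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

Consider `main` with 1000 self cycles and one call to `do_work` costing 4000 cycles. For that input, the function writes `fn=main`, `0 1000`, `cfn=do_work`, `calls=1 0` and `0 4000` on separate lines.
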