NOTE – Optimizing IR
The runtime.c file is not instrumented because the pass checks that the special functions are not yet declared in a module.
This already looks better, but does it scale to larger programs? Let’s assume you want to build an instrumented binary of the tinylang compiler for Chapter 5. How would you do this?
You can pass compiler and linker flags on the CMake command line, which is exactly what we need. The flags for the C++ compiler are given in the CMAKE_CXX_FLAGS variable. Thus, specifying the following on the CMake command line adds the new pass to all compiler runs:
-DCMAKE_CXX_FLAGS="-fpass-plugin=/PPProfiler.so"
Please replace with the absolute path to the shared library.
Similarly, specifying the following adds the runtime.o file to each linker invocation. Again, please replace with the absolute path to a compiled version of runtime.c:
-DCMAKE_EXE_LINKER_FLAGS="/runtime.o"
Of course, this requires clang as the build compiler. The fastest way to make sure clang is used as the build compiler is to set the CC and CXX environment variables accordingly:
export CC=clang
export CXX=clang++
With these additional options, the CMake configuration from Chapter 5 should run as usual.
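Putting the pieces together, a complete configuration run might look like the following sketch. The /path/to placeholders and the relative source directory ../tinylang are assumptions; substitute the absolute paths on your system:

```shell
# Sketch only: replace the /path/to placeholders with the absolute paths
# to your built PPProfiler.so plugin and the compiled runtime.o.
export CC=clang
export CXX=clang++
cmake -DCMAKE_CXX_FLAGS="-fpass-plugin=/path/to/PPProfiler.so" \
      -DCMAKE_EXE_LINKER_FLAGS="/path/to/runtime.o" \
      ../tinylang
```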
After building the tinylang executable, you can run it with the example Gcd.mod file. The ppprofile.csv file will also be written, this time with more than 44,000 lines!
Of course, having such a dataset raises the question of whether you can get something useful out of it. For example, getting a list of the 10 most often called functions, together with the call count and the time spent in the function, would be useful information. Luckily, on a Unix system, you have a couple of tools that can help. Let’s build a short pipeline that matches enter events with exit events, counts the functions, and displays the top 10 functions. The awk Unix tool helps with most of these steps.
To match an enter event with an exit event, the enter event must be stored in the record associative map. When an exit event is matched, the stored enter event is looked up, and the new record is written. The emitted line contains the timestamp from the enter event, the timestamp from the exit event, and the difference between both. We must put this into the join.awk file:
BEGIN { FS = "|"; OFS = "|" }
/enter/ { record[$2] = $0 }
/exit/  { split(record[$2], val, "|")
          print val[2], val[3], $3, $3-val[3], val[4] }
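To see what join.awk emits, here is a minimal, self-contained sketch. The sample events and the assumed field layout (event|function|timestamp|thread) are invented for illustration, and the script is inlined so the example runs on its own:

```shell
# Hypothetical input: one enter/exit pair for a function named main.
printf 'enter|main|100|1\nexit|main|250|1\n' |
awk 'BEGIN { FS = "|"; OFS = "|" }
     /enter/ { record[$2] = $0 }
     /exit/  { split(record[$2], val, "|")
               print val[2], val[3], $3, $3-val[3], val[4] }'
# prints: main|100|250|150|1
```

The emitted record carries the function name, both timestamps, the time difference, and the trailing field from the enter event.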
To count the function calls and the execution time, two associative maps, count and sum, are used. In count, the function calls are counted, while in sum, the execution time is accumulated. In the end, the maps are dumped. You can put this into the avg.awk file:
BEGIN { FS = "|"; count[""] = 0; sum[""] = 0 }
{ count[$1]++; sum[$1] += $4 }
END { for (i in count) {
        if (i != "") {
          print count[i], sum[i], sum[i]/count[i], i }
      } }
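The two stages can be chained on a small hypothetical sample to check the arithmetic. Two calls to a function f taking 50 and 60 time units should yield a count of 2, a sum of 110, and an average of 55 (the sample data and field layout are assumptions for illustration; both scripts are inlined so the example is self-contained):

```shell
printf 'enter|f|100|1\nexit|f|150|1\nenter|f|200|1\nexit|f|260|1\n' |
awk 'BEGIN { FS = "|"; OFS = "|" }
     /enter/ { record[$2] = $0 }
     /exit/  { split(record[$2], val, "|")
               print val[2], val[3], $3, $3-val[3], val[4] }' |
awk 'BEGIN { FS = "|"; count[""] = 0; sum[""] = 0 }
     { count[$1]++; sum[$1] += $4 }
     END { for (i in count)
             if (i != "")
               print count[i], sum[i], sum[i]/count[i], i }'
# prints: 2 110 55 f
```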
After running these two scripts, the result can be sorted in descending order, and then the top 10 lines can be taken from the file. However, the function names recorded in the events are mangled C++ names and are therefore difficult to read. Using the llvm-cxxfilt tool, the names can be demangled. The demangle.awk script is as follows:
{ cmd = "llvm-cxxfilt " $4
  (cmd) | getline name
  close(cmd); $4 = name; print }
To get the top 10 function calls, you can run the following:
$ cat ppprofile.csv | awk -f join.awk | awk -f avg.awk |\
  sort -nr | head -10 | awk -f demangle.awk
Here are some sample lines from the output:
446 1545581 3465.43 charinfo::isASCII(char)
409 826261 2020.2 llvm::StringRef::StringRef()
382 899471 2354.64 tinylang::Token::is(tinylang::tok::TokenKind) const
171 1561532 9131.77 charinfo::isIdentifierHead(char)
The first number is the call count of the function, the second is the cumulated execution time, and the third is the average execution time. As explained previously, do not trust the time values, though the call counts should be accurate.
So far, we’ve implemented a new instrumentation pass, either as a plugin or as an addition to LLVM, and we used it in some real-world scenarios. In the next section, we’ll explore how to set up an optimization pipeline in our compiler.