Archives 2022

Defining records and classes – The TableGen Language

Let’s define a simple record for an instruction:

def ADD {
  string Mnemonic = "add";
  int Opcode = 0xA0;
}

The def keyword signals that you define a record. It is followed by the name of the record. The record body is surrounded by curly braces, and the body consists of field definitions, similar to a structure in C++.
You can use the llvm-tblgen tool to see the generated records. Save the preceding source code in an inst.td file and run the following:

$ llvm-tblgen --print-records inst.td
------------- Classes -----------------
------------- Defs -----------------
def ADD {
  string Mnemonic = "add";
  int Opcode = 160;
}

This is not yet exciting; it only shows the defined record was parsed correctly.
Defining instructions using single records is not very comfortable. A modern CPU has hundreds of instructions, and with that many records, it is very easy to introduce typing errors in the field names. And if you decide to rename a field or add a new one, then the number of records to change becomes a challenge. Therefore, a blueprint is needed. In C++, classes serve a similar purpose, and in TableGen, the blueprint is also called a class. Here is the definition of an Inst class and two records based on that class:

class Inst<string mnemonic, int opcode> {
  string Mnemonic = mnemonic;
  int Opcode = opcode;
}
def ADD : Inst<"add", 0xA0>;
def SUB : Inst<"sub", 0xB0>;

The syntax for classes is similar to that of records. The class keyword signals that a class is defined, followed by the name of the class. A class can have a parameter list. Here, the Inst class has two parameters, mnemonic and opcode, which are used to initialize the records’ fields. The values for those fields are given when the class is instantiated. The ADD and SUB records show two instantiations of the class. Again, let’s use llvm-tblgen to look at the records:

$ llvm-tblgen --print-records inst.td
------------- Classes -----------------
class Inst<string Inst:mnemonic = ?, int Inst:opcode = ?> {
  string Mnemonic = Inst:mnemonic;
  int Opcode = Inst:opcode;
}
------------- Defs -----------------
def ADD { // Inst
  string Mnemonic = "add";
  int Opcode = 160;
}
def SUB { // Inst
  string Mnemonic = "sub";
  int Opcode = 176;
}

Now, you have one class definition and two records. The name of the class used to define the records is shown as a comment. Please note that the arguments of the class have the default value ?, which indicates that they are uninitialized.
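As a rough C++ analogy (this is not what llvm-tblgen generates), the Inst class behaves like a struct whose constructor arguments initialize the fields, and each def is one named instance. The struct layout and the describe() helper below are hypothetical and exist only for illustration:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical C++ counterpart of the TableGen class Inst. The constructor
// parameters play the role of the class parameters mnemonic and opcode.
struct Inst {
  std::string Mnemonic;
  int Opcode;

  Inst(std::string mnemonic, int opcode)
      : Mnemonic(std::move(mnemonic)), Opcode(opcode) {
    // Assumption of this sketch: every instruction has a mnemonic.
    assert(!Mnemonic.empty() && "an instruction needs a mnemonic");
  }
};

// Counterparts of "def ADD : Inst<...>" and "def SUB : Inst<...>".
const Inst ADD("add", 0xA0);
const Inst SUB("sub", 0xB0);

// Renders the fields with the opcode in decimal, the same way the
// --print-records output shows integer values.
std::string describe(const Inst &I) {
  return I.Mnemonic + " " + std::to_string(I.Opcode);
}
```

Calling describe(ADD) yields the text add 160, which mirrors how the hexadecimal literal 0xA0 shows up as 160 in the dumped record.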

Understanding the TableGen language – The TableGen Language

LLVM comes with its own domain-specific language (DSL) called TableGen. It is used to generate C++ code for a wide range of use cases, thus reducing the amount of code a developer has to produce. The TableGen language is not a full-fledged programming language. It is only used to define records, which is a fancy word for a collection of names and values. To understand why such a restricted language is useful, let’s examine two examples.

Typical data you need to define one machine instruction of a CPU is:

  • The mnemonic of the instruction
  • The bit pattern
  • The number and types of operands
  • Possible restrictions or side effects

It is easy to see that this data can be represented as a record. For example, a field named asmstring could hold the value of the mnemonic; say, “add”. Also, a field named opcode could hold the binary representation of the instruction. Together, the record would describe an addition instruction. Each LLVM backend describes the instruction set in this way.

Records are such a general concept that you can describe a wide variety of data with them. Another example is the definition of command-line options. A command-line option:

  • Has a name
  • May have an optional argument
  • Has a help text
  • May belong to a group of options

Again, this data can be easily seen as a record. Clang uses this approach for the command-line options of the Clang driver.
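To make the idea of "a collection of names and values" concrete, here is a minimal C++ sketch of a generic record. It does not use the TableGen library; the Value and Record types, the helper functions, and the option field values are assumptions chosen for illustration:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// A value is either an integer or a string (a tiny tagged struct).
struct Value {
  enum Kind { Int, String } K;
  int IntVal = 0;
  std::string StrVal;

  Value() : K(Int) {}
  Value(int V) : K(Int), IntVal(V) {}
  Value(std::string V) : K(String), StrVal(std::move(V)) {}
};

// A record is nothing more than a mapping from field names to values.
using Record = std::map<std::string, Value>;

// Typed accessor; the assert documents the expected kind.
int asInt(const Value &V) {
  assert(V.K == Value::Int && "field does not hold an integer");
  return V.IntVal;
}

// The instruction example from the text: a mnemonic and an opcode.
Record makeInstructionRecord() {
  Record R;
  R["asmstring"] = Value(std::string("add"));
  R["opcode"] = Value(0xA0);
  return R;
}

// The command-line option example; all field values are made up.
Record makeOptionRecord() {
  Record R;
  R["name"] = Value(std::string("o"));
  R["helptext"] = Value(std::string("Write output to file"));
  R["group"] = Value(std::string("Action"));
  return R;
}
```

Both records share the same representation even though they describe unrelated things, which is the property that makes a record-only language useful for so many tasks.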

The TableGen language

In LLVM, the TableGen language is used for a variety of tasks. Large parts of a backend are written in the TableGen language; for example, the definition of a register file, all instructions with mnemonic and binary encoding, calling conventions, patterns for instruction selection, and scheduling models for instruction scheduling. Other uses in LLVM are the definition of intrinsic functions, the definition of attributes, and the definition of command-line options.

You’ll find the Programmer’s Reference at https://llvm.org/docs/TableGen/ProgRef.html and the Backend Developer’s Guide at https://llvm.org/docs/TableGen/BackGuide.html.

To achieve this flexibility, the parsing and the semantics of the TableGen language are implemented in a library. To generate C++ code from the records, you need to create a tool that takes the parsed records and generates C++ code from them. In LLVM, that tool is called llvm-tblgen, and in Clang, it is called clang-tblgen. Those tools contain the code generators required by the project. But they can also be used to learn more about the TableGen language, which is what we’ll do in the next section.

Experimenting with the TableGen language

Very often, beginners feel overwhelmed by the TableGen language. But as soon as you start experimenting with the language, it becomes much easier.

TIP – Optimizing IR

To allow the user to add passes at every extension point, you need to add the preceding code snippet for each extension point.

Now is a good time to try out the different pass manager options. With the --debug-pass-manager option, you can follow which passes are executed in which order. You can also print the IR before or after each pass, which is invoked with the --print-before-all and --print-after-all options. If you created your own pass pipeline, then you can insert the print pass in points of interest. For example, try the --passes="print,inline,print" option. Furthermore, to identify which pass changes the IR code, you can use the --print-changed option, which will only print the IR code if it has changed compared to the result from the pass before. The greatly reduced output makes it much easier to follow IR transformations.

The PassBuilder class has a nested OptimizationLevel class to represent the six different optimization levels. Instead of using the “default<O?>” pipeline description as an argument to the parsePassPipeline() method, we can also call the buildPerModuleDefaultPipeline() method, which builds the default optimization pipeline for the requested level – except for level O0. This optimization level means that no optimization is performed.

Consequently, no passes are added to the pass manager. If we still want to run a certain pass, then we can add it to the pass manager manually. A simple pass to run at this level is the AlwaysInliner pass, which inlines a function marked with the always_inline attribute into the caller. After translating the command-line option value for the optimization level into the corresponding member of the OptimizationLevel class, we can implement this as follows:
    PassBuilder::OptimizationLevel OLevel = …;
    if (OLevel == PassBuilder::OptimizationLevel::O0)
      MPM.addPass(AlwaysInlinerPass());
    else
      MPM = PB.buildPerModuleDefaultPipeline(OLevel, DebugPM);

Of course, it is possible to add more than one pass to the pass manager in this fashion. PassBuilder also uses the addPass() method when constructing the pass pipeline.

Running extension point callbacks

Because the pass pipeline is not populated for optimization level O0, the registered extension points are not called. If you use the extension points to register passes that should also run at O0 level, this is problematic. You can call the runRegisteredEPCallbacks() method to run the registered extension point callbacks, resulting in a pass manager populated only with the passes that were registered through the extension points.

By adding the optimization pipeline to tinylang, you created an optimizing compiler similar to clang. The LLVM community works on improving the optimizations and the optimization pipeline with each release. Due to this, it is very seldom that the default pipeline is not used. Most often, new passes are added to implement certain semantics of the programming language.

Summary

In this chapter, you learned how to create a new pass for LLVM. You ran the pass using a pass pipeline description and an extension point. You extended your compiler with the construction and execution of a pass pipeline similar to clang, turning tinylang into an optimizing compiler. The pass pipeline allows the addition of passes at extension points, and you learned how you can register passes at these points. This allows you to extend the optimization pipeline with your developed passes or existing passes.

In the next chapter, you will learn the basics of the TableGen language, which is used extensively in LLVM and clang to significantly reduce manual programming.

Extending the pass pipeline – Optimizing IR

In the previous section, we used the PassBuilder class to create a pass pipeline, either from a user-provided description or a predefined name. Now, let’s look at another way to customize the pass pipeline: using extension points.
During the construction of the pass pipeline, the pass builder allows passes contributed by the user to be added. These places are called extension points. A couple of extension points exist, as follows:
• The pipeline start extension point, which allows us to add passes at the beginning of the pipeline
• The peephole extension point, which allows us to add passes after each instance of the instruction combiner pass
Other extension points exist too. To employ an extension point, you must register a callback. During the construction of the pass pipeline, your callback is run at the defined extension point and can add passes to the given pass manager.
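The callback mechanism can be modeled without LLVM. The following sketch is a toy model under stated assumptions: ToyPassManager and ToyPassBuilder are hypothetical stand-ins that only store pass names, and the two passes in the fixed pipeline are arbitrary placeholders. It shows that registered callbacks run while the pipeline is being constructed, so the passes they add end up at the start:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a pass manager: an ordered list of pass names.
struct ToyPassManager {
  std::vector<std::string> Passes;
  void addPass(const std::string &Name) { Passes.push_back(Name); }
};

// Stand-in for a pass builder with a single extension point.
class ToyPassBuilder {
public:
  using Callback = std::function<void(ToyPassManager &)>;

  // Remembers the callback; nothing is run yet.
  void registerPipelineStartEPCallback(Callback CB) {
    assert(CB && "callback must be callable");
    PipelineStartCallbacks.push_back(std::move(CB));
  }

  // Builds a fixed pipeline. The registered callbacks are invoked first,
  // which is what makes this the "pipeline start" extension point.
  ToyPassManager buildDefaultPipeline() const {
    ToyPassManager PM;
    for (const Callback &CB : PipelineStartCallbacks)
      CB(PM);
    PM.addPass("instcombine");
    PM.addPass("inline");
    return PM;
  }

private:
  std::vector<Callback> PipelineStartCallbacks;
};

// Registers one callback and returns the resulting pass order.
std::vector<std::string> exampleUsage() {
  ToyPassBuilder PB;
  PB.registerPipelineStartEPCallback(
      [](ToyPassManager &PM) { PM.addPass("ppprofiler"); });
  return PB.buildDefaultPipeline().Passes;
}
```

In this model, a callback registered after buildDefaultPipeline() has returned has no effect on the pipeline that was already built, which is why the order of registration and construction matters.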
To register a callback for the pipeline start extension point, you must call the registerPipelineStartEPCallback() method of the PassBuilder class. For example, to add our PPProfiler pass to the beginning of the pipeline, you would adapt the pass to be used as a module pass with a call to the createModuleToFunctionPassAdaptor() template function and then add the pass to the module pass manager:

PB.registerPipelineStartEPCallback(
    [](ModulePassManager &MPM) {
      MPM.addPass(PPProfilerIRPass());
    });

You can add this snippet in the pass pipeline setup code anywhere before the pipeline is created – that is, before the parsePassPipeline() method is called.
A very natural extension to what we did in the previous section is to let the user pass a pipeline description for an extension point on the command line. The opt tool allows this too. Let’s do this for the pipeline start extension point. Add the following code to the tools/driver/Driver.cpp file:

  1. First, we must add a new command-line option for the user to specify the pipeline description. Again, we take the option name from the opt tool:

static cl::opt<std::string> PipelineStartEPPipeline(
    "passes-ep-pipeline-start",
    cl::desc("Pipeline start extension point"));

  2. Using a lambda function as a callback is the most convenient way to do this. To parse the pipeline description, we must call the parsePassPipeline() method of the PassBuilder instance. The passes are added to the PM pass manager, which is given as an argument to the lambda function. If an error occurs, we only print an error message without stopping the application. You can add this snippet after the call to the crossRegisterProxies() method:

PB.registerPipelineStartEPCallback(
    [&PB, Argv0](ModulePassManager &PM) {
      if (auto Err = PB.parsePassPipeline(
              PM, PipelineStartEPPipeline)) {
        WithColor::error(errs(), Argv0)
            << "Could not parse pipeline "
            << PipelineStartEPPipeline.ArgStr << ": "
            << toString(std::move(Err)) << "\n";
      }
    });

Creating an optimization pipeline – Optimizing IR-3

  1. For the code generation process, we have to use the old pass manager. We simply declare the CodeGenPM instance and add the pass that makes target-specific information available at the IR transformation level:

legacy::PassManager CodeGenPM;
CodeGenPM.add(createTargetTransformInfoWrapperPass(
    TM->getTargetIRAnalysis()));

  2. To output LLVM IR, we must add a pass that prints the IR into a stream:

if (FileType == CGFT_AssemblyFile && EmitLLVM) {
  CodeGenPM.add(createPrintModulePass(Out->os()));
}

  3. Otherwise, we must let the TargetMachine instance add the required code generation passes, directed by the FileType value we pass as an argument:

else {
  if (TM->addPassesToEmitFile(CodeGenPM, Out->os(),
                              nullptr, FileType)) {
    WithColor::error()
        << "No support for file type\n";
    return false;
  }
}

  4. After all this preparation, we are now ready to execute the passes. First, we must run the optimization pipeline on the IR module. Next, the code generation passes are run. Of course, after all this work, we want to keep the output file:

  MPM.run(*M, MAM);
  CodeGenPM.run(*M);
  Out->keep();
  return true;
}
  5. That was a lot of code, but the process was straightforward. Of course, we have to update the dependencies in the tools/driver/CMakeLists.txt build file too. Besides adding the target components, we must add all the transformation and code generation components from LLVM. The names roughly resemble the directory names where the source is located. The component name is translated into the link library name during the configuration process:

set(LLVM_LINK_COMPONENTS ${LLVM_TARGETS_TO_BUILD}
AggressiveInstCombine Analysis AsmParser
BitWriter CodeGen Core Coroutines IPO IRReader
InstCombine Instrumentation MC ObjCARCOpts Remarks
ScalarOpts Support Target TransformUtils Vectorize
Passes)

  1. Our compiler driver supports plugins, and we must announce this support:

add_tinylang_tool(tinylang Driver.cpp SUPPORT_PLUGINS)

  1. As before, we have to link against our own libraries:

target_link_libraries(tinylang
PRIVATE tinylangBasic tinylangCodeGen
tinylangLexer tinylangParser tinylangSema)

These are necessary additions to the source code and the build system.

  1. To build the extended compiler, you must change into your build directory and type the following:

$ ninja

Changes to the files of the build system are automatically detected, and cmake is run before compiling and linking our changed source. If you need to re-run the configuration step, please follow the instructions in Chapter 1, Installing LLVM, the Compiling the tinylang application section.
As we have used the options for the opt tool as a blueprint, you should try running tinylang with the options to load a pass plugin and run the pass, as we did in the previous sections.
With the current implementation, we can either run a default pass pipeline or we can construct one ourselves. The latter is very flexible, but in almost all cases, it would be overkill. The default pipeline runs very well for C-like languages. However, what is missing is a way to extend the pass pipeline. We’ll look at how to implement this in the next section.

Creating an optimization pipeline – Optimizing IR-2

  1. Now, we must replace the existing emit() function with a new version. Additionally, we must declare the required PassBuilder instance at the top of the function:

bool emit(StringRef Argv0, llvm::Module *M,
llvm::TargetMachine *TM,
StringRef InputFilename) {
PassBuilder PB(TM);

  1. To implement the support for pass plugins given on the command line, we must loop through the list of plugin libraries given by the user and try to load each plugin. We’ll emit an error message if this fails; otherwise, we’ll register the passes:

for (auto &PluginFN : PassPlugins) {
  auto PassPlugin = PassPlugin::Load(PluginFN);
  if (!PassPlugin) {
    WithColor::error(errs(), Argv0)
        << "Failed to load passes from '" << PluginFN
        << "'. Request ignored.\n";
    continue;
  }
  PassPlugin->registerPassBuilderCallbacks(PB);
}
  2. The information from the static plugin registry is used in a similar way to register those plugins with our PassBuilder instance:

#define HANDLE_EXTENSION(Ext) \
  get##Ext##PluginInfo().RegisterPassBuilderCallbacks( \
      PB);
#include "llvm/Support/Extension.def"

  1. Now, we need to declare variables for the different analysis managers. The only parameter is the debug flag:

LoopAnalysisManager LAM(DebugPM);
FunctionAnalysisManager FAM(DebugPM);
CGSCCAnalysisManager CGAM(DebugPM);
ModuleAnalysisManager MAM(DebugPM);

  2. Next, we must populate the analysis managers with calls to the respective register method on the PassBuilder instance. Through this call, the analysis manager is populated with the default analysis passes and also runs registration callbacks. We must also make sure that the function analysis manager uses the default alias-analysis pipeline and that all analysis managers know about each other:

FAM.registerPass(
    [&] { return PB.buildDefaultAAPipeline(); });
PB.registerModuleAnalyses(MAM);
PB.registerCGSCCAnalyses(CGAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  3. The MPM module pass manager holds the pass pipeline that we constructed. The instance is initialized with the debug flag:

ModulePassManager MPM(DebugPM);

  4. Now, we need to implement two different ways to populate the module pass manager with the pass pipeline. If the user provided a pass pipeline on the command line – that is, they have used the --passes option – then we use this as the pass pipeline:

if (!PassPipeline.empty()) {
  if (auto Err = PB.parsePassPipeline(
          MPM, PassPipeline)) {
    WithColor::error(errs(), Argv0)
        << toString(std::move(Err)) << "\n";
    return false;
  }
}

  5. Otherwise, we use the chosen optimization level to determine the pass pipeline to construct. The name of the default pass pipeline is default, and it takes the optimization level as a parameter:

else {
  StringRef DefaultPass;
  switch (OptLevel) {
  case 0: DefaultPass = "default<O0>"; break;
  case 1: DefaultPass = "default<O1>"; break;
  case 2: DefaultPass = "default<O2>"; break;
  case 3: DefaultPass = "default<O3>"; break;
  case -1: DefaultPass = "default<Os>"; break;
  case -2: DefaultPass = "default<Oz>"; break;
  }
  if (auto Err = PB.parsePassPipeline(
          MPM, DefaultPass)) {
    WithColor::error(errs(), Argv0)
        << toString(std::move(Err)) << "\n";
    return false;
  }
}

  6. With that, the pass pipeline to run transformations on the IR code has been set up. After this step, we need an open file to write the result to. The system assembler and LLVM IR output are text-based, so we should set the OF_Text flag for them:

std::error_code EC;
sys::fs::OpenFlags OpenFlags = sys::fs::OF_None;
CodeGenFileType FileType = codegen::getFileType();
if (FileType == CGFT_AssemblyFile)
  OpenFlags |= sys::fs::OF_Text;
auto Out = std::make_unique<llvm::ToolOutputFile>(
    outputFilename(InputFilename), EC, OpenFlags);
if (EC) {
  WithColor::error(errs(), Argv0)
      << EC.message() << '\n';
  return false;
}
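
The choice between a user-provided pipeline and a default pipeline can be isolated from the LLVM API and tested on its own. The following is a minimal sketch, assuming the numeric encoding of the optimization level used by the OptLevel option (with -1 for Os and -2 for Oz); the function names defaultPipelineName() and choosePipeline() are hypothetical and not part of tinylang or LLVM:

```cpp
#include <cassert>
#include <string>

// Maps the numeric optimization level to the textual name of the
// default pipeline. Returns an empty string for unknown levels.
std::string defaultPipelineName(int OptLevel) {
  switch (OptLevel) {
  case 0:  return "default<O0>";
  case 1:  return "default<O1>";
  case 2:  return "default<O2>";
  case 3:  return "default<O3>";
  case -1: return "default<Os>";
  case -2: return "default<Oz>";
  default: return std::string();
  }
}

// A pipeline description given by the user takes precedence;
// otherwise the optimization level selects the default pipeline.
std::string choosePipeline(const std::string &UserPipeline, int OptLevel) {
  if (!UserPipeline.empty())
    return UserPipeline;
  std::string Name = defaultPipelineName(OptLevel);
  assert(!Name.empty() && "unknown optimization level");
  return Name;
}
```

With this split, the driver needs only a single call that parses the chosen description, and the mapping itself can be unit tested without constructing a PassBuilder instance.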

Creating an optimization pipeline – Optimizing IR-1

The tinylang compiler we developed in the previous chapters performs no optimizations on the IR code. In the next few subsections, we’ll add an optimization pipeline to the compiler to change this.
Creating an optimization pipeline
The PassBuilder class is central to setting up the optimization pipeline. This class knows about all registered passes and can construct a pass pipeline from a textual description. We can use this class to either create the pass pipeline from a description given on the command line or use a default pipeline based on the requested optimization level. We also support the use of pass plugins, such as the ppprofiler pass plugin we discussed in the previous section. With this, we can mimic part of the functionality of the opt tool and also use similar names for the command-line options.
The PassBuilder class populates an instance of a ModulePassManager class, which is the pass manager that holds the constructed pass pipeline and runs it. The code generation passes still use the old pass manager. Therefore, we have to retain the old pass manager for this purpose.
For the implementation, we will extend the tools/driver/Driver.cpp file from our tinylang compiler:

  1. We’ll use new classes, so we’ll begin with adding new include files. The llvm/Passes/PassBuilder.h file defines the PassBuilder class. The llvm/Passes/PassPlugin.h file is required for plugin support. Finally, the llvm/Analysis/TargetTransformInfo.h file provides a pass that connects IR-level transformations with target-specific information:

#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"
#include "llvm/Analysis/TargetTransformInfo.h"

  1. To use certain features of the new pass manager, we must add three command-line options, using the same names as the opt tool does. The --passes option allows the textual specification of the pass pipeline, while the --load-pass-plugin option allows the use of pass plugins. If the --debug-pass-manager option is given, then the pass manager prints out information about the executed passes:

static cl::opt<bool>
    DebugPM("debug-pass-manager", cl::Hidden,
            cl::desc("Print PM debugging information"));
static cl::opt<std::string> PassPipeline(
    "passes",
    cl::desc("A description of the pass pipeline"));
static cl::list<std::string> PassPlugins(
    "load-pass-plugin",
    cl::desc("Load passes from plugin library"));

  1. The user influences the construction of the pass pipeline with the optimization level. The PassBuilder class supports six different optimization levels: no optimization, three levels for optimizing speed, and two levels for reducing size. We can capture all levels in one command-line option:

static cl::opt<signed char> OptLevel(
    cl::desc("Setting the optimization level:"),
    cl::ZeroOrMore,
    cl::values(
        clEnumValN(3, "O", "Equivalent to -O3"),
        clEnumValN(0, "O0", "Optimization level 0"),
        clEnumValN(1, "O1", "Optimization level 1"),
        clEnumValN(2, "O2", "Optimization level 2"),
        clEnumValN(3, "O3", "Optimization level 3"),
        clEnumValN(-1, "Os",
                   "Like -O2 with extra optimizations "
                   "for size"),
        clEnumValN(
            -2, "Oz",
            "Like -Os but reduces code size further")),
    cl::init(0));

  1. The plugin mechanism of LLVM supports a plugin registry for statically linked plugins, which is created during the configuration of the project. To make use of this registry, we must include the llvm/Support/Extension.def database file to create the prototype for the functions that return the plugin information:

#define HANDLE_EXTENSION(Ext) \
  llvm::PassPluginLibraryInfo get##Ext##PluginInfo();
#include "llvm/Support/Extension.def"
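
The pattern behind HANDLE_EXTENSION can be tried in isolation. The sketch below assumes that the generated database file consists of one HANDLE_EXTENSION(...) line per plugin; to stay self-contained, it replaces the included file with a list macro. TOY_EXTENSION_LIST, the plugin names Bye and Hello, and the get...PluginName() functions are made up for this example:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the database file: applies a caller-supplied macro to
// every statically linked extension.
#define TOY_EXTENSION_LIST(HANDLE) HANDLE(Bye) HANDLE(Hello)

// First expansion: declare one getter per extension. The ## operator
// pastes the extension name into the function name.
#define HANDLE_EXTENSION(Ext) std::string get##Ext##PluginName();
TOY_EXTENSION_LIST(HANDLE_EXTENSION)
#undef HANDLE_EXTENSION

// In a real build, these definitions would come from the plugins.
std::string getByePluginName() { return "Bye"; }
std::string getHelloPluginName() { return "Hello"; }

// Second expansion: call every getter and collect the results.
std::vector<std::string> registeredPluginNames() {
  std::vector<std::string> Names;
#define HANDLE_EXTENSION(Ext) Names.push_back(get##Ext##PluginName());
  TOY_EXTENSION_LIST(HANDLE_EXTENSION)
#undef HANDLE_EXTENSION
  return Names;
}
```

The same list is expanded twice with different definitions of HANDLE_EXTENSION, once to declare the functions and once to call them, so adding a plugin only requires a change in one place.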

NOTE – Optimizing IR

The runtime.c file is not instrumented because the pass checks that the special functions are not yet declared in a module.
This already looks better, but does it scale to larger programs? Let’s assume you want to build an instrumented binary of the tinylang compiler for Chapter 5. How would you do this?
You can pass compiler and linker flags on the CMake command line, which is exactly what we need. The flags for the C++ compiler are given in the CMAKE_CXX_FLAGS variable. Thus, specifying the following on the CMake command line adds the new pass to all compiler runs:

-DCMAKE_CXX_FLAGS="-fpass-plugin=<PATH>/PPProfiler.so"

Please replace <PATH> with the absolute path to the shared library.
Similarly, specifying the following adds the runtime.o file to each linker invocation. Again, please replace <PATH> with the absolute path to a compiled version of runtime.c:

-DCMAKE_EXE_LINKER_FLAGS="<PATH>/runtime.o"

Of course, this requires clang as the build compiler. The fastest way to make sure clang is used as the build compiler is to set the CC and CXX environment variables accordingly:

export CC=clang
export CXX=clang++

With these additional options, the CMake configuration from Chapter 5 should run as usual.
After building the tinylang executable, you can run it with the example Gcd.mod file. The ppprofile.csv file will also be written, this time with more than 44,000 lines!
Of course, having such a dataset raises the question of whether you can get something useful out of it. For example, getting a list of the 10 most often called functions, together with the call count and the time spent in the function, would be useful information. Luckily, on a Unix system, you have a couple of tools that can help. Let’s build a short pipeline that matches enter events with exit events, counts the functions, and displays the top 10 functions. The awk Unix tool helps with most of these steps.
To match an enter event with an exit event, the enter event must be stored in the record associative map. When an exit event is matched, the stored enter event is looked up, and the new record is written. The emitted line contains the timestamp from the enter event, the timestamp from the exit event, and the difference between both. We must put this into the join.awk file:

BEGIN { FS = "|"; OFS = "|" }
/enter/ { record[$2] = $0 }
/exit/ { split(record[$2],val,"|")
         print val[2], val[3], $3, $3-val[3], val[4] }

To count the function calls and the execution time, two associative maps, count and sum, are used. In count, the function calls are counted, while in sum, the execution time is added. In the end, the maps are dumped. You can put this into the avg.awk file:

BEGIN { FS = "|"; count[""] = 0; sum[""] = 0 }
{ count[$1]++; sum[$1] += $4 }
END { for (i in count) {
        if (i != "") {
          print count[i], sum[i], sum[i]/count[i], i }
      } }

After running these two scripts, the result can be sorted in descending order, and then the top 10 lines can be taken from the file. However, the function names recorded by __ppp_enter() and __ppp_exit() are mangled and are therefore difficult to read. Using the llvm-cxxfilt tool, the names can be demangled. The demangle.awk script is as follows:

{ cmd = "llvm-cxxfilt " $4
  (cmd) | getline name
  close(cmd); $4 = name; print }

To get the top 10 function calls, you can run the following:

$ cat ppprofile.csv | awk -f join.awk | awk -f avg.awk |\
  sort -nr | head -10 | awk -f demangle.awk

Here are some sample lines from the output:

446 1545581 3465.43 charinfo::isASCII(char)
409 826261 2020.2 llvm::StringRef::StringRef()
382 899471 2354.64 tinylang::Token::is(tinylang::tok::TokenKind) const
171 1561532 9131.77 charinfo::isIdentifierHead(char)

The first number is the call count of the function, the second is the cumulative execution time, and the third number is the average execution time. As explained previously, do not trust the time values, though the call counts should be accurate.
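The matching and counting steps of the two awk scripts can also be written as one small C++ function. This is only a sketch of the same idea, not a replacement for the pipeline: the names Stats, splitFields(), and summarize() are hypothetical, the event type is compared against the first field instead of matching a pattern anywhere in the line, and the timestamps are assumed to be valid unsigned numbers:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Call count and accumulated time of one function.
struct Stats {
  unsigned long long Count = 0;
  unsigned long long Sum = 0;
};

// Splits a line of the form "type|name|clock|frame" at the '|' character.
std::vector<std::string> splitFields(const std::string &Line) {
  std::vector<std::string> Fields;
  std::stringstream SS(Line);
  std::string Field;
  while (std::getline(SS, Field, '|'))
    Fields.push_back(Field);
  return Fields;
}

// Pairs each exit event with the most recent enter event of the same
// function (like record[] in join.awk) and accumulates the call count
// and the elapsed time per function (like count[] and sum[] in avg.awk).
std::map<std::string, Stats>
summarize(const std::vector<std::string> &Lines) {
  std::map<std::string, unsigned long long> EnterTime;
  std::map<std::string, Stats> Result;
  for (const std::string &Line : Lines) {
    std::vector<std::string> F = splitFields(Line);
    if (F.size() < 3)
      continue; // Skip malformed lines.
    unsigned long long T = std::stoull(F[2]);
    if (F[0] == "enter") {
      EnterTime[F[1]] = T;
    } else if (F[0] == "exit") {
      auto It = EnterTime.find(F[1]);
      if (It == EnterTime.end())
        continue; // Exit event without a matching enter event.
      Stats &S = Result[F[1]];
      ++S.Count;
      S.Sum += T - It->second;
    }
  }
  return Result;
}
```

Like the awk version, this keeps only one pending enter event per function name, so recursive calls are not matched correctly.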
So far, we’ve implemented a new instrumentation pass, either as a plugin or as an addition to LLVM, and we used it in some real-world scenarios. In the next section, we’ll explore how to set up an optimization pipeline in our compiler.

SPECIFYING A PASS PIPELINE – Optimizing IR

With the --passes option, you can not only name a single pass but also describe a whole pipeline. For example, the default pipeline for optimization level 2 is named default<O2>. You can run the ppprofile pass before the default pipeline with the --passes="ppprofile,default<O2>" argument. Please note that the pass names in such a pipeline description must be of the same type.

Now, let’s turn to using the new pass with clang.

Plugging the new pass into clang

In the previous section, you learned how you can run a single pass using opt. This is useful if you need to debug a pass but for a real compiler, the steps should not be that involved.

To achieve the best result, a compiler needs to run the optimization passes in a certain order. The LLVM pass manager has a default order for pass execution. This is also called the default pass pipeline. Using opt, you can specify a different pass pipeline with the --passes option. This is flexible but also complicated for the user. It also turns out that most of the time, you just want to add a new pass at very specific points, such as before optimization passes are run or at the end of the loop optimization processes. These points are called extension points. The PassBuilder class allows you to register a pass at an extension point. For example, you can call the registerPipelineStartEPCallback() method to add a pass to the beginning of the optimization pipeline. This is exactly the place we need for the ppprofiler pass. During optimization, functions may be inlined, and a pass that runs later will miss those inlined functions. Running the pass before the optimization passes instead guarantees that all functions are instrumented.

To use this approach, you need to extend the RegisterCB() function in the pass plugin. Add the following code to the function:
  PB.registerPipelineStartEPCallback(
      [](ModulePassManager &PM, OptimizationLevel Level) {
        PM.addPass(PPProfilerIRPass());
      });

Whenever the pass manager populates the default pass pipeline, it calls all the callbacks for the extension points. We simply add the new pass here.

To load the plugin into clang, you can use the -fpass-plugin option. Creating the instrumented executable of the hello.c file now becomes almost trivial:
$ clang -fpass-plugin=./PPProfiler.so hello.c runtime.c

Please run the executable and verify that the run creates the ppprofile.csv file.

Using the ppprofiler pass with LLVM tools – Optimizing IR-2

Often, the runtime support for a feature is more complicated than adding that feature to the compiler itself. This is also true in this case. When the __ppp_enter() and __ppp_exit() functions are called, you can view this as an event. To analyze the data later, it is necessary to save the events. The basic data you would like to get is the type of the event, the name of the function and its address, and a timestamp. Without tricks, this is not as easy as it seems. Let’s give it a try.
Create a file called runtime.c with the following content:

  1. You need the file I/O, standard functions, and time support. This is provided by the following includes:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

  1. For the file, a file descriptor is needed. Moreover, when the program finishes, that file descriptor should be closed properly:

static FILE *FileFD = NULL;
static void cleanup(void) {
  if (FileFD != NULL) {
    fclose(FileFD);
    FileFD = NULL;
  }
}

  1. To simplify the runtime, only a fixed name for the output is used. If the file is not open, then open the file and register the cleanup function:

static void init() {
  if (FileFD == NULL) {
    FileFD = fopen("ppprofile.csv", "w");
    atexit(&cleanup);
  }
}

  1. You can call the clock_gettime() function to get a timestamp. The CLOCK_PROCESS_CPUTIME_ID parameter returns the time consumed by this process. Please note that not all systems support this parameter. You can use one of the other clocks, such as CLOCK_REALTIME, if necessary:

typedef unsigned long long Time;
static Time get_time() {
struct timespec ts;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
return 1000000000L * ts.tv_sec + ts.tv_nsec;
}

  1. Now, it is easy to define the __ppp_enter() function. Just make sure the file is open, get the timestamp, and write the event:

void __ppp_enter(const char *FnName) {
  init();
  Time T = get_time();
  void *Frame = __builtin_frame_address(1);
  fprintf(FileFD,
          // "enter|name|clock|frame"
          "enter|%s|%llu|%p\n", FnName, T, Frame);
}

  1. The __ppp_exit() function only differs in terms of the event type:

void __ppp_exit(const char *FnName) {
  init();
  Time T = get_time();
  void *Frame = __builtin_frame_address(1);
  fprintf(FileFD,
          // "exit|name|clock|frame"
          "exit|%s|%llu|%p\n", FnName, T, Frame);
}
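
One detail of get_time() is worth a closer look. The expression 1000000000L * ts.tv_sec is computed in the type long when time_t is not wider than long, so on a platform where both types are 32 bits wide, the product can overflow after a few seconds. The following sketch shows the conversion with explicit 64-bit arithmetic; the name toNanoseconds() is hypothetical and not part of the runtime shown above:

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Converts a timespec value into nanoseconds. Both operands are
// widened to 64 bits before the multiplication takes place.
std::uint64_t toNanoseconds(const struct timespec &TS) {
  return static_cast<std::uint64_t>(TS.tv_sec) * 1000000000ULL +
         static_cast<std::uint64_t>(TS.tv_nsec);
}
```

For a value of 3 seconds and 5 nanoseconds, the function returns 3000000005, which no longer fits into a signed 32-bit integer.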

That concludes a very simple implementation for runtime support. Before we try it, some remarks should be made about the implementation as it should be obvious that there are several problematic parts.
First of all, the implementation is not thread-safe since there is only one file descriptor, and access to it is not protected. Trying to use this runtime implementation with a multithreaded program will most likely lead to corrupted data in the output file.
In addition, we omitted checking the return value of the I/O-related functions, which can result in data loss.
But most importantly, the timestamp of the event is not precise. Calling a function already adds overhead, but performing I/O operations in that function makes it even worse. In principle, you can match the enter and exit events for a function and calculate the runtime of the function. However, this value is inherently flawed because it may include the time required for I/O. In summary, do not trust the times recorded here.
Despite all the flaws, this small runtime file allows us to produce some output. Compile the bitcode of the instrumented file together with the file containing the runtime code and run the resulting executable:

$ clang hello_inst.bc runtime.c
$ ./a.out

This results in a new file called ppprofile.csv in the current directory, which contains the following content:

$ cat ppprofile.csv
enter|main|3300868|0x1
exit|main|3760638|0x1

Cool – the new pass and the runtime seem to work!