
Exploring the lli tool – JIT Compilation

Using JIT compilation for direct execution
Running LLVM IR directly is the first idea that comes to mind when thinking about a JIT compiler. This is exactly what the lli tool, the LLVM interpreter and dynamic compiler, does. We will explore the lli tool in the next section.
Exploring the lli tool
Let’s try the lli tool with a very simple example. The following LLVM IR can be stored as a file called hello.ll, which is the equivalent of a C hello world application. It declares a prototype for the printf() function from the C library. The hellostr constant contains the message to be printed. Inside the main() function, a call to the printf() function is generated, with hellostr passed as the message to print. The application always returns 0.
The complete source code is as follows:

declare i32 @printf(ptr, ...)
@hellostr = private unnamed_addr constant [13 x i8] c"Hello world\0A\00"
define dso_local i32 @main(i32 %argc, ptr %argv) {
  %res = call i32 (ptr, ...) @printf(ptr @hellostr)
  ret i32 0
}

This LLVM IR file is generic enough that it is valid for all platforms. We can directly execute the IR using the lli tool with the following command:

$ lli hello.ll
Hello world

The interesting point here is how the printf() function is found. The IR code is compiled to machine code, and a lookup for the printf symbol is triggered. This symbol is not found in the IR, so the current process is searched for it. The lli tool dynamically links against the C library, and the symbol is found there.
Of course, the lli tool does not link against the libraries you created. To enable the use of such functions, the lli tool supports the loading of shared libraries and objects. The following C source just prints a friendly message:

#include <stdio.h>

void greetings() {
  puts("Hi!");
}

Stored in greetings.c, we use this to explore loading objects with lli. The following command compiles this source into a shared library. The -fPIC option instructs clang to generate position-independent code, which is required for shared libraries. Moreover, with the -shared option, the compiler creates the greetings.so shared library:

$ clang greetings.c -fPIC -shared -o greetings.so

We also compile the file into the greetings.o object file:

$ clang greetings.c -c -o greetings.o

We now have two files, the greetings.so shared library and the greetings.o object file, which we will load into the lli tool.
We also need an LLVM IR file that calls the greetings() function. For this, create a main.ll file that contains a single call to the function:

declare void @greetings(...)
define dso_local i32 @main(i32 %argc, i8** %argv) {
  call void (...) @greetings()
  ret i32 0
}

Notice that executing the previous IR crashes, as lli cannot locate the greetings symbol:

$ lli main.ll
JIT session error: Symbols not found: [ _greetings ]
lli: Failed to materialize symbols: { (main, { _main }) }

The greetings() function is defined in an external file, and to fix the crash, we have to tell the lli tool which additional file needs to be loaded. In order to use the shared library, you must use the --load option, which takes the path to the shared library as an argument:

$ lli --load ./greetings.so main.ll
Hi!

It is important to specify the path to the shared library if the directory containing the shared library is not in the search path for the dynamic loader. If omitted, then the library will not be found.
Alternatively, we can instruct lli to load the object file with --extra-object:

$ lli --extra-object greetings.o main.ll
Hi!

Other supported options are --extra-archive, which loads an archive, and --extra-module, which loads another bitcode file. Both options require the path to the file as an argument.
You now know how you can use the lli tool to directly execute LLVM IR. In the next section, we will implement our own JIT tool.

LLVM’s overall JIT implementation and use cases – JIT Compilation

So far, we have only looked at ahead-of-time (AOT) compilers. These compilers compile the whole application. The application can only run after the compilation is finished. If the compilation is performed at the runtime of the application, then the compiler is a JIT compiler. A JIT compiler has interesting use cases:

  • Implementation of a virtual machine: A programming language can be translated to byte code with an AOT compiler. At runtime, a JIT compiler is used to compile the byte code to machine code. The advantage of this approach is that the byte code is hardware-independent, and thanks to the JIT compiler, there is no performance penalty compared to an AOT compiler. Java and C# use this model today, but this is not a new idea: the UCSD Pascal compiler from 1977 already used a similar approach.
  • Expression evaluation: A spreadsheet application can compile often-executed expressions with a JIT compiler. For example, this can speed up financial simulations. The lldb LLVM debugger uses this approach to evaluate source expressions at debug time.
  • Database queries: A database creates an execution plan from a database query. The execution plan describes operations on tables and columns, which leads to a query answer when executed. A JIT compiler can be used to translate the execution plan into machine code, which speeds up the execution of the query.

The static compilation model of LLVM is not as far away from the JIT model as one may think. The llc LLVM static compiler compiles LLVM IR into machine code and saves the result as an object file on disk. If the object file is not stored on disk but in memory, would the code be executable? Not directly, as references to global functions and global data use relocations instead of absolute addresses. Conceptually, a relocation describes how to calculate the address – for example, as an offset to a known address. If we resolve relocations into addresses, as the linker and the dynamic loader do, then we can execute the object code. Running the static compiler to compile IR code into an object file in memory, performing a link step on the in-memory object file, and running the code gives us a JIT compiler. The JIT implementation in the LLVM core libraries is based on this idea.
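
With the ORC APIs, this idea can be expressed in just a few calls. The following is a minimal sketch, not the JIT tool developed later in this book: it parses hello.ll, adds the module to an LLJIT instance, and calls main(). The generator added to the main JITDylib mirrors what lli does to resolve symbols such as printf from the current process. Exact signatures, such as the return type of lookup(), vary slightly between LLVM versions:

#include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"

using namespace llvm;
using namespace llvm::orc;

int main(int argc, char **argv) {
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();

  // Parse the textual IR into a module.
  auto Ctx = std::make_unique<LLVMContext>();
  SMDiagnostic Err;
  std::unique_ptr<Module> M = parseIRFile("hello.ll", Err, *Ctx);
  if (!M)
    return 1;

  // Create the JIT and let it resolve unknown symbols, such as printf,
  // in the current process, just like lli does.
  auto JIT = cantFail(LLJITBuilder().create());
  JIT->getMainJITDylib().addGenerator(
      cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(
          JIT->getDataLayout().getGlobalPrefix())));

  // Add the module; it is compiled and linked in memory on first lookup.
  cantFail(JIT->addIRModule(ThreadSafeModule(std::move(M), std::move(Ctx))));

  // Look up main() in the JITed code and call it.
  auto MainAddr = cantFail(JIT->lookup("main"));
  auto *MainFn = MainAddr.toPtr<int (*)(int, char **)>();
  return MainFn(argc, argv);
}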

During the development history of LLVM, there were several JIT implementations, with different feature sets. The latest JIT API is the On-Request Compilation (ORC) engine. In case you were curious about the acronym, it was the lead developer’s intention to invent yet another acronym based on Tolkien’s universe, after Executable and Linking Format (ELF) and Debugging Standard (DWARF) were already present.

The ORC engine builds on and extends the idea of using the static compiler and a dynamic linker on the in-memory object file. The implementation uses a layered approach. The two basic levels are the compile layer and the link layer. On top of this sits a layer providing support for lazy compilation. A transformation layer can be stacked on top or below the lazy compilation layer, allowing the developer to add arbitrary transformations or simply to be notified of certain events. Moreover, this layered approach has the advantage that the JIT engine is customizable for diverse requirements. For example, a high-performance virtual machine may choose to compile everything upfront and make no use of the lazy compilation layer. On the other hand, other virtual machines will emphasize startup time and responsiveness to the user and will achieve this with the help of the lazy compilation layer.

The older MCJIT engine is still available, and its API is derived from an even older, already-removed JIT engine. Over time, this API gradually became bloated, and it lacks the flexibility of the ORC API. The goal is to remove this implementation, as the ORC engine now provides all the functionality of the MCJIT engine, and new developments should use the ORC API.

In the next section, we look at lli, the LLVM interpreter, and the dynamic compiler, before we dive into implementing a JIT compiler.

Drawbacks of TableGen – The TableGen Language

Performance of the token filter

Using a plain binary search for the keyword filter does not give better performance than the implementation based on the llvm::StringMap type. To beat the performance of the current implementation, you need to generate a perfect hash function.

The classic algorithm from Czech, Havas, and Majewski can be easily implemented, and it gives you a very good performance. It is described in An optimal algorithm for generating minimal perfect hash functions, Information Processing Letters, Volume 43, Issue 5, 1992. See https://www.sciencedirect.com/science/article/abs/pii/002001909290220P.

A state-of-the-art algorithm is PTHash from Pibiri and Trani, described in PTHash: Revisiting FCH Minimal Perfect Hashing, SIGIR ’21. See https://arxiv.org/pdf/2104.10402.pdf.

Both algorithms are good candidates for generating a token filter that is actually faster than llvm::StringMap.

Drawbacks of TableGen

Here are a few drawbacks of TableGen:

  • The TableGen language is built on a simple concept. As a consequence, it does not have the same computing capabilities as other DSLs. Obviously, some programmers would like to replace TableGen with a different, more powerful language, and this topic comes up from time to time in the LLVM discussion forum.
  • With the possibility of implementing your own backends, the TableGen language is very flexible. However, it also means that the semantics of a given definition are hidden inside the backend. Thus, you can create TableGen files that are basically not understandable by other developers.
  • Lastly, the backend implementation can be very complex if you try to solve a non-trivial task. It is reasonable to expect that this effort would be lower if the TableGen language were more powerful.

Even if not all developers are happy with the capabilities of TableGen, the tool is used widely in LLVM, and for a developer, it is important to understand it.

Summary

In this chapter, you first learned the main idea behind TableGen. Then, you defined your first classes and records in the TableGen language, and you acquired knowledge of the syntax of TableGen. Finally, you developed a TableGen backend emitting fragments of C++ source code, based on the TableGen classes you defined.

In the next chapter, we examine another unique feature of LLVM: generating and executing code in one step, also known as Just-In-Time (JIT) compilation.

Creating a new TableGen tool – The TableGen Language-5

  1. The only missing part now is a way to call this implementation, for which you define a global function, EmitTokensAndKeywordFilter(). The emitSourceFileHeader() function declared in the llvm/TableGen/TableGenBackend.h header emits a comment at the top of the generated file:

void EmitTokensAndKeywordFilter(RecordKeeper &RK,
                                raw_ostream &OS) {
  emitSourceFileHeader("Token Kind and Keyword Filter "
                       "Implementation Fragment",
                       OS);
  TokenAndKeywordFilterEmitter(RK).run(OS);
}

With that, you finished the implementation of the source emitter in the TokenEmitter.cpp file. Overall, the coding is not too complicated.
The TableGenBackends.h header file only contains the declaration of the EmitTokensAndKeywordFilter() function. To avoid including other files, you use forward declarations for the raw_ostream and RecordKeeper classes:

#ifndef TABLEGENBACKENDS_H
#define TABLEGENBACKENDS_H

namespace llvm {
class raw_ostream;
class RecordKeeper;
} // namespace llvm

void EmitTokensAndKeywordFilter(llvm::RecordKeeper &RK,
                                llvm::raw_ostream &OS);
#endif

The missing part is the implementation of the driver. Its task is to parse the TableGen file and emit the records according to the command-line options. The implementation is in the TableGen.cpp file:

  1. As usual, the implementation begins with including the required headers. The most important one is llvm/TableGen/Main.h because this header declares the frontend of TableGen:

#include "TableGenBackends.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/PrettyStackTrace.h"
#include "llvm/Support/Signals.h"
#include "llvm/TableGen/Main.h"
#include "llvm/TableGen/Record.h"

  1. To simplify coding, the llvm namespace is imported:

using namespace llvm;

  1. The user can choose one action. The ActionType enumeration contains all possible actions:

enum ActionType {
  PrintRecords,
  DumpJSON,
  GenTokens,
};

  1. A single command-line option object called Action is used. The user needs to specify the --gen-tokens option to emit the token filter you implemented. The other two options, --print-records and --dump-json, are standard options to dump the read records. Note that the object is in an anonymous namespace:

namespace {
cl::opt<ActionType> Action(
    cl::desc("Action to perform:"),
    cl::values(
        clEnumValN(
            PrintRecords, "print-records",
            "Print all records to stdout (default)"),
        clEnumValN(DumpJSON, "dump-json",
                   "Dump all records as "
                   "machine-readable JSON"),
        clEnumValN(GenTokens, "gen-tokens",
                   "Generate token kinds and keyword "
                   "filter")));

  1. The Main() function performs the requested action based on the value of Action. Most importantly, your EmitTokensAndKeywordFilter() function is called if --gen-tokens was specified on the command line. After the end of the function, the anonymous namespace is closed:

bool Main(raw_ostream &OS, RecordKeeper &Records) {
  switch (Action) {
  case PrintRecords:
    OS << Records; // No argument, dump all contents
    break;
  case DumpJSON:
    EmitJSON(Records, OS);
    break;
  case GenTokens:
    EmitTokensAndKeywordFilter(Records, OS);
    break;
  }
  return false;
}
} // namespace

  1. And lastly, you define a main() function. After setting up the stack trace handler and parsing the command-line options, the TableGenMain() function is called to parse the TableGen file and create records. That function also calls your Main() function if there are no errors:

int main(int argc, char **argv) {
  sys::PrintStackTraceOnErrorSignal(argv[0]);
  PrettyStackTraceProgram X(argc, argv);
  cl::ParseCommandLineOptions(argc, argv);
  llvm_shutdown_obj Y;
  return TableGenMain(argv[0], &Main);
}

Your own TableGen tool is now implemented. After compiling, you can run it with the KeywordC.td sample input file as follows:

$ tinylang-tblgen --gen-tokens -o TokenFilter.inc KeywordC.td

The generated C++ source code is written to the TokenFilter.inc file.
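
To see how such a fragment is typically consumed, the including file defines the guard macro before including the generated file. The snippet below is a hypothetical sketch; only the GET_KEYWORD_FILTER macro and the lookupKeyword() function come from the emitter you implemented:

#include <algorithm> // the generated code uses std::lower_bound
#include "llvm/ADT/StringRef.h"

#define GET_KEYWORD_FILTER
#include "TokenFilter.inc"

// Hypothetical helper inside the lexer: returns true and sets Flags
// if Identifier is a keyword.
static bool classifyIdentifier(llvm::StringRef Identifier, unsigned &Flags) {
  return lookupKeyword(Identifier, Flags);
}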

Creating a new TableGen tool – The TableGen Language-4

  1. The keywords used for the filter are in the list named Tokens. To get access to that list, you first need to look up the Tokens field in the record. This returns a pointer to an instance of the RecordVal class, from which you can retrieve the initializer instance by calling the getValue() method. The Tokens field is defined as a list, so you cast the initializer instance to ListInit. If this fails, then exit the function:

     ListInit *TokenFilter = dyn_cast_or_null<ListInit>(
         AllTokenFilter[0]
             ->getValue("Tokens")
             ->getValue());
     if (!TokenFilter)
       return;
  2. Now, you are ready to construct a filter table. For each keyword stored in the TokenFilter list, you need the name and the value of the Flags field. That field is again defined as a list, so you need to loop over those elements to calculate the final value. The resulting name/flag value pair is stored in a Table vector:

     using KeyFlag = std::pair<StringRef, uint64_t>;
     std::vector<KeyFlag> Table;
     for (size_t I = 0, E = TokenFilter->size(); I < E; ++I) {
       Record *CC = TokenFilter->getElementAsRecord(I);
       StringRef Name = CC->getValueAsString("Name");
       uint64_t Val = 0;
       ListInit *Flags = nullptr;
       if (RecordVal *F = CC->getValue("Flags"))
         Flags = dyn_cast_or_null<ListInit>(F->getValue());
       if (Flags) {
         for (size_t I = 0, E = Flags->size(); I < E; ++I) {
           Val |= Flags->getElementAsRecord(I)->getValueAsInt("Val");
         }
       }
       Table.emplace_back(Name, Val);
     }
  3. To be able to perform a binary search, the table needs to be sorted. The comparison function is provided by a lambda function:

     llvm::sort(Table.begin(), Table.end(),
                [](const KeyFlag A, const KeyFlag B) {
                  return A.first < B.first;
                });
  4. Now, you can emit the C++ source code. First, you emit the sorted table containing the name of the keyword and the associated flag value:

     OS << "#ifdef GET_KEYWORD_FILTER\n"
        << "#undef GET_KEYWORD_FILTER\n";
     OS << "bool lookupKeyword(llvm::StringRef Keyword, "
           "unsigned &Value) {\n";
     OS << "  struct Entry {\n"
        << "    unsigned Value;\n"
        << "    llvm::StringRef Keyword;\n"
        << "  };\n"
        << "static const Entry Table[" << Table.size() << "] = {\n";
     for (const auto &[Keyword, Value] : Table) {
       OS << "    { " << Value << ", llvm::StringRef(\""
          << Keyword << "\", " << Keyword.size()
          << ") },\n";
     }
     OS << "  };\n\n";
  5. Next, you look up the keyword in the sorted table, using the std::lower_bound() standard C++ function. If the keyword is in the table, then the Value parameter receives the value of the flags associated with the keyword, and the function returns true. Otherwise, the function simply returns false:

     OS << "  const Entry *E = "
           "std::lower_bound(&Table[0], "
           "&Table[" << Table.size()
        << "], Keyword, [](const Entry &A, const "
           "StringRef &B) {\n";
     OS << "    return A.Keyword < B;\n";
     OS << "  });\n";
     OS << "  if (E != &Table[" << Table.size() << "]) {\n";
     OS << "    Value = E->Value;\n";
     OS << "    return true;\n";
     OS << "  }\n";
     OS << "  return false;\n";
     OS << "}\n";
     OS << "#endif\n";
     }
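
To make the emitted strings easier to follow, here is roughly what the generated fragment could look like for a hypothetical input of two keywords, ELSE and END, each with the flag value 1 (the real content depends on your .td file):

#ifdef GET_KEYWORD_FILTER
#undef GET_KEYWORD_FILTER
bool lookupKeyword(llvm::StringRef Keyword, unsigned &Value) {
  struct Entry {
    unsigned Value;
    llvm::StringRef Keyword;
  };
  static const Entry Table[2] = {
    { 1, llvm::StringRef("ELSE", 4) },
    { 1, llvm::StringRef("END", 3) },
  };

  const Entry *E = std::lower_bound(&Table[0], &Table[2], Keyword,
                                    [](const Entry &A, const StringRef &B) {
                                      return A.Keyword < B;
                                    });
  if (E != &Table[2]) {
    Value = E->Value;
    return true;
  }
  return false;
}
#endif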

Creating a new TableGen tool – The TableGen Language-2

  1. The run() method calls all the emitting methods. It also measures the duration of each phase. If you specify the --time-phases option, then the timing is shown after all the code is generated:

void TokenAndKeywordFilterEmitter::run(raw_ostream &OS) {
  // Emit Flag fragments.
  Records.startTimer("Emit flags");
  emitFlagsFragment(OS);
  // Emit token kind enum and functions.
  Records.startTimer("Emit token kind");
  emitTokenKind(OS);
  // Emit keyword filter code.
  Records.startTimer("Emit keyword filter");
  emitKeywordFilter(OS);
  Records.stopTimer();
}

  1. The emitFlagsFragment() method shows the typical structure of a function emitting C++ source code. The generated code is guarded by the GET_TOKEN_FLAGS macro. To emit the C++ source fragment, you loop over all records that are derived from the Flag class in the TableGen file. Having such a record, it is easy to query the record for the name and the value. Please note that the names Flag, Name, and Val must be written exactly as in the TableGen file. If you rename Val to Value in the TableGen file, then you also need to change the string in this function. All the generated source code is written to the provided stream, OS:

void TokenAndKeywordFilterEmitter::emitFlagsFragment(
    raw_ostream &OS) {
  OS << "#ifdef GET_TOKEN_FLAGS\n";
  OS << "#undef GET_TOKEN_FLAGS\n";
  for (Record *CC : Records.getAllDerivedDefinitions("Flag")) {
    StringRef Name = CC->getValueAsString("Name");
    int64_t Val = CC->getValueAsInt("Val");
    OS << Name << " = " << format_hex(Val, 2) << ",\n";
  }
  OS << "#endif\n";
}

  1. The emitTokenKind() method emits a declaration and definition of token classification functions. Let’s have a look at emitting the declarations first. The overall structure is the same as the previous method – only more C++ source code is emitted. The generated source fragment is guarded by the GET_TOKEN_KIND_DECLARATION macro. Please note that this method tries to generate nicely formatted C++ code, using new lines and indentation as a human developer would do. In case the emitted source code is not correct, and you need to examine it to find the error, this will be tremendously helpful. It is also easy to make such errors: after all, you are writing a C++ function that emits C++ source code.
     First, the TokenKind enumeration is emitted. The name for a keyword should be prefixed with the kw_ string. The loop goes over all records of the Token class, and you can query each record to check whether it is also a subclass of the Keyword class, which enables you to emit the prefix:

     OS << "#ifdef GET_TOKEN_KIND_DECLARATION\n"
        << "#undef GET_TOKEN_KIND_DECLARATION\n"
        << "namespace tok {\n"
        << "  enum TokenKind : unsigned short {\n";
     for (Record *CC : Records.getAllDerivedDefinitions("Token")) {
       StringRef Name = CC->getValueAsString("Name");
       OS << "    ";
       if (CC->isSubClassOf("Keyword"))
         OS << "kw_";
       OS << Name << ",\n";
     }
     OS << "    NUM_TOKENS\n"
        << "  };\n";
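
For orientation, the declaration fragment emitted by this code looks roughly like the following. The token names are hypothetical, and the closing brace of the tok namespace is emitted later in the method:

#ifdef GET_TOKEN_KIND_DECLARATION
#undef GET_TOKEN_KIND_DECLARATION
namespace tok {
  enum TokenKind : unsigned short {
    unknown,
    identifier,
    kw_ELSE,
    kw_END,
    NUM_TOKENS
  };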

Generating C++ code from a TableGen file – The TableGen Language

In the previous section, you defined records in the TableGen language. To make use of those records, you need to write your own TableGen backend that can produce C++ source code or do other things using the records as input.

In Chapter 3, Turning the Source File into an Abstract Syntax Tree, the implementation of the Lexer class uses a database file to define tokens and keywords. Various query functions make use of that database file. Besides that, the database file is used to implement a keyword filter. The keyword filter is a hash map, implemented using the llvm::StringMap class. Whenever an identifier is found, the keyword filter is called to find out if the identifier is actually a keyword. If you take a closer look at the implementation using the ppprofiler pass from Chapter 6, Advanced IR Generation, then you will see that this function is called quite often. Therefore, it may be useful to experiment with different implementations to make that functionality as fast as possible.

However, this is not as easy as it seems. For example, you can try to replace the lookup in the hash map with a binary search. This requires that the keywords in the database file are sorted. Currently, this seems to be the case, but during development, a new keyword might be added in the wrong place undetected. The only way to make sure that the keywords are in the right order is to add some code that checks the order at runtime.
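A minimal sketch of such a check, assuming a hypothetical Keywords array (in the real lexer, this data comes from the database file):

#include <algorithm>
#include <cassert>
#include <iterator>
#include "llvm/ADT/StringRef.h"

// Hypothetical keyword table; a lexicographically sorted order is required
// for binary search.
static const llvm::StringRef Keywords[] = {"AND", "BEGIN", "ELSE", "END"};

static void checkKeywordOrder() {
  // Fails loudly if a keyword was added in the wrong place during development.
  assert(std::is_sorted(std::begin(Keywords), std::end(Keywords)) &&
         "keyword table must be sorted");
}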

You can speed up the standard binary search by changing the memory layout. For example, instead of sorting the keywords, you can use the Eytzinger layout, which enumerates the search tree in breadth-first order. This layout increases the cache locality of the data and therefore speeds up the search. Personally speaking, manually maintaining the keywords in breadth-first order in the database file is not feasible.

Another popular approach for searching is the generation of minimal perfect hash functions. If you insert a new key into a dynamic hash table such as llvm::StringMap, then that key might be mapped to an already occupied slot. This is called a key collision. Key collisions are unavoidable, and many strategies have been developed to mitigate that problem. However, if you know all the keys, then you can construct hash functions without key collisions. Such hash functions are called perfect. If they do not require more slots than there are keys, then they are called minimal. Perfect hash functions can be generated efficiently, for example, with the gperf GNU tool.

In summary, there is some incentive to be able to generate a lookup function from keywords. So, let’s move the database file to TableGen!

Simulating function calls – The TableGen Language

In some cases, using a multiclass like in the previous example can lead to repetitions. Assume that the CPU also supports memory operands, in a way similar to immediate operands. You can support this by adding a new record definition to the multiclass:

multiclass InstWithOps<string mnemonic, int opcode> {
  def "": Inst<mnemonic, opcode>;
  def "I": Inst<mnemonic # "i", !add(opcode, 1)>;
  def "M": Inst<mnemonic # "m", !add(opcode, 2)>;
}

This is perfectly fine. But now, imagine you do not have 3 but 16 records to define, and you need to do this multiple times. A typical scenario where such a situation can arise is when the CPU supports many vector types, and the vector instructions vary slightly based on the used type.
Please note that all three lines with the def statement have the same structure. The variation is only in the suffix of the name and of the mnemonic, and the delta value is added to the opcode. In C, you could put the data into an array and implement a function that returns the data based on an index value. Then, you could create a loop over the data instead of manually repeating statements.
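A rough C++ rendering of that idea looks as follows; all names here are illustrative and not part of the TableGen example:

#include <cstdio>
#include <string>

// Each entry describes how one operand variant differs from the base form.
struct OpVariant {
  const char *NameSuffix; // suffix for the record name
  const char *Suffix;     // suffix appended to the mnemonic
  int Delta;              // value added to the opcode
};

static const OpVariant Variants[] = {{"", "", 0}, {"I", "i", 1}, {"M", "m", 2}};

// One loop over the table instead of three almost identical definitions.
void printInstDefs(const std::string &Mnemonic, unsigned Opcode) {
  for (const OpVariant &V : Variants)
    std::printf("def %s%s : Inst<\"%s%s\", 0x%X>\n", Mnemonic.c_str(),
                V.NameSuffix, Mnemonic.c_str(), V.Suffix, Opcode + V.Delta);
}
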
Amazingly, you can do something similar in the TableGen language! Here is how to transform the example:

  1. To store the data, you define a class with all required fields. The class is called InstDesc, because it describes some properties of an instruction:

class InstDesc<string name, string suffix, int delta> {
  string Name = name;
  string Suffix = suffix;
  int Delta = delta;
}

  1. Now, you can define records for each operand type. Note that it exactly captures the differences observed in the data:

def RegOp : InstDesc<"", "", 0>;
def ImmOp : InstDesc<"I", "i", 1>;
def MemOp : InstDesc<"M", "m", 2>;

  1. Imagine you have a loop enumerating the numbers 0, 1, and 2, and you want to select one of the previously defined records based on the index. How can you do this? The solution is to create a getDesc class that takes the index as a parameter. It has a single field, ret, that you can interpret as a return value. To assign the correct value to this field, the !cond operator is used:

class getDesc<int n> {
  InstDesc ret = !cond(!eq(n, 0) : RegOp,
                       !eq(n, 1) : ImmOp,
                       !eq(n, 2) : MemOp);
}

This operator works similarly to a switch/case statement in C.

  1. Now, you are ready to define the multiclass. The TableGen language has a loop statement, and it also allows us to define variables. But remember that there is no dynamic execution! As a consequence, the loop range is statically defined, and you can assign a value to a variable, but you cannot change that value later. However, this is enough to retrieve the data. Please note how the use of the getDesc class resembles a function call. But there is no function call! Instead, an anonymous record is created, and the values are taken from that record. Lastly, the paste operator (#) performs string concatenation, similar to the !strconcat operator used earlier:

multiclass InstWithOps<string mnemonic, int opcode> {
  foreach I = 0-2 in {
    defvar Name = getDesc<I>.ret.Name;
    defvar Suffix = getDesc<I>.ret.Suffix;
    defvar Delta = getDesc<I>.ret.Delta;
    def Name: Inst<mnemonic # Suffix, !add(opcode, Delta)>;
  }
}
Now, you use the multiclass as before to define records:

defm ADD : InstWithOps<"add", 0xA0>;

Please run llvm-tblgen and examine the records. Besides the various ADD records, you will also see a couple of anonymous records generated by the use of the getDesc class.
This technique is used in the instruction definition of several LLVM backends. With the knowledge you have acquired, you should have no problem understanding those files.
The foreach statement used the syntax 0-2 to denote the bounds of the range. This is called a range piece. An alternative syntax is to use three dots (0...2), which is useful if the numbers are negative. Lastly, you are not restricted to numerical ranges; you can also loop over a list of elements, which allows you to use strings or previously defined records. For example, you may like the use of the foreach statement, but you think that using the getDesc class is too complicated. In this case, looping over the InstDesc records is the solution:

multiclass InstWithOps<string mnemonic, int opcode> {
  foreach I = [RegOp, ImmOp, MemOp] in {
    defvar Name = I.Name;
    defvar Suffix = I.Suffix;
    defvar Delta = I.Delta;
    def Name: Inst<mnemonic # Suffix, !add(opcode, Delta)>;
  }
}

So far, you only defined records in the TableGen language, using the most commonly used statements. In the next section, you’ll learn how to generate C++ source code from records defined in the TableGen language.

TIP – Optimizing IR

To allow the user to add passes at every extension point, you need to add the preceding code snippet for each extension point.

  1. Now is a good time to try out the different pass manager options. With the --debug-pass-manager option, you can follow which passes are executed in which order. You can also print the IR before or after each pass, which is invoked with the --print-before-all and --print-after-all options. If you created your own pass pipeline, then you can insert the print pass in points of interest. For example, try the --passes="print,inline,print" option. Furthermore, to identify which pass changes the IR code, you can use the --print-changed option, which will only print the IR code if it has changed compared to the result from the pass before. The greatly reduced output makes it much easier to follow IR transformations.

The PassBuilder class has a nested OptimizationLevel class to represent the six different optimization levels. Instead of using the "default<O?>" pipeline description as an argument to the parsePassPipeline() method, we can also call the buildPerModuleDefaultPipeline() method, which builds the default optimization pipeline for the requested level, except for level O0. This optimization level means that no optimization is performed.

Consequently, no passes are added to the pass manager. If we still want to run a certain pass, then we can add it to the pass manager manually. A simple pass to run at this level is the AlwaysInliner pass, which inlines a function marked with the always_inline attribute into the caller. After translating the command-line option value for the optimization level into the corresponding member of the OptimizationLevel class, we can implement this as follows:
    PassBuilder::OptimizationLevel OLevel = ...;
    if (OLevel == PassBuilder::OptimizationLevel::O0)
      MPM.addPass(AlwaysInlinerPass());
    else
      MPM = PB.buildPerModuleDefaultPipeline(OLevel, DebugPM);

Of course, it is possible to add more than one pass to the pass manager in this fashion. PassBuilder also uses the addPass() method when constructing the pass pipeline.

Running extension point callbacks

Because the pass pipeline is not populated for optimization level O0, the registered extension points are not called. If you use the extension points to register passes that should also run at O0 level, this is problematic. You can call the runRegisteredEPCallbacks() method to run the registered extension point callbacks, resulting in a pass manager populated only with the passes that were registered through the extension points.

By adding the optimization pipeline to tinylang, you created an optimizing compiler similar to clang. The LLVM community works on improving the optimizations and the optimization pipeline with each release. Due to this, it is very seldom that the default pipeline is not used. Most often, new passes are added to implement certain semantics of the programming language.

Summary

In this chapter, you learned how to create a new pass for LLVM. You ran the pass using a pass pipeline description and an extension point. You extended your compiler with the construction and execution of a pass pipeline similar to clang, turning tinylang into an optimizing compiler. The pass pipeline allows the addition of passes at extension points, and you learned how you can register passes at these points. This allows you to extend the optimization pipeline with your developed passes or existing passes.

In the next chapter, you will learn the basics of the TableGen language, which is used extensively in LLVM and clang to significantly reduce manual programming.

Extending the pass pipeline – Optimizing IR

In the previous section, we used the PassBuilder class to create a pass pipeline, either from a user-provided description or a predefined name. Now, let’s look at another way to customize the pass pipeline: using extension points.
During the construction of the pass pipeline, the pass builder allows passes contributed by the user to be added. These places are called extension points. A couple of extension points exist, as follows:
• The pipeline start extension point, which allows us to add passes at the beginning of the pipeline
• The peephole extension point, which allows us to add passes after each instance of the instruction combiner pass
Other extension points exist too. To employ an extension point, you must register a callback. During the construction of the pass pipeline, your callback is run at the defined extension point and can add passes to the given pass manager.
To register a callback for the pipeline start extension point, you must call the registerPipelineStartEPCallback() method of the PassBuilder class. For example, to add our PPProfiler pass to the beginning of the pipeline, you would adapt the pass to be used as a module pass with a call to the createModuleToFunctionPassAdaptor() template function and then add the pass to the module pass manager:

PB.registerPipelineStartEPCallback(
    [](ModulePassManager &MPM) {
      MPM.addPass(PPProfilerIRPass());
    });

You can add this snippet in the pass pipeline setup code anywhere before the pipeline is created – that is, before the parsePassPipeline() method is called.
A very natural extension to what we did in the previous section is to let the user pass a pipeline description for an extension point on the command line. The opt tool allows this too. Let’s do this for the pipeline start extension point. Add the following code to the tools/driver/Driver.cpp file:

  1. First, we must add a new command-line option for the user to specify the pipeline description. Again, we take the option name from the opt tool:

static cl::opt<std::string> PipelineStartEPPipeline(
    "passes-ep-pipeline-start",
    cl::desc("Pipeline start extension point"));

  1. Using a lambda function as a callback is the most convenient way to do this. To parse the pipeline description, we must call the parsePassPipeline() method of the PassBuilder instance. The passes are added to the PM pass manager, which is passed as an argument to the lambda function. If an error occurs, we only print an error message without stopping the application. You can add this snippet after the call to the crossRegisterProxies() method:

     PB.registerPipelineStartEPCallback(
         [&PB, Argv0](ModulePassManager &PM) {
           if (auto Err = PB.parsePassPipeline(
                   PM, PipelineStartEPPipeline)) {
             WithColor::error(errs(), Argv0)
                 << "Could not parse pipeline "
                 << PipelineStartEPPipeline.ArgStr << ": "
                 << toString(std::move(Err)) << "\n";
           }
         });