
Implementing our own JIT compiler with LLJIT – JIT Compilation

The lli tool is nothing more than a thin wrapper around LLVM APIs. In the first section, we learned that the ORC engine uses a layered approach. The ExecutionSession class represents a running JIT program. Besides other items, this class holds information such as used JITDylib instances. A JITDylib instance is a symbol table that maps symbol names to addresses. For example, these can be symbols defined in an LLVM IR file or the symbols of a loaded shared library.
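Conceptually, you can picture a JITDylib as a dictionary from symbol names to addresses. The following plain C++ sketch is purely illustrative – the real ORC class additionally tracks linkage and materialization state, and the class name here is made up:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Toy model of a JIT symbol table: maps symbol names to addresses.
class ToySymbolTable {
  std::unordered_map<std::string, uint64_t> Symbols;

public:
  void define(const std::string &Name, uint64_t Addr) {
    Symbols[Name] = Addr;
  }
  // Lookup returns no value if the symbol is unknown, mirroring
  // the "symbol not found" errors seen later in this chapter.
  std::optional<uint64_t> lookup(const std::string &Name) const {
    auto It = Symbols.find(Name);
    if (It == Symbols.end())
      return std::nullopt;
    return It->second;
  }
};
```

A symbol defined in an IR module and a symbol exported by a loaded shared library both end up as entries of this kind.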

For executing LLVM IR, we do not need to create a JIT stack on our own, as the LLJIT class provides this functionality. You can also make use of this class when migrating from the older MCJIT implementation, as this class essentially provides the same functionality.

To illustrate the functions of the LLJIT utility, we will be creating an interactive calculator application while incorporating JIT functionality. The main source code of our JIT calculator will be extended from the calc example from Chapter 2, The Structure of a Compiler.

The primary idea behind our interactive JIT calculator will be as follows:

  1. Allow the user to input a function definition, such as def f(x) = x*2.
  2. The function inputted by the user is then compiled by the LLJIT utility into a function – in this case, f.
  3. Allow the user to call the function they have defined with a numerical value: f(3).
  4. Evaluate the function with the provided argument, and print the result to the console: 6.
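Setting JIT compilation aside for a moment, these four steps can be modeled with ordinary C++, where a map of callables stands in for the code that LLJIT will later produce. All names in this sketch are illustrative:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Stand-in for code produced by LLJIT: a table of "compiled" functions.
std::map<std::string, std::function<long(long)>> Functions;

// Step 2: "compiling" def f(x) = x*2 yields a callable named f.
void defineDouble() {
  Functions["f"] = [](long X) { return X * 2; };
}

// Steps 3 and 4: calling f(3) evaluates the function with the argument.
long call(const std::string &Name, long Arg) {
  return Functions.at(Name)(Arg);
}
```

The real JIT calculator performs the same lookup, but the callable is machine code generated at runtime instead of a C++ lambda.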

Before we discuss incorporating JIT functionality into the calculator source code, there are a few main differences to point out with respect to the original calculator example:

  • Firstly, we previously only parsed functions introduced with the with keyword, rather than the def keyword described previously. For this chapter, we instead only accept function definitions beginning with def, and these are represented by a dedicated node in our abstract syntax tree (AST), known as DefDecl. The DefDecl class stores the function name as well as the names of its arguments.
  • Secondly, our AST also needs to be aware of function calls, to represent the functions that the LLJIT utility has consumed, or JIT’ted. Whenever a user inputs the name of a function, followed by arguments enclosed in parentheses, the AST represents this as a FuncCallFromDef node. This class stores essentially the same information as the DefDecl class.

Due to the addition of these two AST classes, the semantic analysis, parser, and code generation classes naturally have to be adapted to handle the changes in our AST. One additional thing to note is the addition of a new data structure, called JITtedFunctions, which all these classes are aware of. This data structure is a map with the defined function names as keys, and the number of arguments a function is defined with stored as values. We will see later how this data structure will be utilized in our JIT calculator.
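To make the shape of this data structure concrete, here is a minimal sketch in portable C++, together with the arity check it enables; the actual type used in the chapter's source may differ:

```cpp
#include <cassert>
#include <map>
#include <string>

// Function name -> number of parameters it was defined with,
// e.g. def f(x) = x*2 records {"f", 1}.
using JITtedFunctionsMap = std::map<std::string, int>;

// A call such as f(3) is valid only if the function exists and
// the number of supplied arguments matches the recorded arity.
bool checkCall(const JITtedFunctionsMap &Funcs,
               const std::string &Name, int NumArgs) {
  auto It = Funcs.find(Name);
  return It != Funcs.end() && It->second == NumArgs;
}
```

This is exactly the kind of query the semantic analyzer performs before the call is handed to the JIT for evaluation.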

For more details on the changes we have made to the calc example, the full source containing the changes from calc and this section’s JIT implementation can be found within the lljit source directory.

Exploring the lli tool – JIT Compilation

Using JIT compilation for direct execution
Running LLVM IR directly is the first idea that comes to mind when thinking about a JIT compiler. This is what the lli tool, the LLVM interpreter and dynamic compiler, does. We will explore the lli tool in the next section.
Exploring the lli tool
Let’s try the lli tool with a very simple example. The following LLVM IR can be stored in a file called hello.ll, which is the equivalent of a C hello world application. It declares a prototype for the printf() function from the C library. The hellostr constant contains the message to be printed. Inside the main() function, a call to the printf() function is generated, passing hellostr as the message to be printed. The application always returns 0.
The complete source code is as follows:

declare i32 @printf(ptr, ...)

@hellostr = private unnamed_addr constant [13 x i8] c"Hello world\0A\00"

define dso_local i32 @main(i32 %argc, ptr %argv) {
  %res = call i32 (ptr, ...) @printf(ptr @hellostr)
  ret i32 0
}

This LLVM IR file is generic enough that it is valid for all platforms. We can directly execute the IR using the lli tool with the following command:

$ lli hello.ll
Hello world

The interesting point here is how the printf() function is found. The IR code is compiled to machine code, and a lookup for the printf symbol is triggered. This symbol is not found in the IR, so the current process is searched for it. The lli tool dynamically links against the C library, and the symbol is found there.
Of course, the lli tool does not link against the libraries you created. To enable the use of such functions, the lli tool supports the loading of shared libraries and objects. The following C source just prints a friendly message:

#include <stdio.h>

void greetings() {
  puts("Hi!");
}

Stored in greetings.c, we use this file to explore loading objects with lli. The following command compiles this source into a shared library. The -fPIC option instructs clang to generate position-independent code, which is required for shared libraries, and the -shared option tells the compiler to create a greetings.so shared library:

$ clang greetings.c -fPIC -shared -o greetings.so

We also compile the file into the greetings.o object file:

$ clang greetings.c -c -o greetings.o

We now have two files, the greetings.so shared library and the greetings.o object file, which we will load into the lli tool.
We also need an LLVM IR file that calls the greetings() function. For this, create a main.ll file that contains a single call to the function:

declare void @greetings(...)

define dso_local i32 @main(i32 %argc, ptr %argv) {
  call void (...) @greetings()
  ret i32 0
}

Note that executing this IR as-is fails, because lli cannot locate the greetings symbol:

$ lli main.ll
JIT session error: Symbols not found: [ _greetings ]
lli: Failed to materialize symbols: { (main, { _main }) }

The greetings() function is defined in an external file, and to fix the error, we have to tell the lli tool which additional file needs to be loaded. In order to use the shared library, you must use the --load option, which takes the path to the shared library as an argument:

$ lli --load ./greetings.so main.ll
Hi!

It is important to specify the path to the shared library if the directory containing the shared library is not in the search path for the dynamic loader. If omitted, then the library will not be found.
Alternatively, we can instruct lli to load the object file with --extra-object:

$ lli --extra-object greetings.o main.ll
Hi!

Other supported options are --extra-archive, which loads an archive, and --extra-module, which loads another bitcode file. Both options require the path to the file as an argument.
You now know how you can use the lli tool to directly execute LLVM IR. In the next section, we will implement our own JIT tool.

LLVM’s overall JIT implementation and use cases – JIT Compilation

So far, we have only looked at ahead-of-time (AOT) compilers. These compilers compile the whole application. The application can only run after the compilation is finished. If the compilation is performed at the runtime of the application, then the compiler is a JIT compiler. A JIT compiler has interesting use cases:

  • Implementation of a virtual machine: A programming language can be translated to byte code with an AOT compiler. At runtime, a JIT compiler is used to compile the byte code to machine code. The advantage of this approach is that the byte code is hardware-independent, and thanks to the JIT compiler, there is no performance penalty compared to an AOT compiler. Java and C# use this model today, but this is not a new idea: the UCSD Pascal compiler from 1977 already used a similar approach.
  • Expression evaluation: A spreadsheet application can compile often-executed expressions with a JIT compiler. For example, this can speed up financial simulations. The lldb LLVM debugger uses this approach to evaluate source expressions at debug time.
  • Database queries: A database creates an execution plan from a database query. The execution plan describes operations on tables and columns, which leads to a query answer when executed. A JIT compiler can be used to translate the execution plan into machine code, which speeds up the execution of the query.

The static compilation model of LLVM is not as far away from the JIT model as one may think. The llc LLVM static compiler compiles LLVM IR into machine code and saves the result as an object file on disk. If the object file is not stored on disk but in memory, would the code be executable? Not directly, as references to global functions and global data use relocations instead of absolute addresses. Conceptually, a relocation describes how to calculate the address – for example, as an offset to a known address. If we resolve relocations into addresses, as the linker and the dynamic loader do, then we can execute the object code. Running the static compiler to compile IR code into an object file in memory, performing a link step on the in-memory object file, and running the code gives us a JIT compiler. The JIT implementation in the LLVM core libraries is based on this idea.
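The idea of resolving a relocation can be condensed into a toy model: the object code records an offset relative to a known base, and resolving means computing the absolute address from both. Real relocation types are considerably richer; this sketch only illustrates the principle:

```cpp
#include <cassert>
#include <cstdint>

// Toy relocation: the final address is computed relative to a base
// address that is only known once the code is placed in memory.
struct ToyReloc {
  uint64_t Offset; // distance from the section base
};

// What the linker or dynamic loader conceptually does for each
// relocation entry before the code becomes executable.
uint64_t resolve(uint64_t SectionBase, ToyReloc R) {
  return SectionBase + R.Offset;
}
```

The JIT performs this step on the in-memory object file instead of on a file on disk.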

During the development history of LLVM, there were several JIT implementations, with different feature sets. The latest JIT API is the On-Request Compilation (ORC) engine. In case you were curious about the acronym, it was the lead developer’s intention to invent yet another acronym based on Tolkien’s universe, after Executable and Linking Format (ELF) and Debugging Standard (DWARF) were already present.

The ORC engine builds on and extends the idea of using the static compiler and a dynamic linker on the in-memory object file. The implementation uses a layered approach. The two basic levels are the compile layer and the link layer. On top of this sits a layer providing support for lazy compilation. A transformation layer can be stacked on top or below the lazy compilation layer, allowing the developer to add arbitrary transformations or simply to be notified of certain events. Moreover, this layered approach has the advantage that the JIT engine is customizable for diverse requirements. For example, a high-performance virtual machine may choose to compile everything upfront and make no use of the lazy compilation layer. On the other hand, other virtual machines will emphasize startup time and responsiveness to the user and will achieve this with the help of the lazy compilation layer.

The older MCJIT engine is still available, and its API is derived from an even older, already-removed JIT engine. Over time, this API gradually became bloated, and it lacks the flexibility of the ORC API. The goal is to remove this implementation, as the ORC engine now provides all the functionality of the MCJIT engine, and new developments should use the ORC API.

In the next section, we look at lli, the LLVM interpreter and dynamic compiler, before we dive into implementing a JIT compiler.

Drawbacks of TableGen – The TableGen Language

Performance of the token filter

Using a plain binary search for the keyword filter does not give a better performance than the implementation based on the llvm::StringMap type. To beat the performance of the current implementation, you need to generate a perfect hash function.

The classic algorithm from Czech, Havas, and Majewski can be easily implemented, and it gives you a very good performance. It is described in An optimal algorithm for generating minimal perfect hash functions, Information Processing Letters, Volume 43, Issue 5, 1992. See https://www.sciencedirect.com/science/article/abs/pii/002001909290220P.

A state-of-the-art algorithm is PTHash from Pibiri and Trani, described in PTHash: Revisiting FCH Minimal Perfect Hashing, SIGIR ’21. See https://arxiv.org/pdf/2104.10402.pdf.

Both algorithms are good candidates for generating a token filter that is actually faster than llvm::StringMap.
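The property that these algorithms guarantee is easy to state in code: a hash function is perfect for a fixed key set if no two keys collide, and minimal if the n keys land exactly in the slots 0 to n-1. The following checker verifies that property for a candidate function; it is not the CHM or PTHash construction itself:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns true if Hash maps each key to a distinct slot in [0, N),
// i.e., Hash is a minimal perfect hash function for Keys.
template <typename HashFn>
bool isMinimalPerfect(const std::vector<std::string> &Keys,
                      HashFn Hash) {
  std::set<size_t> Used;
  for (const auto &K : Keys) {
    size_t Slot = Hash(K);
    if (Slot >= Keys.size() || !Used.insert(Slot).second)
      return false; // out of range, or collision with another key
  }
  return true;
}
```

A generated token filter would use such a function to index directly into a table of keywords, replacing the probing done by llvm::StringMap.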

Drawbacks of TableGen

Here are a few drawbacks of TableGen:

  • The TableGen language is built on a simple concept. As a consequence, it does not have the same computing capabilities as other DSLs. Obviously, some programmers would like to replace TableGen with a different, more powerful language, and this topic comes up from time to time in the LLVM discussion forum.
  • With the possibility of implementing your own backends, the TableGen language is very flexible. However, it also means that the semantics of a given definition are hidden inside the backend. Thus, you can create TableGen files that are basically not understandable by other developers.
  • And last, the backend implementation can be very complex if you try to solve a non-trivial task. It is reasonable to expect that this effort would be lower if the TableGen language were more powerful.

Even if not all developers are happy with the capabilities of TableGen, the tool is used widely in LLVM, and for a developer, it is important to understand it.

Summary

In this chapter, you first learned the main idea behind TableGen. Then, you defined your first classes and records in the TableGen language, and you acquired knowledge of the syntax of TableGen. Finally, you developed a TableGen backend emitting fragments of C++ source code, based on the TableGen classes you defined.

In the next chapter, we examine another unique feature of LLVM: generating and executing code in one step, also known as Just-In-Time (JIT) compilation.

Creating a new TableGen tool – The TableGen Language-3

  1. Next, the function declarations are emitted. This is only a constant string, so nothing exciting happens. This finishes emitting the declarations:

OS << "  const char *getTokenName(TokenKind Kind) "
      "LLVM_READNONE;\n"
   << "  const char *getPunctuatorSpelling(TokenKind "
      "Kind) LLVM_READNONE;\n"
   << "  const char *getKeywordSpelling(TokenKind "
      "Kind) "
      "LLVM_READNONE;\n"
   << "}\n"
   << "#endif\n";
  2. Now, let’s turn to emitting the definitions. Again, this generated code is guarded by a macro called GET_TOKEN_KIND_DEFINITION. First, the token names are emitted into a TokNames array, and the getTokenName() function uses that array to retrieve the name. Please note that the quote symbol must be escaped as \" when used inside a string:

OS << "#ifdef GET_TOKEN_KIND_DEFINITION\n";
OS << "#undef GET_TOKEN_KIND_DEFINITION\n";
OS << "static const char * const TokNames[] = {\n";
for (Record *CC :
     Records.getAllDerivedDefinitions("Token")) {
  OS << "  \"" << CC->getValueAsString("Name")
     << "\",\n";
}
OS << "};\n\n";
OS << "const char *tok::getTokenName(TokenKind Kind) "
      "{\n"
   << "  if (Kind <= tok::NUM_TOKENS)\n"
   << "    return TokNames[Kind];\n"
   << "  llvm_unreachable(\"unknown TokenKind\");\n"
   << "  return nullptr;\n"
   << "};\n\n";
  3. Next, the getPunctuatorSpelling() function is emitted. The only notable difference to the other parts is that the loop goes over all records derived from the Punctuator class. Also, a switch statement is generated instead of an array:

OS << "const char "
      "*tok::getPunctuatorSpelling(TokenKind "
      "Kind) {\n"
   << "  switch (Kind) {\n";
for (Record *CC :
     Records.getAllDerivedDefinitions("Punctuator")) {
  OS << "    case " << CC->getValueAsString("Name")
     << ": return \"" << CC->getValueAsString("Spelling")
     << "\";\n";
}
OS << "    default: break;\n"
   << "  }\n"
   << "  return nullptr;\n"
   << "};\n\n";
  4. And finally, the getKeywordSpelling() function is emitted. The coding is similar to emitting getPunctuatorSpelling(). This time, the loop goes over all records of the Keyword class, and the name is again prefixed with kw_:

OS << "const char *tok::getKeywordSpelling(TokenKind "
      "Kind) {\n"
   << "  switch (Kind) {\n";
for (Record *CC :
     Records.getAllDerivedDefinitions("Keyword")) {
  OS << "    case kw_" << CC->getValueAsString("Name")
     << ": return \"" << CC->getValueAsString("Name")
     << "\";\n";
}
OS << "    default: break;\n"
   << "  }\n"
   << "  return nullptr;\n"
   << "};\n\n";
OS << "#endif\n";
}
  5. The emitKeywordFilter() method is more complex than the previous methods since emitting the filter requires collecting some data from the records. The generated source code uses the std::lower_bound() function, thus implementing a binary search.
     Now, let’s take a shortcut. There can be several records of the TokenFilter class defined in the TableGen file. For demonstration purposes, we just emit at most one token filter method:

std::vector<Record *> AllTokenFilter =
    Records.getAllDerivedDefinitionsIfDefined(
        "TokenFilter");
if (AllTokenFilter.empty())
  return;

Creating a new TableGen tool – The TableGen Language-2

  1. The run() method calls all the emitting methods. It also times the length of each phase. If you specify the --time-phases option, then the timing is shown after all code is generated:

void TokenAndKeywordFilterEmitter::run(raw_ostream &OS) {
  // Emit Flag fragments.
  Records.startTimer("Emit flags");
  emitFlagsFragment(OS);

  // Emit token kind enum and functions.
  Records.startTimer("Emit token kind");
  emitTokenKind(OS);

  // Emit keyword filter code.
  Records.startTimer("Emit keyword filter");
  emitKeywordFilter(OS);
  Records.stopTimer();
}

  2. The emitFlagsFragment() method shows the typical structure of a function emitting C++ source code. The generated code is guarded by the GET_TOKEN_FLAGS macro. To emit the C++ source fragment, you loop over all records that are derived from the Flag class in the TableGen file. Having such a record, it is easy to query the record for the name and the value. Please note that the names Flag, Name, and Val must be written exactly as in the TableGen file. If you rename Val to Value in the TableGen file, then you also need to change the string in this function. All the generated source code is written to the provided stream, OS:

void TokenAndKeywordFilterEmitter::emitFlagsFragment(
    raw_ostream &OS) {
  OS << "#ifdef GET_TOKEN_FLAGS\n";
  OS << "#undef GET_TOKEN_FLAGS\n";
  for (Record *CC :
       Records.getAllDerivedDefinitions("Flag")) {
    StringRef Name = CC->getValueAsString("Name");
    int64_t Val = CC->getValueAsInt("Val");
    OS << Name << " = " << format_hex(Val, 2) << ",\n";
  }
  OS << "#endif\n";
}

  3. The emitTokenKind() method emits a declaration and definition of token classification functions. Let’s have a look at emitting the declarations first. The overall structure is the same as the previous method – only more C++ source code is emitted. The generated source fragment is guarded by the GET_TOKEN_KIND_DECLARATION macro. Please note that this method tries to generate nicely formatted C++ code, using new lines and indentation as a human developer would do. In case the emitted source code is not correct, and you need to examine it to find the error, this will be tremendously helpful. It is also easy to make such errors: after all, you are writing a C++ function that emits C++ source code.
     First, the TokenKind enumeration is emitted. The name for a keyword should be prefixed with a kw_ string. The loop goes over all records of the Token class, and you can query the records if they are also a subclass of the Keyword class, which enables you to emit the prefix:

OS << "#ifdef GET_TOKEN_KIND_DECLARATION\n"
   << "#undef GET_TOKEN_KIND_DECLARATION\n"
   << "namespace tok {\n"
   << "  enum TokenKind : unsigned short {\n";
for (Record *CC :
     Records.getAllDerivedDefinitions("Token")) {
  StringRef Name = CC->getValueAsString("Name");
  OS << "    ";
  if (CC->isSubClassOf("Keyword"))
    OS << "kw_";
  OS << Name << ",\n";
}
OS << "    NUM_TOKENS\n"
   << "  };\n";

Generating C++ code from a TableGen file – The TableGen Language

In the previous section, you defined records in the TableGen language. To make use of those records, you need to write your own TableGen backend that can produce C++ source code or do other things using the records as input.

In Chapter 3, Turning the Source File into an Abstract Syntax Tree, the implementation of the Lexer class uses a database file to define tokens and keywords. Various query functions make use of that database file. Besides that, the database file is used to implement a keyword filter. The keyword filter is a hash map, implemented using the llvm::StringMap class. Whenever an identifier is found, the keyword filter is called to find out if the identifier is actually a keyword. If you take a closer look at the implementation using the ppprofiler pass from Chapter 6, Advanced IR Generation, then you will see that this function is called quite often. Therefore, it may be useful to experiment with different implementations to make that functionality as fast as possible.

However, this is not as easy as it seems. For example, you can try to replace the lookup in the hash map with a binary search. This requires that the keywords in the database file are sorted. Currently, this seems to be the case, but during development, a new keyword might be added in the wrong place undetected. The only way to make sure that the keywords are in the right order is to add some code that checks the order at runtime.
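The binary search with a runtime order check can be sketched as follows, assuming the keyword table is a plain sorted array of strings:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// The table must stay sorted; the assertion catches a keyword that
// was added in the wrong place during development.
bool isKeyword(const std::vector<std::string> &Keywords,
               const std::string &Ident) {
  assert(std::is_sorted(Keywords.begin(), Keywords.end()) &&
         "keyword table must stay sorted");
  return std::binary_search(Keywords.begin(), Keywords.end(), Ident);
}
```

In a release build the assertion compiles away, so the order check costs nothing where it matters.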

You can speed up the standard binary search by changing the memory layout. For example, instead of sorting the keywords, you can use the Eytzinger layout, which enumerates the search tree in breadth-first order. This layout increases the cache locality of the data and therefore speeds up the search. However, maintaining the keywords in breadth-first order manually in the database file is not practical.
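The maintenance problem disappears if the Eytzinger order is generated from the sorted keyword list instead of being written by hand. The following sketch builds the breadth-first layout and searches it; it is an illustration, not the implementation discussed in the text:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fill Out (1-based, size Sorted.size()+1) with the sorted elements
// in breadth-first order by an in-order walk of the implicit tree.
void eytzinger(const std::vector<std::string> &Sorted, size_t &I,
               std::vector<std::string> &Out, size_t K = 1) {
  if (K > Sorted.size())
    return;
  eytzinger(Sorted, I, Out, 2 * K);     // left subtree
  Out[K] = Sorted[I++];                 // current node
  eytzinger(Sorted, I, Out, 2 * K + 1); // right subtree
}

// Search the breadth-first layout: children of node K sit at 2K and 2K+1.
bool contains(const std::vector<std::string> &Tree,
              const std::string &Key) {
  size_t K = 1;
  while (K < Tree.size()) {
    if (Tree[K] == Key)
      return true;
    K = 2 * K + (Tree[K] < Key); // descend left or right
  }
  return false;
}
```

A TableGen backend could emit the reordered table at build time, so the database file itself stays in ordinary sorted order.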

Another popular approach for searching is the generation of minimal perfect hash functions. If you insert a new key into a dynamic hash table such as llvm::StringMap, then that key might be mapped to an already occupied slot. This is called a key collision. Key collisions are unavoidable, and many strategies have been developed to mitigate that problem. However, if you know all the keys, then you can construct hash functions without key collisions. Such hash functions are called perfect. In case they do not require more slots than keys, then they are called minimal. Perfect hash functions can be generated efficiently – for example, with the gperf GNU tool.
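A key collision in a dynamic hash table can be made concrete in a few lines; the sketch below shows two keys landing in the same slot of a toy table (the modulus and keys are arbitrary):

```cpp
#include <cassert>
#include <functional>
#include <string>

// Toy slot computation for a table with 8 buckets.
size_t slotOf(const std::string &Key) {
  return std::hash<std::string>{}(Key) % 8;
}

// With only 8 buckets and many possible keys, the pigeonhole
// principle guarantees that some pair of keys must collide.
bool collides(const std::string &A, const std::string &B) {
  return slotOf(A) == slotOf(B);
}
```

A perfect hash function for a known key set avoids this case entirely by construction.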

In summary, there is some incentive to be able to generate a lookup function from keywords. So, let’s move the database file to TableGen!

Simulating function calls – The TableGen Language

In some cases, using a multiclass like in the previous example can lead to repetitions. Assume that the CPU also supports memory operands, in a way similar to immediate operands. You can support this by adding a new record definition to the multiclass:

multiclass InstWithOps<string mnemonic, int opcode> {
  def "": Inst<mnemonic, opcode>;
  def "I": Inst<mnemonic # "i", !add(opcode, 1)>;
  def "M": Inst<mnemonic # "m", !add(opcode, 2)>;
}

This is perfectly fine. But now, imagine you do not have 3 but 16 records to define, and you need to do this multiple times. A typical scenario where such a situation can arise is when the CPU supports many vector types, and the vector instructions vary slightly based on the used type.
Please note that all three lines with the def statement have the same structure. The variation is only in the suffix of the name and of the mnemonic, and the delta value is added to the opcode. In C, you could put the data into an array and implement a function that returns the data based on an index value. Then, you could create a loop over the data instead of manually repeating statements.
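The C-style approach just described can be sketched like this, with the suffix and opcode delta stored in an array indexed by operand kind (all names are illustrative):

```cpp
#include <cassert>
#include <string>

// One table entry per operand kind: mnemonic suffix and opcode delta.
struct OpDesc {
  const char *Suffix;
  int Delta;
};

static const OpDesc Descs[] = {
    {"", 0},  // register operand
    {"i", 1}, // immediate operand
    {"m", 2}, // memory operand
};

// Derive the concrete mnemonic and opcode from the base values.
std::string mnemonicFor(const std::string &Base, int Idx) {
  return Base + Descs[Idx].Suffix;
}
int opcodeFor(int Base, int Idx) { return Base + Descs[Idx].Delta; }
```

The TableGen transformation in the following steps mirrors exactly this shape: a data record per operand kind, plus a loop that instantiates one definition per record.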
Amazingly, you can do something similar in the TableGen language! Here is how to transform the example:

  1. To store the data, you define a class with all required fields. The class is called InstDesc, because it describes some properties of an instruction:

class InstDesc<string name, string suffix, int delta> {
  string Name = name;
  string Suffix = suffix;
  int Delta = delta;
}

  2. Now, you can define records for each operand type. Note that it exactly captures the differences observed in the data:

def RegOp : InstDesc<"", "", 0>;
def ImmOp : InstDesc<"I", "i", 1>;
def MemOp : InstDesc<"M", "m", 2>;

  3. Imagine you have a loop enumerating the numbers 0, 1, and 2, and you want to select one of the previously defined records based on the index. How can you do this? The solution is to create a getDesc class that takes the index as a parameter. It has a single field, ret, that you can interpret as a return value. To assign the correct value to this field, the !cond operator is used:

class getDesc<int n> {
  InstDesc ret = !cond(!eq(n, 0) : RegOp,
                       !eq(n, 1) : ImmOp,
                       !eq(n, 2) : MemOp);
}

This operator works similarly to a switch/case statement in C.

  4. Now, you are ready to define the multiclass. The TableGen language has a loop statement, and it also allows us to define variables. But remember that there is no dynamic execution! As a consequence, the loop range is statically defined, and you can assign a value to a variable, but you cannot change that value later. However, this is enough to retrieve the data. Please note how the use of the getDesc class resembles a function call. But there is no function call! Instead, an anonymous record is created, and the values are taken from that record. Lastly, the paste operator (#) performs a string concatenation, similar to the !strconcat operator used earlier:

multiclass InstWithOps<string mnemonic, int opcode> {
  foreach I = 0-2 in {
    defvar Name = getDesc<I>.ret.Name;
    defvar Suffix = getDesc<I>.ret.Suffix;
    defvar Delta = getDesc<I>.ret.Delta;
    def Name: Inst<mnemonic # Suffix, !add(opcode, Delta)>;
  }
}
Now, you use the multiclass as before to define records:

defm ADD : InstWithOps<"add", 0xA0>;

Please run llvm-tblgen and examine the records. Besides the various ADD records, you will also see a couple of anonymous records generated by the use of the getDesc class.
This technique is used in the instruction definition of several LLVM backends. With the knowledge you have acquired, you should have no problem understanding those files.
The foreach statement used the syntax 0-2 to denote the bounds of the range. This is called a range piece. An alternative syntax is to use three dots (0...2), which is useful if the numbers are negative. Lastly, you are not restricted to numerical ranges; you can also loop over a list of elements, which allows you to use strings or previously defined records. For example, you may like the use of the foreach statement, but you think that using the getDesc class is too complicated. In this case, looping over the InstDesc records is the solution:

multiclass InstWithOps<string mnemonic, int opcode> {
  foreach I = [RegOp, ImmOp, MemOp] in {
    defvar Name = I.Name;
    defvar Suffix = I.Suffix;
    defvar Delta = I.Delta;
    def Name: Inst<mnemonic # Suffix, !add(opcode, Delta)>;
  }
}

So far, you have only defined records in the TableGen language, using the most commonly used statements. In the next section, you’ll learn how to generate C++ source code from records defined in the TableGen language.

Creating multiple records at once with multiclasses – The TableGen Language

Another often-used statement is multiclass. A multiclass allows you to define multiple records at once. Let’s expand the example to show why this can be useful.
The definition of an add instruction is very simplistic. In reality, a CPU often has several add instructions. A common variant is that one instruction has two register operands while another instruction has one register operand and an immediate operand, which is a small number. Assume that for the instruction having an immediate operand, the designer of the instruction set decided to mark them with i as a suffix. So, we end up with the add and addi instructions. Further, assume that the opcodes differ by 1. Many arithmetic and logical instructions follow this scheme; therefore, you want the definition to be as compact as possible.
The first challenge is that you need to manipulate values. There is a limited number of operators that you can use to modify a value. For example, to produce the sum of 1 and the value of the field opcode, you write:

!add(opcode, 1)

Such an expression is best used as an argument for a class. Testing a field value and then changing it based on the found value is generally not possible because it requires dynamic statements that are not available. Always remember that all calculations are done while the records are constructed!
In a similar way, strings can be concatenated:

!strconcat(mnemonic, "i")

Because all operators begin with an exclamation mark (!), they are also called bang operators. You find a full list of bang operators in the Programmer’s Reference: https://llvm.org/docs/TableGen/ProgRef.html#appendix-a-bang-operators.
Now, you can define a multiclass. The Inst class serves again as the base:

class Inst<string mnemonic, int opcode> {
  string Mnemonic = mnemonic;
  int Opcode = opcode;
}

The definition of a multiclass is a bit more involved, so let’s do it in steps:

  1. The definition of a multiclass uses a similar syntax to classes. The new multiclass is named InstWithImm and has two parameters, mnemonic and opcode:

multiclass InstWithImm<string mnemonic, int opcode> {

  1. First, you define an instruction with two register operands. As in a normal record definition, you use the def keyword to define the record, and you use the Inst class to create the record content. You also need to define an empty name. We will explain later why this is necessary:

def "": Inst<mnemonic, opcode>;

  2. Next, you define an instruction with the immediate operand. You derive the values for the mnemonic and the opcode from the parameters of the multiclass, using bang operators. The record is named I:

def I: Inst<!strconcat(mnemonic, "i"), !add(opcode, 1)>;
  3. That is all; the class body can be closed, like so:

}

To instantiate the records, you must use the defm keyword:

defm ADD : InstWithImm<"add", 0xA0>;

These statements result in the following:

  1. The Inst<"add", 0xA0> record is instantiated. The name of the record is the concatenation of the name following the defm keyword and of the name following def inside the multiclass statement, which results in the name ADD.
  2. The Inst<"addi", 0xA1> record is instantiated and, following the same scheme, is given the name ADDI.

Let’s verify this claim with llvm-tblgen:

$ llvm-tblgen --print-records inst.td
------------- Classes -----------------
class Inst<string Inst:mnemonic = ?, int Inst:opcode = ?> {
  string Mnemonic = Inst:mnemonic;
  int Opcode = Inst:opcode;
}
------------- Defs -----------------
def ADD { // Inst
  string Mnemonic = "add";
  int Opcode = 160;
}
def ADDI { // Inst
  string Mnemonic = "addi";
  int Opcode = 161;
}

Using a multiclass, it is very easy to generate multiple records at once. This feature is used very often!
A record does not need to have a name. Anonymous records are perfectly fine. Omitting the name is all you need to do to define an anonymous record. The name of a record generated by a multiclass is made up of two names, and both names must be given to create a named record. If you omit the name after defm, then only anonymous records are created. Similarly, if the def inside the multiclass is not followed by a name, an anonymous record is created. This is the reason why the first definition in the multiclass example used the empty name “”: without it, the record would be anonymous.

Extending the pass pipeline – Optimizing IR

In the previous section, we used the PassBuilder class to create a pass pipeline, either from a user-provided description or a predefined name. Now, let’s look at another way to customize the pass pipeline: using extension points.
During the construction of the pass pipeline, the pass builder allows passes contributed by the user to be added. These places are called extension points. A couple of extension points exist, as follows:
• The pipeline start extension point, which allows us to add passes at the beginning of the pipeline
• The peephole extension point, which allows us to add passes after each instance of the instruction combiner pass
Other extension points exist too. To employ an extension point, you must register a callback. During the construction of the pass pipeline, your callback is run at the defined extension point and can add passes to the given pass manager.
To register a callback for the pipeline start extension point, you must call the registerPipelineStartEPCallback() method of the PassBuilder class. For example, to add our PPProfiler pass to the beginning of the pipeline, you would adapt the pass to be used as a module pass with a call to the createModuleToFunctionPassAdaptor() template function and then add the pass to the module pass manager:

PB.registerPipelineStartEPCallback(
    [](ModulePassManager &MPM) {
      MPM.addPass(PPProfilerIRPass());
    });

You can add this snippet in the pass pipeline setup code anywhere before the pipeline is created – that is, before the parsePassPipeline() method is called.
A very natural extension to what we did in the previous section is to let the user pass a pipeline description for an extension point on the command line. The opt tool allows this too. Let’s do this for the pipeline start extension point. Add the following code to the tools/driver/Driver.cpp file:

  1. First, we must add a new command-line option for the user to specify the pipeline description. Again, we take the option name from the opt tool:

static cl::opt<std::string> PipelineStartEPPipeline(
    "passes-ep-pipeline-start",
    cl::desc("Pipeline start extension point"));

  2. Using a lambda function as a callback is the most convenient way to do this. To parse the pipeline description, we must call the parsePassPipeline() method of the PassBuilder instance. The passes are added to the PM pass manager, which is given as an argument to the lambda function. If an error occurs, we only print an error message without stopping the application. You can add this snippet after the call to the crossRegisterProxies() method:

PB.registerPipelineStartEPCallback(
    [&PB, Argv0](ModulePassManager &PM) {
      if (auto Err = PB.parsePassPipeline(
              PM, PipelineStartEPPipeline)) {
        WithColor::error(errs(), Argv0)
            << "Could not parse pipeline "
            << PipelineStartEPPipeline.ArgStr << ": "
            << toString(std::move(Err)) << "\n";
      }
    });