August 2023 – Taking LLVM to the Next Level

21 Aug

Defining data in the TableGen language – The TableGen Language

by Nancy Rohan

The TokenKinds.def database file defines three different macros. The TOK macro is used for tokens that do not have a fixed spelling, such as integer literals. The PUNCTUATOR macro is used for all kinds of punctuation marks and includes a preferred spelling. Lastly, the KEYWORD macro defines a keyword, which is made up of a literal and a flag that indicates at which language level this literal is a keyword. For example, the thread_local keyword was added in C++11.
One way to express this in the TableGen language is to create a Token class that holds all the data. You can then add subclasses of that class to make the usage more comfortable. You also need a Flag class for the flags that are defined together with a keyword. Finally, you need a class to define a keyword filter. These classes define the basic data structure and can potentially be reused in other projects. Therefore, you create a separate Keyword.td file for them. Here are the steps:

A flag is modeled as a name and an associated value. This makes it easy to generate an enumeration from this data:

class Flag<string name, int val> {
  string Name = name;
  int Val = val;
}

The Token class is used as the base class. It just carries a name. Please note that this class has no parameters:

class Token {
  string Name;
}

The Tok class has the same function as the corresponding TOK macro from the database file. It represents a token without a fixed spelling. It derives from the base class, Token, and just adds initialization for the name:

class Tok<string name> : Token {
  let Name = name;
}

In the same way, the Punctuator class resembles the PUNCTUATOR macro. It adds a field for the spelling of the token:

class Punctuator<string name, string spelling> : Token {
  let Name = name;
  string Spelling = spelling;
}

And last, the Keyword class needs a list of flags:

class Keyword<string name, list<Flag> flags> : Token {
  let Name = name;
  list<Flag> Flags = flags;
}

With these definitions in place, you can now define a class for the keyword filter, called TokenFilter. It takes a list of tokens as a parameter:

class TokenFilter<list<Token> tokens> {
  string FunctionName;
  list<Token> Tokens = tokens;
}

With these class definitions, you can capture all the data from the TokenKinds.def database file. The TinyLang language does not utilize the flags, since there is only one version of the language. Real-world languages such as C and C++ have undergone a couple of revisions, and they usually require flags. Therefore, we use keywords from C and C++ as an example. Let’s create a KeywordC.td file, as follows:

First, you include the class definitions created earlier:

include "Keyword.td"

Next, you define flags. Each flag gets its own bit as a value, so several flags can be combined into a single bitmask. Note how the !or operator is used to create a value for the KEYALL flag:

def KEYC99 : Flag<"KEYC99", 0x1>;
def KEYCXX : Flag<"KEYCXX", 0x2>;
def KEYCXX11 : Flag<"KEYCXX11", 0x4>;
def KEYGNU : Flag<"KEYGNU", 0x8>;
def KEYALL : Flag<"KEYALL", !or(KEYC99.Val, KEYCXX.Val, KEYCXX11.Val, KEYGNU.Val)>;
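Because every flag record is just a name and a value, the enumeration generated from these records can be very direct. The following C++ sketch shows one form such output could take; the enumeration name Flags is an assumption for illustration, and the source does not show the actual generated code at this point. The KEYALL value is what the !or expression evaluates to for the four values given above.

```cpp
// Hypothetical enumeration a backend could emit from the Flag records.
// Each record's Name becomes an enumerator and its Val becomes the value.
enum Flags : unsigned {
  KEYC99 = 0x1,
  KEYCXX = 0x2,
  KEYCXX11 = 0x4,
  KEYGNU = 0x8,
  KEYALL = 0xf // result of OR-ing 0x1, 0x2, 0x4, and 0x8
};

// Compile-time check that KEYALL really covers every individual flag.
static_assert(KEYALL == (KEYC99 | KEYCXX | KEYCXX11 | KEYGNU),
              "KEYALL must be the union of all flags");
```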

There are tokens without a fixed spelling – for example, a comment:

def : Tok<"comment">;

Operators are defined using the Punctuator class, as in this example:

def : Punctuator<"plus", "+">;
def : Punctuator<"minus", "-">;

Keywords need to use different flags:

def kw_auto : Keyword<"auto", [KEYALL]>;
def kw_inline : Keyword<"inline", [KEYC99, KEYCXX, KEYGNU]>;
def kw_restrict : Keyword<"restrict", [KEYC99]>;
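To see why each keyword carries a list of flags, consider how a lexer might consume this data. The C++ sketch below is a minimal illustration under assumptions: the names KeywordFlag, KeywordInfo, Keywords, and isKeywordIn are hypothetical, and ActiveFlags is assumed to be the union of the flags enabled for the language version being compiled. A keyword's flag list is folded into one bitmask, and the keyword is active if that mask shares at least one bit with the active flags.

```cpp
// Flag values as defined by the Flag records in KeywordC.td.
enum KeywordFlag : unsigned {
  KEYC99 = 0x1,
  KEYCXX = 0x2,
  KEYCXX11 = 0x4,
  KEYGNU = 0x8,
  KEYALL = KEYC99 | KEYCXX | KEYCXX11 | KEYGNU
};

// Hypothetical table entry: the keyword spelling plus its combined flags.
struct KeywordInfo {
  const char *Name;
  unsigned FlagMask;
};

// The three keyword records from above, with each flag list OR-ed together.
constexpr KeywordInfo Keywords[] = {
    {"auto", KEYALL},
    {"inline", KEYC99 | KEYCXX | KEYGNU},
    {"restrict", KEYC99},
};

// A keyword is active if any of its flags is among the active flags.
constexpr bool isKeywordIn(const KeywordInfo &KW, unsigned ActiveFlags) {
  return (KW.FlagMask & ActiveFlags) != 0;
}
```

With this table, restrict is treated as a keyword only when KEYC99 is active, while auto is a keyword under every flag.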

And last, here’s the definition of the keyword filter:

def : TokenFilter<[kw_auto, kw_inline, kw_restrict]>;
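A keyword filter decides whether an identifier that the lexer has just read is in fact one of the listed keywords. The source does not show the generated filter at this point, so the following C++ sketch only illustrates one plausible shape: the function name lookupKeyword, the identifier fallback kind, and the linear search are all assumptions, and a real backend may well emit a faster lookup.

```cpp
#include <cstring>

// Hypothetical token kinds: one fallback kind plus the filtered keywords.
enum TokenKind { identifier, kw_auto, kw_inline, kw_restrict };

// One table row per token listed in the TokenFilter record.
struct KeywordEntry {
  const char *Spelling;
  TokenKind Kind;
};

static const KeywordEntry FilterTable[] = {
    {"auto", kw_auto},
    {"inline", kw_inline},
    {"restrict", kw_restrict},
};

// Returns the keyword kind for Name, or identifier if Name is no keyword.
inline TokenKind lookupKeyword(const char *Name) {
  for (const KeywordEntry &Entry : FilterTable)
    if (std::strcmp(Entry.Spelling, Name) == 0)
      return Entry.Kind;
  return identifier;
}
```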

Of course, this file does not include all tokens from C and C++. However, it demonstrates all possible usages of the defined TableGen classes.
Based on these TableGen files, you’ll implement a TableGen backend in the next section.
