UEFI News and Commentary

Thursday, August 01, 2013

Writing an IFR Assembler for UEFI - Part 3

This is Part 3 of Writing an IFR Assembler for UEFI.

This article builds on the IFR Assembler set up in Part 1. If you have not set up the IFR Assembler yet, follow the instructions in Part 1 first.
To learn about the higher-level parsing process of the IFR Assembler, see Part 2.

This post will walk you through the token-parsing process of the IFR Assembler.

Understanding Token Parsing

The last article talked about the higher-level parsing system and the function that the IFR Assembler uses to parse each item. Token-parsing is the second part of the parsing system: it goes through each line of code in the .pl file and parses each significant syntactic element.

A significant syntactic element is any part of the .pl file that is part of the actual code. This includes op-codes, punctuation, expressions, and many other parts of the code. Items that are not considered significant syntactic elements include comments and white space. The parser reads these significant syntactic elements, or tokens, on behalf of the higher-level parsing functions.

The Parsing Process

The first part of token-parsing is finding the inputs to the parser. There are three main variables that must be defined. The first two variables are SourceFileName and SourceFileLine, both of which are used to give an error's location when an error is printed. The final input of token-parsing is the variable psz. psz points to a character within the code, usually the first character in a token, and is used to find out what token is currently being pointed to.

These three inputs are used mainly in the function tokenP(), which is in the source file Parse.c. tokenP() reads the characters starting at psz and uses them to work out which token psz is currently pointing to. Once the type of token is known, tokenP() sets the variable t to that token's value and then moves psz to the first character of the next token in the series.

Because t is a numeric value, every token has a numeric value connected to it so that t can represent that token. Each of these values is defined in the source file Token.h.

Each time tokenP() is called, it returns a token in the form of the variable t. These tokens are passed to the higher-level parsing functions discussed in Part 2, which check the syntax of the code.

If the token is an unsigned integer (T_P_UINT), a GUID (T_P_UUID), or an ASCII string (T_P_STRA), the value of the token is put into tu, tguid, or tstrA, respectively. These variables are then returned to the higher-level parsing functions along with t. The functions InitParse() and ShutParse() help to set up and empty out these variables, namely tstrA and tstrW, before and after the parsing of a token.
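
To make the flow concrete, here is a minimal sketch of a tokenP()-style routine. It is not the code from Parse.c: the function name sketch_tokenP(), the tokens T_P_EOF and T_P_PUNCT, and all of the numeric values are assumptions made for illustration. Only the names psz, t, tu, and T_P_UINT come from the description above.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token values; the real ones are defined in Token.h. */
enum { T_P_EOF = 0, T_P_UINT = 1, T_P_PUNCT = 2 };

static const char *psz;    /* current position in the source text   */
static int t;              /* most recently parsed token            */
static unsigned long tu;   /* value of the last T_P_UINT token      */

/* Read one token at psz, store it in t, and advance psz past it. */
static int sketch_tokenP(void)
{
    char *end;

    /* White space is not a significant syntactic element, so skip it. */
    while (*psz == ' ' || *psz == '\t')
        psz++;

    if (*psz == '\0') {
        t = T_P_EOF;
    } else if (isdigit((unsigned char)*psz)) {
        /* Base 0 accepts decimal, 0x-prefixed hex, and leading-0 octal. */
        tu = strtoul(psz, &end, 0);
        psz = end;             /* now at the first character after the number */
        t = T_P_UINT;
    } else {
        /* Assumption: any other character is one-character punctuation. */
        psz++;
        t = T_P_PUNCT;
    }
    return t;
}
```

A caller would point psz at the start of a line and call sketch_tokenP() repeatedly, reading tu whenever the returned token is T_P_UINT.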

tokenP() calls tokenNL() when psz reaches the end of a line of code. tokenNL() points psz to the next line of code and increments SourceFileLine.
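
The end-of-line step can be sketched in a few lines. This is an illustration only: sketch_tokenNL() is a hypothetical name, and the assumption that a line ends in either "\n" or "\r\n" is mine, not something taken from Parse.c.

```c
static const char *psz;               /* current position in the source text */
static unsigned SourceFileLine = 1;   /* line number used in error messages  */

/* Assumes psz is at an end-of-line sequence: step over it so that psz
 * points at the first character of the next line, then count that line. */
static void sketch_tokenNL(void)
{
    if (*psz == '\r')
        psz++;
    if (*psz == '\n')
        psz++;
    SourceFileLine++;
}
```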

The function backslash() allows strings to use escape sequences.
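
As an illustration of what an escape-sequence helper can look like, here is a small sketch. The set of escapes that the real backslash() accepts is not listed in this article, so the ones handled below are assumptions, and sketch_backslash() is a hypothetical name.

```c
/* Decode the escape sequence that starts at *pp (which must point at the
 * backslash), advance *pp past it, and return the decoded character.
 * Only a handful of common escapes are handled in this sketch. */
static char sketch_backslash(const char **pp)
{
    const char *p = *pp + 1;   /* skip the backslash itself */
    char c;

    switch (*p) {
    case 'n':  c = '\n'; break;
    case 't':  c = '\t'; break;
    case 'r':  c = '\r'; break;
    case '0':  c = '\0'; break;
    case '\0':                 /* lone backslash at the end of the input */
        *pp = p;
        return '\\';
    default:   c = *p;   break;   /* \\ and \" map to the character itself */
    }
    *pp = p + 1;
    return c;
}
```
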
The utility function dispP() prints out a different string for each token value in Token.h. dispP() is mainly used to write out the name of the token when an error is printed.
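
A dispP()-style lookup might look roughly like the sketch below. The numeric token values, the display strings, the error-message layout, and the names sketch_dispP() and sketch_error() are all assumptions for illustration. Only the token names and the idea of reporting SourceFileName and SourceFileLine come from this article.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical numeric values; the real ones are defined in Token.h. */
enum { T_P_UINT = 1, T_P_UUID = 2, T_P_STRA = 3 };

/* Return a printable name for a token value. */
static const char *sketch_dispP(int token)
{
    switch (token) {
    case T_P_UINT: return "unsigned integer";
    case T_P_UUID: return "GUID";
    case T_P_STRA: return "ASCII string";
    default:       return "unknown token";
    }
}

/* Format an error message that names the offending token and its location. */
static int sketch_error(char *buf, size_t n, const char *file,
                        unsigned line, int token)
{
    return snprintf(buf, n, "%s(%u): unexpected %s",
                    file, line, sketch_dispP(token));
}
```
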
parseguid() is used to parse GUIDs that are written in the following formats: 

{ 0x49adf016, 0x4177, 0x48ae, { 0xb2, 0x55, 0x92, 0x90, 0xa8, 0x20, 0x5c, 0x9d } }
{ 0x49adf016, 0x4177, 0x48ae, 0xb2, 0x55, 0x92, 0x90, 0xa8, 0x20, 0x5c, 0x9d }
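
One simple way to accept both layouts is to ignore the braces and commas and collect the eleven numeric fields in order. The sketch below does that. It is deliberately lenient (it checks the number of fields, not the exact brace placement or the range of each field) and is not the parseguid() from Parse.c. The SketchGuid type and the function name are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Field layout follows the usual UEFI GUID structure; the name is made up. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} SketchGuid;

/* Parse a GUID written as eleven numeric fields, with or without inner
 * braces around the last eight bytes. Returns 1 on success, 0 on failure. */
static int sketch_parseguid(const char *s, SketchGuid *g)
{
    unsigned long v[11];
    int n = 0, i;
    char *end;

    while (*s != '\0') {
        if (isspace((unsigned char)*s) || *s == '{' || *s == '}' || *s == ',') {
            s++;                            /* separators carry no value */
        } else if (isdigit((unsigned char)*s) && n < 11) {
            v[n++] = strtoul(s, &end, 0);   /* base 0 handles the 0x prefix */
            s = end;
        } else {
            return 0;                       /* stray character or too many fields */
        }
    }
    if (n != 11)
        return 0;

    g->Data1 = (uint32_t)v[0];
    g->Data2 = (uint16_t)v[1];
    g->Data3 = (uint16_t)v[2];
    for (i = 0; i < 8; i++)
        g->Data4[i] = (uint8_t)v[3 + i];
    return 1;
}
```
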
The table below describes what each token-specific parsing function does.
The token-parser breaks the code down into a series of tokens so that the higher-level parsing functions can check the syntax of the code. The next article will teach you how expressions are parsed and written.

