UEFI News and Commentary

Thursday, October 15, 2009

UEFI HII (Part 4): Strings

Up to this point, we've discussed the HII Database and how individual drivers can contribute resources (strings, fonts, images, forms, etc.) in the form of packages to that database. Groups of packages are called package lists. Later the form browser extracts these package lists from the database to use in constructing the user interface for platform configuration and other user-interface tasks.

One type of package that drivers can contribute to the HII Database is the string package. A string package is a collection of strings associated with a specific language. Each string has a number (called the string identifier) that uniquely identifies it within the package list, along with default font information, such as font name, size and style.

Languages
Each string package only contains text in one language. For example, one string package contains English (U.S.), another Korean, another Pilipino and another French. By separating the languages into separate string packages, it is easy to delete support for a particular language: just delete the associated package.

The UEFI firmware maintains the system user-interface language information in two special EFI variables:
  1. PlatformLangCodes. The list of languages that the platform supports.
  2. PlatformLang. The current platform language.
These EFI variables are also used by other UEFI protocols, such as the EFI Driver Configuration protocol, the EFI Driver Diagnostics protocol, the Component Name protocol and the Unicode Collation protocol.

Languages are encoded according to RFC 4646, which specifies two- and three-letter codes for each language, along with additional modifiers representing a specific geographic location. For example:
  • en-US (English, United States)
  • fr (French)
  • fr-FR (French, France)
  • zh-CN (Chinese, mainland China)
  • sr-CS (Serbian, Serbia/Montenegro)
The rules by which a form browser might substitute an alternate language (say, Portuguese from Portugal if there is no Portuguese from Brazil, or English if there is no French) are specific to the implementation.
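Since the substitution rules are left to the implementation, here is one possible policy, sketched in plain C rather than against the real firmware interfaces. The function name pick_language and the policy itself (exact match first, then the first supported language that shares the same primary subtag) are assumptions made for illustration. The input is a semicolon-separated list of language codes, which is the format PlatformLangCodes uses. The comparison is case-sensitive to keep the sketch short, even though RFC 4646 tags are case-insensitive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the primary subtag: "pt-BR" -> 2, "fr" -> 2. */
static size_t primary_len(const char *tag, size_t len)
{
    size_t i = 0;
    while (i < len && tag[i] != '-') {
        i++;
    }
    return i;
}

/*
 * Hypothetical helper (not a UEFI API): choose a language from the
 * semicolon-separated list `supported` for the requested tag `wanted`.
 * Assumed policy: an exact match wins; otherwise the first entry with
 * the same primary subtag is used. Returns 1 and fills `out` on
 * success, 0 if nothing suitable is found or `out` is too small.
 */
int pick_language(const char *supported, const char *wanted,
                  char *out, size_t out_size)
{
    assert(supported != NULL && wanted != NULL && out != NULL);

    size_t wanted_len = strlen(wanted);
    size_t wanted_primary = primary_len(wanted, wanted_len);
    const char *best = NULL;
    size_t best_len = 0;
    const char *partial = NULL;   /* first entry sharing the primary subtag */
    size_t partial_len = 0;

    for (const char *p = supported; *p != '\0'; ) {
        size_t len = strcspn(p, ";");   /* length of this list entry */

        if (len == wanted_len && memcmp(p, wanted, len) == 0) {
            best = p;
            best_len = len;
            break;
        }
        if (partial == NULL && len > 0 &&
            primary_len(p, len) == wanted_primary &&
            memcmp(p, wanted, wanted_primary) == 0) {
            partial = p;
            partial_len = len;
        }
        p += len;
        if (*p == ';') {
            p++;
        }
    }

    if (best == NULL) {
        best = partial;
        best_len = partial_len;
    }
    if (best == NULL || best_len + 1 > out_size) {
        return 0;
    }
    memcpy(out, best, best_len);
    out[best_len] = '\0';
    return 1;
}
```

With a supported list of "en-US;fr-FR;pt-PT", a request for "pt-BR" falls back to "pt-PT", while a request for "ja" finds nothing. A real form browser would then need some final default, such as the first language in the list, which this sketch leaves to the caller.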

It is possible to find out which languages are supported by iterating through all of the package lists in the system (using ListPackageLists()) and then using GetLanguages() and GetSecondaryLanguages().

Fonts
Each string is associated with a specific font family, font size and font style. The form browser may choose to use this information, ignore it completely or substitute a similar but different font. So this font information might be considered more of a suggestion, rather than a command. In many cases, firmware implementations may use this information to cull the fonts that are included in the BIOS ROM (for space reasons) so that the font only contains the characters used.

The HII-related protocols (such as HII Font or HII String) will use the font information associated with the string to select the display font unless an alternate is provided by the caller.

There are three font attributes associated with each string.
  • Font Name. The font name is the family of the font (Arial, Helvetica, Times Roman, Courier), which identifies, in broad terms, the visual style of the font.
  • Font Size. This is the cell height, in pixels. To give some perspective, the "standard" UEFI font size is 19 pixels high.
  • Font Style. The font style indicates how the basic font should be modified. The following styles can be described: bold, italic, emboss, outline, shadow, underline and double-underline.
If the form browser does not have access to the exact font specified by a string, it might substitute a different font or it might synthesize one using an algorithm. An example of substitution would be using Helvetica instead of Arial. An example of synthesizing would be doubling a 12-size font to stand in for a 24-size font, or simulating the italic style by shifting the successive lines of a glyph over by one pixel so that it slants.
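To make the two synthesis tricks concrete, here is a minimal sketch in plain C. It assumes a glyph stored as one byte per pixel (0 or 1) in row-major order, which is a simplification chosen for readability and not the HII glyph format. The function names glyph_double and glyph_slant are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Double a w x h glyph in both directions (e.g. a 12-pixel font
 * standing in for a 24-pixel one). `dst` must hold (2*w)*(2*h) bytes.
 */
void glyph_double(const uint8_t *src, size_t w, size_t h, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);

    size_t dw = 2 * w;
    for (size_t y = 0; y < 2 * h; y++) {
        for (size_t x = 0; x < dw; x++) {
            /* Each source pixel becomes a 2x2 block. */
            dst[y * dw + x] = src[(y / 2) * w + (x / 2)];
        }
    }
}

/*
 * Fake an italic by shifting each row one pixel further to the right
 * than the row below it. The bottom row is not shifted; the top row is
 * shifted by h-1 pixels. `dst` must hold (w + h - 1) * h bytes.
 */
void glyph_slant(const uint8_t *src, size_t w, size_t h, uint8_t *dst)
{
    assert(src != NULL && dst != NULL && h > 0);

    size_t dw = w + h - 1;
    memset(dst, 0, dw * h);
    for (size_t y = 0; y < h; y++) {
        size_t shift = h - 1 - y;
        for (size_t x = 0; x < w; x++) {
            dst[y * dw + x + shift] = src[y * w + x];
        }
    }
}
```

A one-pixel shift per row gives a steep 45-degree slant, so a practical implementation would probably shift by one pixel every few rows instead.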

Identifiers
Strings are uniquely identified in the system by a string identifier (EFI_STRING_ID), a package list handle (EFI_HII_HANDLE) and a language. If the system's display language is English ('en-US') and you ask for string #1, you would get string #1 from the English string package. If the system's display language is French ('fr') and you ask for string #1, you would get string #1 from the French string package. Likewise for Japanese ('ja'). As a rule, UEFI drivers don't hand around pointers to null-terminated strings. Instead, they pass around string identifiers and package list handles.

Can you actually examine and modify the text? Of course. GetString() retrieves the actual text and SetString() lets you modify it. NewString() lets you create a brand new string, with a unique string identifier.

Encoding
Each string package begins with the standard header (EFI_HII_PACKAGE_HEADER) with the Type set to EFI_HII_PACKAGE_STRINGS. Following the standard header is the string-package-specific header:

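typedef struct _EFI_HII_STRING_PACKAGE_HDR {
  EFI_HII_PACKAGE_HEADER Header;             // Standard package header
  UINT32                 HdrSize;            // Size of this header, in bytes
  UINT32                 StringInfoOffset;   // Offset of the string records from the start of this header
  CHAR16                 LanguageWindow[16]; // Default windows for the compression algorithm
  EFI_STRING_ID          LanguageName;       // Identifier of the string holding the language's readable name
//CHAR8                  Language[ … ];      // Null-terminated RFC 4646 language code
} EFI_HII_STRING_PACKAGE_HDR;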

The actual string data begins at StringInfoOffset bytes from the start of this structure. The LanguageWindow array is used for setting up the default "windows" used for the compression algorithm. More on this later. Language is the null-terminated RFC 4646 language string which identifies which language this package is for. For example, "en-US" or "fr" or "ja". LanguageName is the string identifier of the string that gives the user-readable name of the language this package is for. For example, "English" or "French" or "Japanese". These can be used when presenting choices to a user.
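As a rough illustration of how these fields might be pulled out of a raw package, here is a standalone C sketch that reads the header from a byte buffer. It does not use the firmware headers. The struct string_pkg_info and the function parse_string_pkg_header are hypothetical names, and the byte offsets assume a little-endian layout with no padding between fields: 4 bytes of package header (24-bit length, 8-bit type), two 4-byte fields, 32 bytes of LanguageWindow, a 2-byte LanguageName, and then Language at offset 46.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKG_TYPE_STRINGS 0x04u  /* value of EFI_HII_PACKAGE_STRINGS */
#define LANGUAGE_OFFSET  46u    /* assumed packed layout, see lead-in */

/* Hypothetical summary of the header fields, for illustration only. */
typedef struct {
    uint32_t    length;              /* total package length in bytes */
    uint8_t     type;                /* package type */
    uint32_t    hdr_size;            /* HdrSize */
    uint32_t    string_info_offset;  /* StringInfoOffset */
    uint16_t    language_name;       /* LanguageName string identifier */
    const char *language;            /* points into the caller's buffer */
} string_pkg_info;

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 1 on success, 0 if the buffer is not a usable string package. */
int parse_string_pkg_header(const uint8_t *buf, size_t size,
                            string_pkg_info *out)
{
    assert(buf != NULL && out != NULL);

    if (size < LANGUAGE_OFFSET + 1) {
        return 0;                       /* too short to hold the header */
    }

    uint32_t first = rd32(buf);
    out->length = first & 0x00FFFFFFu;  /* low 24 bits */
    out->type = (uint8_t)(first >> 24); /* high 8 bits */
    if (out->type != PKG_TYPE_STRINGS) {
        return 0;
    }

    out->hdr_size = rd32(buf + 4);
    out->string_info_offset = rd32(buf + 8);
    out->language_name = rd16(buf + 44);

    /* The language code must be terminated inside the buffer. */
    if (memchr(buf + LANGUAGE_OFFSET, 0, size - LANGUAGE_OFFSET) == NULL) {
        return 0;
    }
    out->language = (const char *)(buf + LANGUAGE_OFFSET);

    /* The string records must start inside the buffer as well. */
    if (out->string_info_offset > size) {
        return 0;
    }
    return 1;
}
```

Reading the fields byte by byte avoids relying on compiler-specific struct packing, at the cost of hard-coding the offsets.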

The string information consists of a series of records, which can be broken down into three categories:
  1. String Records. These records assign the current string identifier value to specific string text.
  2. Identifier Records. These records change the current string identifier value.
  3. Font Records. These records describe the fonts used by later strings.
String Records
String records vary in four ways:
  1. Use compressed text or uncompressed text. Text can be compressed using the Standard Compression Scheme for Unicode (SCSU), which is described in Unicode's Technical Report #6. Optimized for reducing the number of bytes required to describe Korean, Japanese and Chinese characters, this scheme uses the concept of "windows" of 128 characters that can be selected for a sub-string of characters. The default settings for these windows are specified in the report, but they can also be optimized for the exact strings in the package by altering the values in the LanguageWindow array in the header. Uncompressed text is simply listed as a null-terminated UCS-2 string.
  2. List single string or multiple strings. Strings with sequential string identifiers can be listed together in a single record. Or a single string can be listed on its own.
  3. Use the default font or a specific font. Strings which use the default font require fewer bytes to encode because the font is implied.
  4. Use text provided or duplicate text. One of the record types simply implies that the text for the new string is a copy of the text for a string which was previously defined.
As mentioned before, strings are associated with a specific font. However, the fonts can be changed in the middle of a string using a series of special control characters. The character values used fall in the range that the Unicode specification sets aside for private ("implementation-specific") use.

For example, characters 0xF7xx (where xx is the font identifier assigned by a font record, below) can be used to switch fonts. Characters 0xF8xx (where xx is the font size) can be used to change just the font's size. Characters 0xF620 and 0xF621 turn bold on and off. Characters 0xF622 and 0xF623 turn italic on and off. Characters 0xF624 and 0xF625 turn underline on and off. Characters 0xF626 and 0xF627 turn emboss on and off. Characters 0xF628 and 0xF629 turn outline on and off. Characters 0xF62A and 0xF62B turn double underline on and off.
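Here is a small C sketch of how code that only wants the visible text (say, to measure or log a string) might filter these control characters out. The character values are the ones listed above. The function names is_font_control and strip_font_controls are hypothetical, and the string is modeled as a plain array of 16-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `c` is one of the font control characters listed above. */
int is_font_control(uint16_t c)
{
    if (c >= 0xF620 && c <= 0xF62B) {
        return 1;                       /* style on/off pairs */
    }
    uint16_t hi = (uint16_t)(c >> 8);
    return hi == 0xF7 || hi == 0xF8;    /* font switch, size change */
}

/*
 * Copy the null-terminated string `src` into `dst`, dropping the
 * control characters. `dst_len` is the capacity of `dst` in
 * characters, including the terminator. Returns the number of
 * characters copied, not counting the terminator.
 */
size_t strip_font_controls(const uint16_t *src, uint16_t *dst,
                           size_t dst_len)
{
    assert(src != NULL && dst != NULL && dst_len > 0);

    size_t n = 0;
    for (; *src != 0 && n + 1 < dst_len; src++) {
        if (!is_font_control(*src)) {
            dst[n++] = *src;
        }
    }
    dst[n] = 0;
    return n;
}
```

A renderer would do the opposite: instead of discarding the control characters, it would update its current font, size and style state as each one goes by.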

Identifier Records
Identifier records are used to adjust the current string identifier value without assigning any string text. This can be useful when there are gaps in the string identifiers. When processing the string records, the current string identifier starts at 1 and is incremented each time a string is assigned. So, normally, the first string is assigned identifier #1, the second #2, etc. But if the identifiers are not sequential (e.g., 1, 2, 16) you can use a skip record so that after the second string, you just skip the next 13 identifiers.
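The counting rule is easy to get wrong by one, so here is a tiny C model of it. This is deliberately not the real record encoding: the record struct, the rec_kind values and assign_string_ids are all hypothetical, and each record only says how many strings it holds or how many identifiers to skip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical record model (not the on-disk format). */
typedef enum { REC_STRING, REC_SKIP, REC_END } rec_kind;

typedef struct {
    rec_kind kind;
    uint16_t count;  /* REC_STRING: strings in record; REC_SKIP: ids to skip */
} record;

/*
 * Walk the records up to REC_END and store the identifier assigned to
 * each string in `ids` (at most `max_ids` entries). Returns the number
 * of identifiers stored.
 */
size_t assign_string_ids(const record *recs, uint16_t *ids, size_t max_ids)
{
    assert(recs != NULL && ids != NULL);

    uint16_t current = 1;   /* the first string always gets identifier 1 */
    size_t n = 0;

    for (; recs->kind != REC_END; recs++) {
        if (recs->kind == REC_SKIP) {
            /* Advance the counter without assigning any text. */
            current = (uint16_t)(current + recs->count);
            continue;
        }
        for (uint16_t i = 0; i < recs->count && n < max_ids; i++) {
            ids[n++] = current++;
        }
    }
    return n;
}
```

Feeding it a record with two strings, a skip of 13, and a record with one string yields the identifiers 1, 2 and 16, matching the example above.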

Font Records
The font records must appear before the first instance of a string that uses them. The exception to this is the "default" font, which is initialized to the system's default font. Each font is assigned a number that is only valid within the package, starting with 0 (for the default font) and going upwards. Since the font identifier fits in a single byte, there is a theoretical maximum of 256 fonts used within a string package. In addition to the identifier, each font has the usual attributes (name, size, style).

Conclusion
Strings are an important part of any user interface. The ability and the flexibility to display strings in multiple languages, using a variety of font styles, sizes and families, is important in making a rich user interface.

Next time, we begin to delve into the wicked world of HII fonts.
