The CSS parser of Firefox

The main file for studying the CSS parser of Firefox (the latest release of the browser is 4, which corresponds to the Mozilla 2.0 branch) is located at http://mxr.mozilla.org/mozilla2.0/source/layout/style/nsCSSScanner.cpp, as part of the style component of the layout engine. In its outstanding precision, nearly everything a web developer needs to know about CSS parsing is there. You will learn how a parser builds its lexical scanner from a set of allowed tokens (following the LEX notation of the CSS grammar), dividing tokens into significant and non-significant ones according to the parsing context. You will learn how a single stream of tokens is produced by reading, in order, every single character contained in a style sheet.
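To make the idea of a lexical scanner concrete, here is a minimal sketch in Python. It is not Firefox's code: the token names, the `tokenize` function and the `keep_whitespace` flag are assumptions made up for illustration, and the token set is far smaller than the real CSS grammar (for instance, `12px` comes out as a number followed by an identifier rather than a single dimension token).

```python
import re

# A deliberately tiny token set (hypothetical names, not Firefox's).
TOKEN_SPEC = [
    ("S", r"[ \t\r\n\f]+"),                    # whitespace run
    ("IDENT", r"-?[A-Za-z_][A-Za-z0-9_-]*"),   # identifiers such as color, red
    ("NUMBER", r"[0-9]*\.?[0-9]+"),            # 12, 1.5, .5
    ("SYMBOL", r"."),                          # any other single character
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(css, keep_whitespace=False):
    """Turn a string into a flat list of (kind, text) tokens.

    Whitespace is dropped unless the caller says it is significant,
    which is the case between the parts of a selector, for example.
    """
    tokens = []
    for match in TOKEN_RE.finditer(css):
        kind = match.lastgroup
        if kind == "S" and not keep_whitespace:
            continue
        tokens.append((kind, match.group()))
    return tokens
```

Calling `tokenize("a b", keep_whitespace=True)` keeps the space between the two identifiers, which is what a selector parser would need in order to see a descendant combinator; with the default setting the space is discarded.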

You will then learn how whitespace is handled by distinguishing its various components (space, carriage return, form feed, tab, newline) and then merging them back into a single entity:

There are four types of newlines in CSS: "\r", "\n", "\r\n", and "\f". To simplify dealing with newlines, they are all normalized to "\n" here

lines 630-31
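The normalization described in that comment can be sketched in a few lines of Python. This is only an illustration of the rule, not a translation of the Firefox source; the function name `normalize_newlines` is an assumption, and the real scanner does this work while reading characters one at a time rather than on a whole string.

```python
def normalize_newlines(css):
    """Map the four CSS newline forms ("\\r", "\\n", "\\r\\n", "\\f") to "\\n"."""
    out = []
    i = 0
    while i < len(css):
        ch = css[i]
        if ch == "\r":
            # "\r\n" counts as one newline, so swallow the "\n" as well.
            if i + 1 < len(css) and css[i + 1] == "\n":
                i += 1
            out.append("\n")
        elif ch == "\f":
            out.append("\n")
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Once every newline looks the same, later stages such as line counting for error messages only have to test for a single character.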

You will learn how the parser "eats" CSS comments and, for backward compatibility reasons, even HTML comment delimiters that may appear in a CSS file. More importantly, you will see how CSS selectors are recognized by the presence of some special delimiters (like spaces, colons, dashes and so on) and how the parser handles them. Finally, you will learn how the parser recovers from parsing errors and returns to the main flow.
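As a rough picture of what "eating" comments means, the following Python sketch drops `/* ... */` comments and ignores the HTML delimiters `<!--` and `-->` while keeping whatever sits between them. The function name `skip_comments` is hypothetical, and the sketch makes two simplifying assumptions: it does not know about strings (so a `/*` inside a quoted value would be misread), and it ignores the HTML delimiters everywhere rather than only where the grammar allows them.

```python
def skip_comments(css):
    """Return css with CSS comments and HTML comment delimiters removed."""
    out = []
    i = 0
    n = len(css)
    while i < n:
        if css.startswith("/*", i):
            end = css.find("*/", i + 2)
            if end == -1:
                # Unterminated comment: it swallows the rest of the input.
                break
            i = end + 2
        elif css.startswith("<!--", i):
            i += 4  # ignore the delimiter, keep what follows
        elif css.startswith("-->", i):
            i += 3
        else:
            out.append(css[i])
            i += 1
    return "".join(out)
```

Note that only the HTML delimiters are thrown away: the rules wrapped in `<!-- ... -->` still apply, which is what made the old trick of hiding style sheets from very old browsers work.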

A good example of error recovery is how the parser handles URLs contained in a CSS property's value:

Process a url lexical token. A CSS1 url token can contain characters beyond identifier characters (e.g. '/', ':', etc.) Because of this the normal rules for tokenizing the input don't apply very well. To simplify the parser and relax some of the requirements on the scanner we parse url's here. If we find a malformed URL then we emit a token of type "InvalidURL" so that the CSS1 parser can ignore the invalid input. The parser must treat an InvalidURL token like a Function token, and process tokens until a matching parenthesis.

lines 882-90
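The behaviour described in that comment can be approximated with the Python sketch below. It is written from the comment alone, not from the Firefox source, so the function names `scan_url` and `skip_to_matching_paren`, the tuple layout and the exact rules for what counts as malformed are all assumptions. Escapes are not handled, and input that ends before the closing parenthesis is simply reported as invalid.

```python
WHITESPACE = " \t\n\r\f"


def scan_url(src, pos=0):
    """Scan the body of url(...), starting just after "url(".

    Returns (kind, value, next_pos), where kind is "URL" or "InvalidURL".
    """
    n = len(src)
    while pos < n and src[pos] in WHITESPACE:
        pos += 1
    value = []
    if pos < n and src[pos] in "\"'":
        quote = src[pos]
        pos += 1
        while pos < n and src[pos] != quote:
            if src[pos] == "\n":
                # A raw newline inside a quoted URL is malformed.
                return ("InvalidURL", "".join(value), pos)
            value.append(src[pos])
            pos += 1
        if pos >= n:
            return ("InvalidURL", "".join(value), pos)
        pos += 1  # step over the closing quote
    else:
        while pos < n and src[pos] not in WHITESPACE and src[pos] != ")":
            if src[pos] in "\"'(":
                return ("InvalidURL", "".join(value), pos)
            value.append(src[pos])
            pos += 1
    while pos < n and src[pos] in WHITESPACE:
        pos += 1
    if pos < n and src[pos] == ")":
        return ("URL", "".join(value), pos + 1)
    # Anything else before the closing parenthesis is malformed.
    return ("InvalidURL", "".join(value), pos)


def skip_to_matching_paren(src, pos):
    """Recovery step: skip input until the parenthesis opened by "url(" closes."""
    depth = 1
    while pos < len(src):
        if src[pos] == "(":
            depth += 1
        elif src[pos] == ")":
            depth -= 1
            if depth == 0:
                return pos + 1
        pos += 1
    return pos
```

For a malformed value such as `url(a b.png)`, `scan_url` reports `InvalidURL`, and the caller can then use `skip_to_matching_paren` to resume right after the closing parenthesis, so the rest of the style sheet is still parsed.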

I recommend that you study this code with an open mind, enjoying the almost magical cohesion that holds together this simple atom of the vast structure of Firefox. There's much to learn from it. First, you'll get a better understanding of the CSS standard. Second, you'll better appreciate the work of browser implementors. Finally, you'll be able to see what happens behind the scenes of a browser's mind.
