Here's a quotation taken from Firefox's nsCSSScanner.cpp:
Process a url lexical token. A CSS1 url token can contain characters beyond identifier characters (e.g. '/', ':', etc.) Because of this the normal rules for tokenizing the input don't apply very well. To simplify the parser and relax some of the requirements on the scanner we parse url's here. If we find a malformed URL then we emit a token of type "InvalidURL" so that the CSS1 parser can ignore the invalid input. We attempt to eat the right amount of input data when an invalid URL is presented.
Basically, CSS URLs take form within the url() function, which can be applied to background properties,
list properties, and generated content properties. Everything inside the parentheses is considered a URL. Anyway, as stated
above, parsing CSS URIs can be quite challenging, because the tokenization must include new types of characters which
are not included by default in the Flex notation for IDENTs in the CSS specifications. Further, a URL may or may not be enclosed in
quotes, which adds an additional level of complexity (e.g. checking that quotes occur in matching pairs, like
"..."). Here's how Firefox copes with this:
aToken.mType = eCSSToken_InvalidURL;
nsString& ident = aToken.mIdent;
ident.SetLength(0);
if (ch == ')') {
  Pushback(ch);
  // empty url spec; just get out of here
  aToken.mType = eCSSToken_URL;
} else {
  // start of a non-quoted url
  Pushback(ch);
  PRBool ok = PR_TRUE;
  for (;;) {
    ch = Read(aErrorCode);
    if (ch < 0) break;
    if (ch == CSS_ESCAPE) {
      ch = ParseEscape(aErrorCode);
      if (0 < ch) {
        ident.Append(PRUnichar(ch));
      }
    } else if ((ch == '"') || (ch == '\'') || (ch == '(')) {
      // This is an invalid URL spec
      ok = PR_FALSE;
    } else if ((256 > ch) && ((gLexTable[ch] & IS_WHITESPACE) != 0)) {
      // Whitespace is allowed at the end of the URL
      (void) EatWhiteSpace(aErrorCode);
      if (LookAhead(aErrorCode, ')')) {
        Pushback(')');    // leave the closing symbol
        // done!
        break;
      }
      // Whitespace is followed by something other than a
      // ")". This is an invalid url spec.
      ok = PR_FALSE;
    } else if (ch == ')') {
      Unread();
      // All done
      break;
    } else {
      // A regular url character.
      ident.Append(PRUnichar(ch));
    }
  }
  // If the result of the above scanning is ok then change the token
  // type to a useful one.
  if (ok) {
    aToken.mType = eCSSToken_URL;
  }
}
}
return PR_TRUE;
}
Parsing starts with a '(' token and ends with a ')' token. Firefox checks whether:

- the url() function is empty
- the url() function contains a quoted or a non-quoted URL
- the url() function contains invalid tokens (e.g. a '(' token)
- the url() function contains whitespace, which is allowed at the end of the URL (Firefox removes/eats it, however)
- the url() function is followed by other tokens after the whitespace (this is invalid)
Safari has a problem with nested matching quotes within the url()
function, i.e. it accepts them instead
of marking them as invalid. This is not the case with Firefox, as you can see above.