How Firefox parses CSS URLs

Here's a quotation taken from nsCSSScanner.cpp in the Firefox source:

Process a url lexical token. A CSS1 url token can contain characters beyond identifier characters (e.g. '/', ':', etc.) Because of this the normal rules for tokenizing the input don't apply very well. To simplify the parser and relax some of the requirements on the scanner we parse url's here. If we find a malformed URL then we emit a token of type "InvalidURL" so that the CSS1 parser can ignore the invalid input. We attempt to eat the right amount of input data when an invalid URL is presented.

Basically, CSS URLs appear inside the url() function, which can be used with background properties, list properties and generated-content properties. Everything between the parentheses is treated as a URL. As the comment above points out, parsing CSS URIs can be quite challenging, because the tokenizer must accept characters (such as '/' and ':') that are not part of the Flex grammar for IDENT tokens in the CSS specification. Moreover, a URL may be written with or without quotes, which adds another layer of complexity: for a quoted URL the scanner has to check that the quotes occur in matching pairs, like "...".
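
For the quoted case, here is a minimal sketch of such a matching-pair check. It is not Firefox code; the name IsBalancedQuotedUrl is invented for illustration, and real escape handling is more involved than shown:

#include <cstddef>
#include <string>

// Return true if the url() body is a well-formed quoted URL: it starts
// and ends with the same quote character, and that character does not
// occur unescaped in between. (A sketch only.)
static bool IsBalancedQuotedUrl(const std::string& body)
{
  if (body.size() < 2) {
    return false;
  }
  const char quote = body.front();
  if (quote != '"' && quote != '\'') {
    return false;                      // not a quoted form at all
  }
  if (body.back() != quote) {
    return false;                      // opening and closing quotes must match
  }
  for (std::size_t i = 1; i + 1 < body.size(); ++i) {
    if (body[i] == quote && body[i - 1] != '\\') {
      return false;                    // unescaped quote in the middle
    }
  }
  return true;
}

The non-quoted case is the more involved one; here's how Firefox copes with it: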

    aToken.mType = eCSSToken_InvalidURL;
    nsString& ident = aToken.mIdent;
    ident.SetLength(0);

    if (ch == ')') {
      Pushback(ch);
      // empty url spec; just get out of here
      aToken.mType = eCSSToken_URL;
    } else {
      // start of a non-quoted url
      Pushback(ch);
      PRBool ok = PR_TRUE;
      for (;;) {
        ch = Read(aErrorCode);
        if (ch < 0) break;
        if (ch == CSS_ESCAPE) {
          ch = ParseEscape(aErrorCode);
          if (0 < ch) {
            ident.Append(PRUnichar(ch));
          }
        } else if ((ch == '"') || (ch == '\'') || (ch == '(')) {
          // This is an invalid URL spec
          ok = PR_FALSE;
        } else if ((256 > ch) && ((gLexTable[ch] & IS_WHITESPACE) != 0)) {
          // Whitespace is allowed at the end of the URL
          (void) EatWhiteSpace(aErrorCode);
          if (LookAhead(aErrorCode, ')')) {
            Pushback(')');  // leave the closing symbol
            // done!
            break;
          }
          // Whitespace is followed by something other than a
          // ")". This is an invalid url spec.
          ok = PR_FALSE;
        } else if (ch == ')') {
          Unread();
          // All done
          break;
        } else {
          // A regular url character.
          ident.Append(PRUnichar(ch));
        }
      }

      // If the result of the above scanning is ok then change the token
      // type to a useful one.
      if (ok) {
        aToken.mType = eCSSToken_URL;
      }
    }
  }
  return PR_TRUE;
}

Parsing starts right after the opening '(' of url() and ends at the matching ')'. Firefox checks whether (a simplified sketch of these checks follows the list):

  • the url() function is empty
  • the url() function contains a quoted or a non-quoted URL
  • the URL contains invalid tokens (for a non-quoted URL: a quote or a '(' character)
  • the URL contains whitespace, which is allowed only at the end of the URL (Firefox eats it)
  • the URL is followed by other tokens after that whitespace (this is invalid)
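
To make these checks concrete, here is a self-contained, simplified re-implementation of the non-quoted path. It is only a sketch, not the Mozilla code: the NSPR types (PRBool, PRUnichar), the lexer table and CSS escape handling are replaced with plain standard C++, and ScanUnquotedUrl is an invented name.

#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

// Scan the characters that follow "url(" and report whether they form a
// valid non-quoted URL under the rules listed above. On success *out
// receives the URL text and scanning stops at the closing ')'.
static bool ScanUnquotedUrl(const std::string& input, std::string* out)
{
  out->clear();
  bool ok = true;
  std::size_t pos = 0;
  for (;;) {
    if (pos >= input.size()) {
      break;                                   // end of input
    }
    const char ch = input[pos++];
    if (ch == '"' || ch == '\'' || ch == '(') {
      ok = false;                              // invalid inside a bare URL
    } else if (std::isspace(static_cast<unsigned char>(ch))) {
      // Whitespace is only allowed at the end, right before the ')'.
      while (pos < input.size() &&
             std::isspace(static_cast<unsigned char>(input[pos]))) {
        ++pos;                                 // eat the whitespace
      }
      if (pos < input.size() && input[pos] == ')') {
        break;                                 // done; ')' is left in place
      }
      ok = false;                              // whitespace followed by junk
    } else if (ch == ')') {
      break;                                   // all done
    } else {
      out->push_back(ch);                      // a regular URL character
    }
  }
  return ok;
}

int main()
{
  const char* samples[] = {
    ")",                  // empty url spec: valid
    "foo/bar.png)",       // plain non-quoted URL: valid
    "foo.png   )",        // whitespace before ')': valid
    "foo(bar.png)",       // '(' inside the URL: invalid
    "foo.png   more)",    // whitespace followed by other tokens: invalid
  };
  for (const char* s : samples) {
    std::string url;
    const bool ok = ScanUnquotedUrl(s, &url);
    std::cout << (ok ? "valid  " : "invalid") << "  url(" << s << "\n";
  }
  return 0;
}

Running it prints which of the sample url() bodies these simplified rules accept; the last two are rejected for exactly the reasons listed above.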

Safari has a problem with matching quote pairs nested inside the url() function: it accepts them instead of marking the URL as invalid. As you can see from the code above, this is not the case with Firefox, where a quote inside a non-quoted URL makes the token invalid.
