The root element in HTML 5 and XML parsing

I was struck by the fact that in HTML 5 the html root element is optional, so I went looking and found this note in the HTML 5 spec:

The term root element, when not explicitly qualified as referring to the document's root element, means the furthest ancestor element node of whatever node is being discussed, or the node itself if it has no ancestors. When the node is a part of the document, then the node's root element is indeed the document's root element; however, if the node is not currently part of the document tree, the root element will be an orphaned node.

When an element's root element is the root element of a Document, it is said to be in a Document. An element is said to have been inserted into a document when its root element changes and is now the document's root element. Analogously, an element is said to have been removed from a document when its root element changes from being the document's root element to being another element.

A node's home subtree is the subtree rooted at that node's root element. When a node is in a Document, its home subtree is that Document's tree.

The Document of a Node (such as an element) is the Document that the Node's ownerDocument IDL attribute returns. When a Node is in a Document then that Document is always the Node's Document, and the Node's ownerDocument IDL attribute thus always returns that Document.

The term tree order means a pre-order, depth-first traversal of DOM nodes involved (through the parentNode/childNodes relationship).

When it is stated that some element or attribute is ignored, or treated as some other value, or handled as if it was something else, this refers only to the processing of the node after it is in the DOM. A user agent must not mutate the DOM in such situations.

The term text node refers to any Text node, including CDATASection nodes; specifically, any Node with node type TEXT_NODE (3) or CDATA_SECTION_NODE (4). [DOMCORE]

A content attribute is said to change value only if its new value is different than its previous value; setting an attribute to a value it already has does not change it.
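To make the quoted definitions of root element, "in a Document" and tree order a bit more concrete, here is a minimal sketch. It assumes Python's xml.dom.minidom as a stand-in for a browser DOM, and the helper names root_element, in_document and tree_order are my own, not part of any specification or library.

```python
from xml.dom import Node, minidom


def root_element(node):
    """Return the furthest ancestor element of node, or node itself if it has none."""
    current = node
    # Stop climbing when the parent is missing or is not an element
    # (for example, the Document node above the html element).
    while current.parentNode is not None and current.parentNode.nodeType == Node.ELEMENT_NODE:
        current = current.parentNode
    return current


def in_document(element):
    """True when the element's root element is its document's root element."""
    return root_element(element) is element.ownerDocument.documentElement


def tree_order(node):
    """Yield node and its descendants in pre-order, depth-first order."""
    yield node
    for child in node.childNodes:
        yield from tree_order(child)


doc = minidom.parseString(
    "<html><head><title>Test</title></head><body><p>Hi</p></body></html>"
)
title = doc.getElementsByTagName("title")[0]

# An orphaned subtree: created by the document but never inserted into it.
orphan = doc.createElement("div")
span = doc.createElement("span")
orphan.appendChild(span)

print(root_element(title).nodeName)  # html
print(root_element(span).nodeName)   # div
print(in_document(span))             # False

names = [n.nodeName for n in tree_order(doc.documentElement)]
print(names)  # ['html', 'head', 'title', '#text', 'body', 'p', '#text']
```

Note that the orphaned span still reports doc as its ownerDocument even though it is not in the document, which is exactly the distinction the quoted text draws.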

This is clear from a DOM perspective, because what is needed for scripting is simply a reference to the whole DOMDocument and it works pretty well when served as text/html. But what happens if something like this:

<!DOCTYPE html>
<title>Test</title>

is served as application/xhtml+xml? From the DOM point of view you still get a document, but not the one you wanted: an XML parser never infers missing elements, so there is no html root element at all. The title element becomes the root, it is not in the XHTML namespace, and so the browser does not treat the result as an XHTML page. Obviously you need to add the root element yourself. So what have we got so far? A lazy, incomplete markup for text/html, where you can omit the root element, and a strict syntax for XML media types, where you cannot omit it without breaking the document. I don't think that this choice is correct. In my opinion, the root element should be mandatory. Otherwise, we'll probably fall back into the dark days of confusing, illegible markup, when every developer could do anything at all with their markup without caring about web standards and validation.

If this choice is meant to be one of the future steps towards the end of XHTML 1.1, I'm definitely against it. Admittedly, application/xhtml+xml was not much of a success, but that is not due to the standard or the content type themselves. It is due to the lack of support in Internet Explorer, whose developers recently announced that IE 9 will support application/xhtml+xml but (incredibly!) without checking whether the markup is well-formed...

I suggest we keep using the root element in HTML 5, to demonstrate that we don't need laziness to write good markup and to prove that the legacy of XHTML 1.1 is not dead at all, because I don't believe that the choice of a single browser can undermine the future of a standard.
