HTML5 and XHTML: syntax comparison

In this article we take a comparative look at HTML5 and XHTML, to see where the two markup languages differ and how to get the most out of each.


W3C DOCTYPEs have never been easy to commit to memory: they are long and hard to remember. For example, the following is the DOCTYPE of XHTML 1.0 Strict:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Developers have always had to copy and paste the DOCTYPE from existing documents or rely on their editors. With HTML5 this difficulty disappears, because all you need is:

<!DOCTYPE html>

Confused? The reasoning is quite simple: the XHTML DOCTYPE is long because it also includes the URL of the DTD against which a program processing an XHTML document can validate it. In practice this feature is used only by validators such as the W3C validator, because browsers do not validate our pages: they simply try to interpret them as best they can. For example, an error like this:

<p><blockquote>...</blockquote></p>

is detected by the validator but not by browsers, which at worst will render the page with unusual formatting. A paragraph cannot contain a quotation block, so the markup is invalid, but browsers do not care! Hence it was decided that in HTML5 the DOCTYPE should be as simple as possible.
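
As a rough sketch, a complete minimal page using the short DOCTYPE could look like the following. The title and the paragraph text are placeholders chosen for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Placeholder title</title>
</head>
<body>
  <p>Placeholder text.</p>
</body>
</html>
```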

Element writing and syntax

HTML5 introduces new elements, but it still supports the vast majority of the elements already present in XHTML. The rules for XHTML elements are strict: empty elements must always be closed, and certain elements must have some required attributes. For example, an image must always be written as follows:

<img src="foo.png" alt="Foo" />

Note the string '/>' that closes the tag. In HTML5, however, empty elements can be written either in the old HTML 4 notation (without the slash) or in the XHTML notation.
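
To illustrate, here is a small sketch of the two notations for empty elements, both of which HTML5 accepts. The br element is added here only as a second example of an empty element.

```html
<!-- HTML 4 notation: no trailing slash -->
<img src="foo.png" alt="Foo">
<br>

<!-- XHTML notation: trailing slash, also accepted in HTML5 -->
<img src="foo.png" alt="Foo" />
<br />
```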

The treatment of attributes has also changed. XHTML requires every attribute to have a value, enclosed in either single or double quotes, as follows:

<input type="text" name="q" id="q" disabled="disabled" />

As you can see, the disabled attribute always has a value, although in this case we are merely repeating its name. In HTML5 you can also write it just as in HTML 4:

<input type="text" name="q" id="q" disabled>

Note that in HTML5 the trailing slash that closes the tag (preceded by a space) is not necessary. Moreover, in HTML5 you can omit some attributes, such as the type attribute of script and style. So where we previously had:

<style type="text/css"> </style>
<script type="text/javascript" src="script.js"> </script>

Now we can simply write:

<style> </style>
<script src="script.js"> </script>

The explanation for this choice is that browsers already know what kind of content these elements hold (CSS for style, JavaScript for script) without any additional attributes. However, as noted above, in HTML5 you can use both notations.

A final example of this simplification is the meta tag that specifies the encoding of the document. In XHTML we have:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

HTML5 also accepts the shorter form:

<meta charset="utf-8"/>
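
Putting the simplifications from this section together, the head of an HTML5 page might look like the following sketch. The file name script.js comes from the example above; the title and the style rule are placeholders chosen for illustration.

```html
<head>
  <!-- short encoding declaration, written without the trailing slash -->
  <meta charset="utf-8">
  <title>Placeholder title</title>
  <!-- type attributes omitted: CSS and JavaScript are assumed -->
  <style>
    body { margin: 0; }
  </style>
  <script src="script.js"></script>
</head>
```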

Return of the embed element

The embed element was never part of the XHTML 1.0 specification, which relies on object instead. In HTML5, on the other hand, this element is valid. The reason is pretty simple: at the time of this writing, embed is still needed to make media files work correctly across browsers. For that reason, the authors of HTML5 decided to standardize the embed element in order to take advantage of its cross-browser support, especially for those user agents that still don't fully support object.
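
As a minimal sketch, an embed element in HTML5 could be written as follows. The file name, the MIME type and the dimensions are hypothetical values chosen for illustration.

```html
<!-- movie.mp4 is a hypothetical file name -->
<embed src="movie.mp4" type="video/mp4" width="400" height="300">
```

Like img, embed is an empty element, so it takes no closing tag.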
