Semantics of XML elements

Although XML is a structured markup language with no predefined DTD, choosing semantic element names improves the structure of our document and makes our code more readable and easier to maintain. If you're accustomed to (X)HTML, you've probably noticed that each element name reflects its underlying purpose. For example, an em element marks up normal emphasis, a p element marks up a paragraph, and so on.

Since XML has no predefined DTD, we write our own, so choosing proper element names is crucial. Consider the following example:

<?xml version="1.0" encoding="utf-8"?>
<books>
  <oreilly>
    <book>
      <title>XML. Pocket Reference</title>
      <authors>
        <author>Simon St. Laurent</author>
        <author>Michael Fitzgerald</author>
      </authors>
      <isbn>978-0-596-10050-6</isbn>
      <pages>171</pages>
      <price>9.95</price>
    </book>
    <!-- omitted for brevity -->
  </oreilly>
  <!-- omitted for brevity -->
</books>

In this example, which resembles the structure of database tables, the element names are meaningful and descriptive. A developer who reads your code in the future can easily figure out the structure of the document and grasp the meaning of each element at a glance.

However, things get more complicated with mixed content (i.e., elements that contain both text and other elements). Here we should avoid creating a large number of element names, because a DTD has to list every child element allowed inside a mixed-content element, so each new name makes the DTD longer and harder to maintain. Instead, we can rely on custom attributes to add semantics to our elements. For example, if I want to create a level-one heading and a paragraph that contains some text with normal emphasis, I can write the following:

<?xml version="1.0" encoding="utf-8"?>
<page>
  <heading level="1">Title</heading>
  <para>This is a <emph type="normal">normal</emph> emphasis.</para>
</page>

As you can see, instead of writing heading1, heading2 and so on, we use a single heading element with a level attribute, which reduces the number of element names we have to declare. Notice how the semantics of the elements has been preserved.
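
To see how this keeps the DTD small, here is a minimal sketch of the page example above with an internal DTD subset. The declarations are only an assumption about how such a DTD might look: the allowed heading levels (1 to 3) and the second emphasis type (strong) are hypothetical values chosen for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE page [
  <!-- A page holds any number of headings and paragraphs, in any order -->
  <!ELEMENT page (heading | para)*>
  <!-- One heading element; the level is an attribute (values 1 to 3 are an assumption) -->
  <!ELEMENT heading (#PCDATA)>
  <!ATTLIST heading level (1 | 2 | 3) "1">
  <!-- Mixed content: text interleaved with emph elements -->
  <!ELEMENT para (#PCDATA | emph)*>
  <!ELEMENT emph (#PCDATA)>
  <!-- The value "strong" is hypothetical; only "normal" appears in the example -->
  <!ATTLIST emph type (normal | strong) "normal">
]>
<page>
  <heading level="1">Title</heading>
  <para>This is a <emph type="normal">normal</emph> emphasis.</para>
</page>
```

With this approach, adding another heading level or emphasis type only means adding a value to an attribute list. The mixed-content declaration for para stays the same, whereas separate element names would each have to be declared and added to it.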
