XML DTD tutorial

DTDs are a basic building block of any valid XML document: a DTD provides a simple reference grammar against which a document can be validated. It tells a validating XML parser which elements, attributes and text content each element may contain. In this post I'm going to discuss some basic aspects of DTDs and explain why they are so useful.

Linking a DTD from an XML document

We can link a DTD from an XML document in this way:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE book SYSTEM "book.dtd">

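Setting standalone to no in the XML declaration tells the parser that the document may depend on markup declarations stored outside it, in this case in an external DTD. The DOCTYPE declaration names the document's root element (book) and uses the SYSTEM keyword to point to the file that contains the DTD, book.dtd. Only a validating parser actually checks the document against that DTD. Our XML document looks as follows: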


<book>
  <pages>...</pages>
  <price cur="USD">
    <high>...</high>
    <regular>...</regular>
    <discount>...</discount>
  </price>
  <ship>...</ship>
  <store>...</store>
  <weight>...</weight>
</book>
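To see what a parser actually takes from the two declarations at the top of the document, here is a minimal Python sketch using the standard library's expat bindings. Expat is non-validating and, with its default settings, does not fetch book.dtd; it only reports the declarations. The names DOC, read_prolog and info are illustrative, not part of any API.

```python
import xml.parsers.expat

# The prolog from this post, followed by a minimal root element
DOC = b"""<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book/>"""

def read_prolog(data):
    """Collect what expat reports about the XML and DOCTYPE declarations."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # standalone is 1 for "yes", 0 for "no", -1 when the clause is omitted
        info.update(version=version, encoding=encoding, standalone=standalone)

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # name is the declared root element; system_id locates the DTD file
        info.update(doctype=name, system_id=system_id, public_id=public_id)

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return info

info = read_prolog(DOC)
print(info["doctype"], info["system_id"], info["standalone"])  # book book.dtd 0
```

Note that the DOCTYPE name comes back as book, the root element, while the DTD itself is identified only by its system identifier.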

Declaring relationships between elements

The elements that another element may contain are declared with an ELEMENT declaration:

<!ELEMENT book (pages*, price*, ship*, store+, weight?)>

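The above code means: "the element book can contain only the pages, price, ship, store and weight elements, in this order" (the commas define a sequence). The symbol after each element name says how many times that element may occur:

  1. +: 1 or more times
  2. *: 0 or more times
  3. ?: 0 or 1 time (the element is optional)
  4. no symbol: exactly once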
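A content model like this is essentially a regular expression over child element names, and +, * and ? keep their usual regex meaning. The Python sketch below makes that concrete for sequence-only models such as the one for book. model_to_regex and children_match are hypothetical helpers written for illustration; they do not handle choices (|), nested groups or #PCDATA, which a real validating parser would.

```python
import re

def model_to_regex(model):
    """Translate a sequence-only content model such as
    "(pages*, price*, ship*, store+, weight?)" into a compiled regular
    expression over space-terminated child element names.
    Hypothetical helper: choices (|), nested groups and #PCDATA are not handled."""
    parts = []
    for item in model.strip().strip("()").split(","):
        item = item.strip()
        if item[-1] in "+*?":
            name, indicator = item[:-1], item[-1]
        else:
            name, indicator = item, ""  # no indicator: exactly once
        # Each name is followed by a space, so "store" cannot match "stores"
        parts.append("(?:" + re.escape(name) + " )" + indicator)
    return re.compile("".join(parts))

def children_match(model, children):
    """Check an ordered list of child element names against the model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None

BOOK = "(pages*, price*, ship*, store+, weight?)"
print(children_match(BOOK, ["pages", "price", "ship", "store", "weight"]))  # True
print(children_match(BOOK, ["pages", "price", "ship", "weight"]))  # False: no store
print(children_match(BOOK, ["store", "pages"]))  # False: wrong order
```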

The same rules apply to the price element, where high and discount are optional and regular must appear exactly once:

<!ELEMENT price (high?, regular, discount?)>
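A parser turns these declarations into a structure that mirrors the grammar. As a sketch, Python's expat bindings pass each ELEMENT declaration to an ElementDeclHandler as nested (type, quantifier, name, children) tuples, and the hypothetical render function below turns them back into DTD syntax. The declarations are placed in an internal subset (between [ and ] in the DOCTYPE) so that expat sees them without fetching an external file.

```python
import xml.parsers.expat
from xml.parsers.expat import model as m

# Occurrence indicators as expat reports them
QUANT = {m.XML_CQUANT_NONE: "", m.XML_CQUANT_OPT: "?",
         m.XML_CQUANT_REP: "*", m.XML_CQUANT_PLUS: "+"}

def render(node):
    """Turn expat's (type, quantifier, name, children) tuple back into DTD syntax."""
    ctype, quant, name, children = node
    if ctype == m.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == m.XML_CTYPE_ANY:
        return "ANY"
    if ctype == m.XML_CTYPE_NAME:
        return name + QUANT[quant]
    if ctype == m.XML_CTYPE_MIXED:
        # (#PCDATA) or (#PCDATA|a|b)*
        names = ["#PCDATA"] + [child[2] for child in children]
        return "(" + "|".join(names) + ")" + QUANT[quant]
    sep = "," if ctype == m.XML_CTYPE_SEQ else "|"  # sequence or choice
    return "(" + sep.join(render(child) for child in children) + ")" + QUANT[quant]

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE price [
  <!ELEMENT price (high?, regular, discount?)>
  <!ELEMENT regular (#PCDATA)>
]>
<price><regular>10</regular></price>"""

models = {}

def on_element_decl(name, model):
    models[name] = render(model)

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOC, True)
print(models["price"])  # (high?,regular,discount?)
```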

Declaring attribute values

Attributes and their allowed values are declared with an ATTLIST declaration:

<!ATTLIST price cur (USD|CAD|AUD|EUR) "USD">

First comes the element that carries the attribute (price in this case), then the name of the attribute (here cur), followed by the list of allowed values, enclosed in parentheses and separated by the | character (meaning "one of"). The last part is the attribute's default value, which applies when the attribute is omitted.
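As a sketch of what this declaration does at parse time, the Python snippet below hands it to the standard library's expat parser, placed in the DOCTYPE's internal subset so no external file is needed. The AttlistDeclHandler callback receives the element name, attribute name, allowed values, default and a required flag. The StartElementHandler shows that the parser supplies cur="USD" when the attribute is omitted. The variable and function names are illustrative.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE price [
  <!ATTLIST price cur (USD|CAD|AUD|EUR) "USD">
]>
<price/>"""

decls = []
seen = []

def on_attlist(elname, attname, type_, default, required):
    # type_ holds the enumeration; default is used when the attribute is omitted
    decls.append((elname, attname, type_, default, required))

def on_start(name, attrs):
    # expat fills in defaults declared in the internal subset
    seen.append((name, attrs))

parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist
parser.StartElementHandler = on_start
parser.Parse(DOC, True)
print(decls)
print(seen)
```

Expat is not a validating parser, so it would also accept cur="GBP" without complaint. Rejecting values outside the list is the job of a validating parser.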

Declaring the content of an element

An element that contains only text, with no child elements, is declared with the #PCDATA content type:

<!ELEMENT pages (#PCDATA)>
<!ELEMENT high (#PCDATA)>
<!ELEMENT regular (#PCDATA)>
<!ELEMENT discount (#PCDATA)>
<!ELEMENT ship (#PCDATA)>
<!ELEMENT store (#PCDATA)>
<!ELEMENT weight (#PCDATA)>

PCDATA stands for parsed character data: text that the parser reads and processes (entity references such as &amp; are expanded). It is the usual choice for simple text content.
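A short Python sketch of what (#PCDATA) means in practice, using the standard library's xml.etree.ElementTree, which does not validate against DTDs. An element declared this way should have no child elements, and its text reaches the application with entity references such as &amp; already expanded. The text_only helper and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment of the book document
DOC = """<book>
  <pages>320</pages>
  <store>Smith &amp; Sons</store>
</book>"""

def text_only(elem):
    """True if elem has no child elements, as a (#PCDATA) model requires."""
    return len(elem) == 0

root = ET.fromstring(DOC)
for child in root:
    print(child.tag, text_only(child), child.text)
# pages True 320
# store True Smith & Sons
```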

Emulating namespaces in the DTD

DTDs have no native support for XML namespaces, but we can emulate a namespace declaration with a CDATA attribute named xmlns and a #FIXED default value:

<!ATTLIST book xmlns CDATA #FIXED "http://site.com/ns/book">
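To see the effect, the Python sketch below gives this declaration to the standard library's expat parser in the DOCTYPE's internal subset, with namespace processing left off. The parser reports the attribute as CDATA with a fixed default (the required flag is set for #FIXED attributes). It also adds xmlns="http://site.com/ns/book" to book even though the element itself does not carry it. The variable names are illustrative.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
  <!ATTLIST book xmlns CDATA #FIXED "http://site.com/ns/book">
]>
<book/>"""

decls = []
elements = []

def on_attlist(elname, attname, type_, default, required):
    # For #FIXED attributes expat passes the fixed value and required == 1
    decls.append((elname, attname, type_, default, required))

def on_start(name, attrs):
    # The fixed value is supplied as if it were written on the element
    elements.append((name, attrs))

parser = xml.parsers.expat.ParserCreate()  # namespace processing off
parser.AttlistDeclHandler = on_attlist
parser.StartElementHandler = on_start
parser.Parse(DOC, True)
print(decls)
print(elements)
```

The value is only supplied by parsers that actually read the DTD. If the DTD lives in an external file that the parser does not fetch, book arrives without the attribute.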

This entry was posted by Gabriele Romanato.
