XML is a universal format for data exchange on the web. In essence, XML is a markup language like HTML, but with a stricter syntax. In this post I'll outline the main features of XML and how it differs from HTML.
XML has no predefined element set. Unlike HTML, XML has no predefined DTD, so we define the element names ourselves for each kind of document. The only limitations are:
- element names cannot begin with xml (in any combination of upper and lower case), because such names are reserved
- element names cannot start with a digit.
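As a quick sketch of the naming rule about digits, the snippet below feeds two tiny documents to Python's standard ElementTree parser. The helper name parses is my own, and Python is used here only as a convenient way to run an XML parser.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A digit inside the name is allowed.
print("section1:", parses("<section1/>"))
# A digit as the first character of the name is rejected.
print("1section:", parses("<1section/>"))
```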
The XML prolog must always be put at the very beginning of an XML document, just before the root element. It defines three aspects of an XML document:
- XML version (version)
- document encoding (encoding)
- whether the document is self-contained or relies on declarations in an external DTD (standalone)
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<document></document>
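To see what a parser actually reads from the prolog, here is a minimal sketch using Python's xml.dom.minidom, which exposes the three values as attributes of the parsed document. The names SOURCE and prolog_position_ok are mine, chosen for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

SOURCE = (
    b'<?xml version="1.0" encoding="utf-8" standalone="yes"?>\n'
    b"<document></document>"
)

doc = minidom.parseString(SOURCE)
# The three aspects declared in the prolog.
print(doc.version, doc.encoding, doc.standalone)


def prolog_position_ok(data: bytes) -> bool:
    """Return True if the document parses, False on a parsing error."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


# Anything before the prolog, even a single newline, makes the parser fail.
print(prolog_position_ok(SOURCE))
print(prolog_position_ok(b"\n" + SOURCE))
```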
Every XML document must have a root element. This element contains all the other elements, so it sits at the top of the document tree and of its DOM structure.
<?xml version="1.0" encoding="utf-8"?>
<document>
  <section id="section-a">
    <title level="1">Title</title>
    <para>...</para>
  </section>
</document>
In this case, document is the root element.
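The following sketch parses the same sample with Python's ElementTree to show that everything hangs off the root element, and that a second top-level element is rejected. SOURCE and has_single_root are illustrative names of my own.

```python
import xml.etree.ElementTree as ET

SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<document>
  <section id="section-a">
    <title level="1">Title</title>
    <para>...</para>
  </section>
</document>"""

root = ET.fromstring(SOURCE)
print(root.tag)
# Walk the whole tree starting from the root element.
print([element.tag for element in root.iter()])


def has_single_root(data: bytes) -> bool:
    """Return True if data parses, which requires exactly one root element."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Two sibling elements at the top level are not a valid document.
print(has_single_root(b"<a></a><b></b>"))
```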
Encoding and content type
The preferred encoding for XML is UTF-8, although you can also use UTF-16. The delivered content type must be either text/xml or application/xml. The former content type is backward-compatible for user agents that don't fully support XML.
- XML is case-sensitive. This rule applies to element names, attributes and attribute values.
- Every XML document should start with an XML prolog. XML 1.0 parsers accept a document without one, but including it is strongly recommended.
- Every XML document must have a root element (and only one root element).
- Elements must be correctly nested. Thus <element> <para> </element> </para> will return a fatal XML parsing error.
- Attribute values must be enclosed within quotes. Thus <element attr=value></element> will return a fatal XML parsing error.
- Empty elements must be closed like any other element. Thus <break> will return a fatal XML parsing error. You must write empty elements as <break/> or <break></break>.
- All special characters must be converted into entity references, such as &gt;, &lt; and so on. If there is no named entity reference for your character, use the hexadecimal notation, that is &#x, followed by the Unicode value, followed by a semicolon. For Unicode values, see Alan Wood's site.
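The rules above can be tried out against a real parser. This is only a sketch with Python's standard library: is_well_formed and CASES are names I made up for the example, and the error wording you get will vary from parser to parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without a fatal error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


CASES = {
    "case mismatch": "<Element></element>",
    "bad nesting": "<element><para></element></para>",
    "unquoted attribute": "<element attr=value></element>",
    "unclosed empty element": "<document><break></document>",
    "two root elements": "<a></a><b></b>",
    "self-closing empty element": "<document><break/></document>",
}

for label, text in CASES.items():
    print(f"{label}: {is_well_formed(text)}")

# escape() converts &, < and > into entity references.
escaped = escape("a < b & c > d")
print(escaped)

# Hexadecimal character reference for a character given its code point.
copyright_ref = "&#x{:X};".format(ord("\u00a9"))
print(copyright_ref)
```

Only the last case in CASES is accepted; all the others trigger a parsing error.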
XML style sheets
To associate a CSS style sheet with an XML document, insert a dedicated processing instruction (PI) just after the XML prolog:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="style.css" type="text/css"?>
This PI works much like a normal HTML link element.
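A parser sees the style sheet PI as a node of its own, separate from the prolog and from the root element. Here is a small sketch with Python's xml.dom.minidom. The empty document root element is added only so that the sample parses, and SOURCE and pis are illustrative names.

```python
from xml.dom import minidom

SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="style.css" type="text/css"?>
<document></document>"""

doc = minidom.parseString(SOURCE)

# Collect the processing instructions found at document level.
pis = [
    node
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

for pi in pis:
    # target is the PI name, data is everything that follows it.
    print(pi.target, "->", pi.data)
```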
Class and ID attributes in XML
Since XML has no predefined DTD which associates attributes with elements, class and ID attributes don't work as in HTML. If you try to use DOM methods such as getElementsByClassName(), you may get empty or null results. In the same way, if you try to use CSS ID and class selectors, you may not match any element.
For the DOM, a reliable alternative is getElementsByTagName(). For CSS, you can use attribute selectors.
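The same idea can be sketched outside the browser. The example below uses Python's xml.dom.minidom as a stand-in for a browser DOM, which is an assumption on my part, and browsers may behave differently. The sample document and its class values are invented for the example. Without a DTD, minidom does not know that id is an ID attribute, so the lookup by ID finds nothing, while a lookup by tag name plus an attribute check works.

```python
from xml.dom import minidom

SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<document>
  <section id="section-a" class="intro">
    <title level="1">Title</title>
  </section>
  <section id="section-b" class="body"/>
</document>"""

doc = minidom.parseString(SOURCE)

# No DTD declares "id" as an ID attribute, so this lookup returns None.
by_id = doc.getElementById("section-a")
print(by_id)

# Selecting by tag name works regardless of any DTD.
sections = doc.getElementsByTagName("section")
print(len(sections))

# Filtering on the attribute value mimics a CSS attribute selector
# such as section[class="intro"].
intro = [el for el in sections if el.getAttribute("class") == "intro"]
print([el.getAttribute("id") for el in intro])
```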