XML is a universal format for data exchange on the web. In essence, XML is a markup language like HTML, but with a stricter syntax. In this post I'll outline the main features of XML and its differences from HTML.
Element names
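Unlike HTML, XML has no predefined set of elements and no predefined DTD: you define the element names yourself. The main restrictions are:
- names beginning with xml, in any combination of upper and lower case, are reserved, so you cannot use xml as an element name;
- element names cannot start with a digit, a hyphen or a period, and cannot contain spaces.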
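One way to check the digit rule is to feed a snippet to a parser and see whether it is accepted. The sketch below assumes Python's standard-library xml.etree.ElementTree, which uses the expat parser; is_well_formed is a hypothetical helper name, not part of any API.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A digit may appear inside an element name, but not as its first character.
print(is_well_formed("<para1>Some text</para1>"))  # True
print(is_well_formed("<1para>Some text</1para>"))  # False
```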
XML prolog
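The XML prolog starts with the XML declaration, which, when present, must be placed at the very beginning of an XML document, before the root element, with nothing in front of it, not even whitespace. The declaration can specify three aspects of the document:
- the XML version (version attribute, required);
- the document encoding (encoding attribute, optional);
- whether the document relies on markup declarations stored outside it, such as in an external DTD (standalone attribute, optional, with the values yes or no).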
Example:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<document></document>
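The placement rule can be checked the same way. This minimal sketch again assumes Python's xml.etree.ElementTree, and parses is a hypothetical helper. It shows that a document with the declaration and one without it (the declaration is optional in XML 1.0) are both accepted, while a single space in front of the declaration makes the document ill-formed.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Hypothetical helper: True if data is accepted as a well-formed document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


declared = b'<?xml version="1.0" encoding="utf-8" standalone="yes"?><document></document>'
undeclared = b"<document></document>"
# Mistake: whitespace before the declaration, so it is no longer at the very start.
misplaced = b' <?xml version="1.0" encoding="utf-8"?><document></document>'

print(parses(declared))    # True
print(parses(undeclared))  # True
print(parses(misplaced))   # False
```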
Root element
Every XML document must have a root element. All other elements are nested inside it, so it sits at the top of the document tree and of the corresponding DOM.
<?xml version="1.0" encoding="utf-8"?>
<document>
  <section id="section-a">
    <title level="1">Title</title>
    <para>...</para>
  </section>
</document>
In this case, <document> is the root element.
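A parser hands back the root element as the entry point to the whole tree. The sketch below assumes Python's xml.etree.ElementTree and reuses the example document; parses is a hypothetical helper. The last line shows that a second top-level element is rejected.

```python
import xml.etree.ElementTree as ET

# The example document; the declaration starts at the very first byte.
source = b"""<?xml version="1.0" encoding="utf-8"?>
<document>
  <section id="section-a">
    <title level="1">Title</title>
    <para>...</para>
  </section>
</document>"""

root = ET.fromstring(source)
print(root.tag)                         # document
print([child.tag for child in root])    # ['section']
print(root.find("section/title").text)  # Title


def parses(data: bytes) -> bool:
    """Hypothetical helper: True if data is accepted as a well-formed document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Two top-level elements: the second one is rejected by the parser.
print(parses(b"<document></document><document></document>"))  # False
```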
Encoding and content type
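UTF-8 is the preferred encoding for XML, and it is assumed when a document has no byte order mark, no encoding declaration and no external encoding information (such as an HTTP charset parameter). UTF-16 is also allowed, since every conforming XML processor must accept both. Generic XML content should be delivered with the content type text/xml or application/xml. The former lets user agents that don't support XML fall back to displaying the document as plain text.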
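Both encodings can be fed to the same parser. This sketch assumes Python's xml.etree.ElementTree: the first document has no declaration and no byte order mark, so UTF-8 is assumed; the second is encoded as UTF-16 (Python's "utf-16" codec prepends a byte order mark) with a matching declaration.

```python
import xml.etree.ElementTree as ET

text = "<document>café</document>"

# No encoding declaration and no byte order mark: the parser assumes UTF-8.
utf8_root = ET.fromstring(text.encode("utf-8"))

# UTF-16 with a matching declaration; the codec writes a byte order mark first.
utf16_bytes = ('<?xml version="1.0" encoding="utf-16"?>' + text).encode("utf-16")
utf16_root = ET.fromstring(utf16_bytes)

print(utf8_root.text)   # café
print(utf16_root.text)  # café
```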
Syntax rules
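- XML is case-sensitive. This rule applies to element names, attribute names and attribute values, so <Para> and <para> are different elements.
- Every XML document should start with an XML declaration (the prolog described above). It is optional in XML 1.0 but recommended, and mandatory in XML 1.1.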
- Every XML document must have a root element (and only one root element).
- Elements must be correctly nested. Thus <element> <para> </element> </para> will return a fatal XML parsing error.
- Attribute values must be enclosed in quotes (single or double). Thus <element attr=value></element> will return a fatal XML parsing error.
- Every element must be closed, including empty ones. Thus <break> on its own will return a fatal XML parsing error. You must write empty elements as <element /> or <element></element>.
- The markup characters < and & must be escaped in text and attribute values, using the predefined entity references &lt; and &amp; (XML also predefines &gt;, &quot; and &apos;). HTML named entities such as &nbsp; are not available unless a DTD declares them, so for any other character you want to escape, use a hexadecimal character reference: &#x, followed by the character's Unicode code point, followed by a semicolon (for example, &#xA0; for a no-break space). For Unicode values, see Alan Wood's site.
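The rules above can be exercised against a real parser. The sketch below assumes Python's xml.etree.ElementTree for parsing and xml.sax.saxutils.escape for escaping; parses is a hypothetical helper, and each snippet in broken is an illustrative mistake that breaks one rule.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(data: str) -> bool:
    """Hypothetical helper: True if data is accepted as a well-formed document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Each snippet breaks one of the rules listed above.
broken = {
    "case mismatch": "<para>text</Para>",
    "two root elements": "<para></para><para></para>",
    "bad nesting": "<element><para></element></para>",
    "unquoted attribute": "<element attr=value></element>",
    "unclosed empty element": "<element><break></element>",
    "unescaped ampersand": "<para>Fish & chips</para>",
    "undeclared HTML entity": "<para>a&nbsp;b</para>",
}
for rule, snippet in broken.items():
    print(f"{rule}: {'accepted' if parses(snippet) else 'rejected'}")

# The corrected forms are accepted.
print(parses('<element attr="value"><break /></element>'))  # True

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
safe = escape("Fish & chips < steak")
print(safe)                           # Fish &amp; chips &lt; steak
print(parses(f"<para>{safe}</para>"))  # True

# A hexadecimal character reference for U+00A0 (no-break space).
nbsp_text = ET.fromstring("<para>a&#xA0;b</para>").text
print(nbsp_text == "a\u00a0b")  # True
```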
XML style sheets
To associate a CSS style sheet with an XML document, use the xml-stylesheet processing instruction (PI), placed after the XML declaration and before the root element:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="style.css" type="text/css"?>
This PI works much like a link element in HTML.
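The stylesheet PI is a node of its own in the prolog, a sibling of the root element rather than part of it. This sketch assumes Python's xml.dom.minidom and only shows how the PI appears to a DOM; it does not show how a browser applies the style sheet.

```python
from xml.dom import minidom

source = b"""<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="style.css" type="text/css"?>
<document></document>"""

doc = minidom.parseString(source)

# The XML declaration is not a node; the first child of the document is the PI.
pi = doc.firstChild
print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE)  # True
print(pi.target)                   # xml-stylesheet
print(pi.data)                     # href="style.css" type="text/css"
print(doc.documentElement.tagName)  # document
```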
Class and ID attributes in XML
Since XML has no predefined DTD that tells a processor which attributes are IDs or classes, class and id attributes may not work as they do in HTML. Under DOM Level 3 Core, for example, getElementById() only matches attributes known to be of type ID (declared in a DTD, for instance), so a DOM implementation that follows it returns null for a plain id attribute; getElementsByClassName() is not part of that specification at all. In the same way, CSS ID and class selectors may not match any element.
For JavaScript and the DOM, you can use more generic methods such as getElementsByTagName() and then check attribute values with getAttribute(). For CSS, you can use attribute selectors, such as section[id="section-a"] or section[class~="note"].
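To illustrate the difference, here is a sketch using Python's xml.dom.minidom, which follows the DOM Core rule that an attribute is only an ID if it is known to be of type ID (browser DOMs may behave differently). find_by_attribute is a hypothetical helper that performs the generic tag-name-plus-attribute lookup; the class filter mimics what a class selector would match.

```python
from xml.dom import minidom

source = b"""<document>
  <section id="section-a"><para>First</para></section>
  <section id="section-b" class="note"><para>Second</para></section>
</document>"""

doc = minidom.parseString(source)

# No DTD declares "id" as an ID-type attribute, so minidom does not treat it as one.
print(doc.getElementById("section-a"))  # None


def find_by_attribute(document, tag, name, value):
    """Hypothetical helper: first element named tag whose attribute name equals value."""
    for element in document.getElementsByTagName(tag):
        if element.getAttribute(name) == value:
            return element
    return None


match = find_by_attribute(doc, "section", "id", "section-a")
print(match.getElementsByTagName("para")[0].firstChild.data)  # First

# The same approach stands in for a class lookup (class is a space-separated list).
notes = [s for s in doc.getElementsByTagName("section")
         if "note" in s.getAttribute("class").split()]
print(len(notes))  # 1
```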