In this post I'm going to show you how to walk the DOM of an HTML document with PHP. First of all, a caveat: at the time of writing, PHP's DOM implementation doesn't recognize HTML5 documents. If you try to load such files, some core DOM methods such as getElementById() will return null. The best thing you can do is to always provide a validated XHTML document. In fact, invalid documents are another source of errors. Let's see how to accomplish our task:
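$document = new DOMDocument();
$document->loadHTMLFile('php-dom.html');
$startElement = $document->getElementById('test');
$output = ''; // initialize the string before appending to it with .=
// getElementById() returns null when nothing matches, so check before using it
if ($startElement !== null && $startElement->hasChildNodes()) {
    $index = 0; // childNodes is zero-based, so start at the first node
    do {
        $element = $startElement->childNodes->item($index);
        if ($element->nodeType == XML_ELEMENT_NODE) {
            // firstChild is null for empty elements
            $text = $element->firstChild ? $element->firstChild->nodeValue : '';
            $output .= '<p>' . $text . ' ' . $element->nodeName . '</p>' . "\n";
        } else {
            $output .= '<p>Blank node.</p>' . "\n";
        }
        $index++; // advance only after the current node has been processed
    } while ($index < $startElement->childNodes->length);
}
echo $output;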
We use the childNodes node list to retrieve all the nodes within a target element (in this case, an element with the ID test) using a do...while loop. You can see an example below.
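As a variation on the loop above, here is a minimal sketch that walks the same childNodes list with foreach, which avoids managing the index by hand. The helper name describeChildren() and the inline markup are made up for illustration. The XPath fallback for the case where getElementById() returns null is my own assumption about a reasonable workaround, not something the DOM extension requires.

```php
<?php
// Hypothetical helper: returns one description line per direct child of $parent.
function describeChildren(DOMNode $parent): array
{
    $lines = [];
    // DOMNodeList is traversable, so foreach visits every child in document order.
    foreach ($parent->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $lines[] = $child->nodeName . ': ' . trim($child->textContent);
        } elseif ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) === '') {
            // Whitespace-only text nodes are the "blank" nodes.
            $lines[] = 'Blank node.';
        } else {
            // Non-blank text, comments and so on.
            $lines[] = $child->nodeName . ': ' . trim((string) $child->nodeValue);
        }
    }
    return $lines;
}

// Sample markup invented for this sketch.
$document = new DOMDocument();
$document->loadHTML('<div id="test"><p>First</p>loose text<span>Second</span></div>');

// Assumed workaround: if getElementById() returns null, look the element up with XPath.
$start = $document->getElementById('test')
    ?? (new DOMXPath($document))->query('//*[@id="test"]')->item(0);

$lines = $start !== null ? describeChildren($start) : [];
echo implode("\n", $lines), "\n";
```

With this sample markup the script should print one line each for the p element, the loose text node and the span element. If the target element cannot be found at all, $lines stays empty and nothing is walked.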