In this post I'm going to show you how to walk the DOM of an HTML document with PHP. First, a caveat: at the time of writing, PHP's DOM implementation doesn't recognize HTML5 documents. If you try to load such files, some core DOM methods such as getElementById() will return null. The best thing you can do is always provide a valid XHTML document; invalid documents are another common source of errors. Let's see how to accomplish our task:
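Because invalid markup causes so much trouble, it can help to see what the parser complained about before you walk the tree. The sketch below collects parse problems through libxml's internal error buffer (libxml_use_internal_errors(), libxml_get_errors()) instead of letting them surface as PHP warnings. It assumes PHP 7.1 or later. The function name loadHtmlWithErrors() and the inline sample markup are hypothetical. It also uses loadHTML() on a string in place of loadHTMLFile() so the example is self-contained.

```php
<?php
// Sketch only: loadHtmlWithErrors() is a hypothetical helper name.
// Loads markup and returns the document together with any libxml parse errors.
function loadHtmlWithErrors(string $html): array
{
    // Buffer libxml errors instead of emitting PHP warnings; remember the old setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $document = new DOMDocument();
    $document->loadHTML($html);

    // Each entry is a LibXMLError with message, line and level properties.
    $errors = libxml_get_errors();

    // Clean up and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$document, $errors];
}

// Hypothetical sample markup standing in for php-dom.html.
[$document, $errors] = loadHtmlWithErrors('<div id="test"><p>One</p></div>');

foreach ($errors as $error) {
    echo 'Line ' . $error->line . ': ' . trim($error->message) . "\n";
}
```

If $errors is not empty, you can log each message and decide whether the resulting tree is trustworthy before walking it.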
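```php
<?php
$document = new DOMDocument();
$document->loadHTMLFile('php-dom.html');

// The element whose children we want to walk.
$startElement = $document->getElementById('test');
$output = '';

if ($startElement !== null && $startElement->hasChildNodes()) {
    $index = 0;
    do {
        // childNodes is zero-indexed: read the item first, then advance.
        $element = $startElement->childNodes->item($index);
        if ($element->nodeType === XML_ELEMENT_NODE) {
            // Guard against empty elements, which have no firstChild.
            $text = $element->firstChild ? $element->firstChild->nodeValue : '';
            $output .= '<p>' . $text . ' ' . $element->nodeName . '</p>' . "\n";
        } else {
            // Text (e.g. whitespace between tags), comments, etc.
            $output .= '<p>Blank node.</p>' . "\n";
        }
        $index++;
    } while ($index < $startElement->childNodes->length);
}

echo $output;
```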
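We use the childNodes node list (a DOMNodeList) to retrieve all the nodes within a target element (in this case, the element with ID test) and walk them with a do...while loop. The list is zero-indexed and contains every child node, not just elements: non-element nodes such as the whitespace text between tags fall into the else branch and are reported as blank nodes. You can see an example below.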
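The same walk can also be written with foreach, since a DOMNodeList can be iterated directly, and a small recursive function extends it from direct children to the whole subtree. This is a sketch under stated assumptions, not the author's code: the function names describeChildren() and collectElementNames() and the inline sample markup are hypothetical, and it swaps firstChild->nodeValue for textContent so that an element's full text is printed even when it contains nested tags.

```php
<?php
// Sketch only: function names and sample markup are hypothetical.

// Same output format as the do...while version, but iterating with foreach.
function describeChildren(DOMNode $parent): string
{
    $output = '';
    foreach ($parent->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            // textContent includes text from nested elements as well.
            $output .= '<p>' . $node->textContent . ' ' . $node->nodeName . '</p>' . "\n";
        } else {
            $output .= '<p>Blank node.</p>' . "\n";
        }
    }
    return $output;
}

// Depth-first walk: append the tag name of every element descendant, in document order.
function collectElementNames(DOMNode $node, array &$names): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $names[] = $child->nodeName;
            collectElementNames($child, $names);
        }
    }
}

$document = new DOMDocument();
$document->loadHTML('<div id="test"><h1>Title</h1> <p>Text <em>here</em></p></div>');
$start = $document->getElementById('test');

echo describeChildren($start);

$names = [];
collectElementNames($start, $names);
echo implode(', ', $names) . "\n";
```

The foreach form avoids managing an index by hand, which is where off-by-one errors tend to creep into the do...while version.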