XML structure of a FeedBurner feed

A FeedBurner feed is usually located at the address you chose when you set it up in your FeedBurner account. For example, my FeedBurner feed is located at http://feeds.feedburner.com/blogspot/onwebdev. With a command-line download utility such as wget, you can download it very easily. However, this kind of feed is generated on the fly by a server-side script, so you may run into problems when you try to fetch it from your own website. In my case, I tried to parse my feed by passing its URL directly to PHP's DOM extension and to SimpleXML, and it didn't work. A more common stream-based approach, on the other hand, works fine: fetch the feed as a string with file_get_contents() and then parse that string.
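The fetch-then-parse approach mentioned above can be sketched like this. It is a minimal sketch rather than the exact code I used: load_feed() is a hypothetical helper name, and it assumes PHP's allow_url_fopen setting is enabled so that file_get_contents() can read http:// URLs.

```php
<?php
// Hypothetical helper: fetch the feed as a plain string first, then parse
// that string with SimpleXML instead of handing the URL to the parser.
// Assumes allow_url_fopen is enabled when $source is an http:// URL.
function load_feed(string $source): SimpleXMLElement
{
    $raw = file_get_contents($source);
    if ($raw === false) {
        throw new RuntimeException("Could not fetch $source");
    }
    $feed = simplexml_load_string($raw);
    if ($feed === false) {
        throw new RuntimeException("Could not parse the XML returned by $source");
    }
    return $feed;
}

// Usage (requires network access):
// $feed = load_feed('http://feeds.feedburner.com/blogspot/onwebdev');
// echo $feed->channel->title;
```

Because load_feed() only needs something file_get_contents() can read, the same helper also works on a local copy of the feed saved with wget, which is handy for testing.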

In any case, if you want to succeed, you need to know the structure of a FeedBurner feed in advance. In this post, we'll look at the details of this kind of feed.

Root element and namespaces

<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

The root element is rss, which declares five namespaces. The namespaces are:

  1. atom: http://www.w3.org/2005/Atom
  2. openSearch: http://a9.com/-/spec/opensearch/1.1/
  3. georss: http://www.georss.org/georss
  4. thr: http://purl.org/syndication/thread/1.0
  5. feedburner: http://rssnamespace.org/feedburner/ext/1.0

Many of the elements that we'll find within our feed belong to one of these namespaces, so it must be clear from the start that we have to take them into account while parsing our feed. Whether we choose a server-side or a client-side approach, if we aren't familiar with the namespace structure of our feed, we're likely to run into problems.
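Rather than copying the namespace URIs by hand, we can ask SimpleXML which prefixes the root element declares. The sketch below relies on SimpleXMLElement::getDocNamespaces(), which returns an array mapping each declared prefix to its URI; declared_namespaces() and the trimmed-down $sample feed are illustrative names and data of our own, not part of FeedBurner.

```php
<?php
// Hypothetical helper: return the prefix => URI pairs declared on the root
// element, e.g. to look up the URI to pass to children() or XPath.
function declared_namespaces(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        throw new InvalidArgumentException('Not well-formed XML');
    }
    // With the default arguments, only declarations on the root element are returned
    return $feed->getDocNamespaces();
}

// Illustrative sample with two of the five FeedBurner declarations
$sample = '<rss xmlns:atom="http://www.w3.org/2005/Atom" '
    . 'xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">'
    . '<channel/></rss>';

foreach (declared_namespaces($sample) as $prefix => $uri) {
    echo "$prefix => $uri\n";
}
```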

The channel element

The channel element contains the item elements along with some additional information that describes the feed as a whole. The direct children of the channel element that are relevant for our purpose are:

  1. title: the title of the whole feed
  2. description: a brief description of our feed
  3. link: the URI of our blog or website
  4. lastBuildDate: the date of the latest update to our feed
  5. managingEditor: the author of the feed
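The channel-level elements listed above have no namespace prefix, so SimpleXML can reach them with plain property access. Here is a minimal sketch; channel_info() and its array keys are hypothetical names chosen for this example.

```php
<?php
// Hypothetical helper: collect the un-prefixed children of <channel>.
// $channel->link matches only the plain RSS <link>, not any <atom:link>.
function channel_info(SimpleXMLElement $feed): array
{
    $channel = $feed->channel;
    return [
        'title'          => (string) $channel->title,
        'description'    => (string) $channel->description,
        'link'           => (string) $channel->link,
        'lastBuildDate'  => (string) $channel->lastBuildDate,
        'managingEditor' => (string) $channel->managingEditor,
    ];
}
```

One detail worth knowing: because property access only matches un-prefixed elements here, any atom:link elements the feed may also contain are not mixed up with the RSS link.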

The most important child of the channel element, however, is item, which is discussed below.

The item element

Each item element contains the relevant information about a single post on our blog or website. The most important children of this element are:

  1. pubDate: the date when our post was published
  2. category: the category under which our post was published (there may be one or more of these elements)
  3. title: the title of our post
  4. description: the content of our post; (X)HTML markup is included by escaping the < and > characters as the entities &lt; and &gt;
  5. feedburner:origLink: the original link to our post
  6. author: the author of the post
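Reading the un-namespaced children of each item works the same way. In this minimal sketch, item_summaries() is a hypothetical helper; it gathers the repeated category elements into an array and converts pubDate with DateTimeImmutable::createFromFormat(DATE_RSS, ...), assuming the date uses the RFC 822 style format that RSS 2.0 calls for (createFromFormat() returns false if it doesn't). SimpleXML decodes &lt; and &gt; for us, so description comes back as ordinary HTML markup.

```php
<?php
// Hypothetical helper: summarize each <item> using its un-namespaced children.
// feedburner:origLink is deliberately left out: it needs namespace handling.
function item_summaries(SimpleXMLElement $feed): array
{
    $items = [];
    foreach ($feed->channel->item as $item) {
        $categories = [];
        foreach ($item->category as $category) { // zero or more <category> elements
            $categories[] = (string) $category;
        }
        $items[] = [
            'title'       => (string) $item->title,
            // DATE_RSS is "D, d M Y H:i:s O"; false if pubDate doesn't match it
            'pubDate'     => DateTimeImmutable::createFromFormat(DATE_RSS, (string) $item->pubDate),
            'categories'  => $categories,
            'description' => (string) $item->description, // entities already decoded
            'author'      => (string) $item->author,
        ];
    }
    return $items;
}
```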

There's only one catch here: feedburner:origLink belongs to the feedburner namespace declared on the root element, so we have to take this into account during parsing. In other words, you have to select an element that "lives" inside the FeedBurner namespace. The most obvious solution to this problem is a namespace-aware DOM method such as getElementsByTagNameNS(). If you use SimpleXML instead, an approach like this works:

// $item is one <item> element of a feed loaded with SimpleXML
$link_ns = $item->children('http://rssnamespace.org/feedburner/ext/1.0');
$link = (string) $link_ns->origLink;

For more info on this solution, read this article.
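A third option, besides getElementsByTagNameNS() and SimpleXML's children(), is XPath: DOMXPath can query namespaced elements once the namespace URI is bound to a prefix with registerNamespace(). In this minimal sketch, orig_links() and FEEDBURNER_NS are our own hypothetical names, and the prefix fb is deliberately different from the feed's feedburner prefix, because only the URI has to match.

```php
<?php
// Hypothetical constant holding the FeedBurner namespace URI from the root element
const FEEDBURNER_NS = 'http://rssnamespace.org/feedburner/ext/1.0';

// Hypothetical helper: return every item's feedburner:origLink via XPath.
function orig_links(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Not well-formed XML');
    }
    $xpath = new DOMXPath($doc);
    // The prefix is ours to choose; XPath matches on the namespace URI
    $xpath->registerNamespace('fb', FEEDBURNER_NS);

    $links = [];
    foreach ($xpath->query('/rss/channel/item/fb:origLink') as $node) {
        $links[] = trim($node->textContent);
    }
    return $links;
}
```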

2 thoughts on “XML structure of a FeedBurner feed”

  1. Thank you for the explanation.
    Here is the PHP code for DOMDocument:

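    $doc = new DOMDocument();
    $doc->loadXML(file_get_contents('http://feeds.feedburner.com/blogspot/onwebdev'));
    foreach ($doc->getElementsByTagName('item') as $node) {
        $link = $node->getElementsByTagNameNS('http://rssnamespace.org/feedburner/ext/1.0', 'origLink')->item(0)->nodeValue;
        echo $link, "\n"; // origLink is namespaced, hence the NS-aware lookup
    }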
