A FeedBurner feed is usually located at the address you've chosen as your FeedBurner account name. For example, my FeedBurner feed is located at http://feeds.feedburner.com/blogspot/onwebdev. If you use a command-line download utility (such as wget), you can download it very easily. Because this kind of feed is generated automatically by a server-side script, you may run into problems if you try to fetch it from your own website. In my case, I tried to parse the feed with PHP's DOM extension and SimpleXML, but it didn't work, apparently because the way I was loading it expected a static XML file. If you use a more common stream-based approach instead (like file_get_contents()), it works fine.
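The stream-based approach can be sketched as follows. This is only a minimal sketch: the function names loadFeed() and parseFeed() are hypothetical, and it assumes that allow_url_fopen is enabled so that file_get_contents() can read a remote URL.

```php
<?php
// Hypothetical helper: read the feed as a plain string, then parse it.
function loadFeed(string $source): SimpleXMLElement
{
    // file_get_contents() can read a URL only if allow_url_fopen is enabled.
    $xml = file_get_contents($source);
    if ($xml === false) {
        throw new RuntimeException("Unable to read the feed from $source");
    }
    return parseFeed($xml);
}

// Hypothetical helper: turn an XML string into a SimpleXML tree.
function parseFeed(string $xml): SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false) {
        throw new RuntimeException('The feed is not well-formed XML');
    }
    return $feed;
}

// Example call (not executed here):
// $feed = loadFeed('http://feeds.feedburner.com/blogspot/onwebdev');
```

Splitting the download from the parsing makes it easier to see which of the two steps fails when something goes wrong.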
In any case, you need to know in advance the structure of a FeedBurner feed if you want to succeed. In this post, we'll look into the details of this kind of feed.
Root element and namespaces
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
The root element is rss, with five namespaces attached to it. The namespaces are:
- atom: http://www.w3.org/2005/Atom
- openSearch: http://a9.com/-/spec/opensearch/1.1/
- georss: http://www.georss.org/georss
- thr: http://purl.org/syndication/thread/1.0
- feedburner: http://rssnamespace.org/feedburner/ext/1.0
Many of the elements that we'll find within our feed belong to one of these namespaces, so it must be clear from the start that we have to take them into account while parsing the feed. Whether we choose a server-side approach or a client-side one, if we don't know the namespace structure of our feed well, we're likely to run into problems.
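To check which namespaces a feed actually declares, something like the following sketch may help. The function name feedNamespaces() is hypothetical; it relies on SimpleXML's getDocNamespaces() method, which returns the namespaces declared on the root element.

```php
<?php
// Hypothetical helper: map each namespace prefix declared on the
// root element to its URI.
function feedNamespaces(string $xml): array
{
    // The constructor throws an Exception if the XML cannot be parsed.
    $feed = new SimpleXMLElement($xml);

    // getDocNamespaces() returns an array such as
    // ['atom' => 'http://www.w3.org/2005/Atom', ...], or false on failure.
    $namespaces = $feed->getDocNamespaces();

    return $namespaces === false ? [] : $namespaces;
}
```

For the root element shown above, the returned array should contain the five prefixes as keys and their URIs as values.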
The channel element
The channel element contains both item elements and some additional information that we may use to describe our feed. The elements directly contained within the channel element that are relevant for our purpose are:
- title: the title of the whole feed
- description: a brief description of our feed
- link: the URI of our blog or website
- lastBuildDate: the date of the latest update to our feed
- managingEditor: the author of the feed
But the most important child of the channel element is surely item, which is discussed below.
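Reading the channel-level elements listed above could look like this with SimpleXML. It is a sketch only: the function name channelInfo() is hypothetical, and it assumes the feed has already been parsed into a SimpleXMLElement whose root is rss.

```php
<?php
// Hypothetical helper: collect the channel-level metadata of a parsed feed.
function channelInfo(SimpleXMLElement $feed): array
{
    $channel = $feed->channel;

    // Casting to string extracts the text content of each element;
    // a missing element simply yields an empty string.
    return [
        'title'          => (string) $channel->title,
        'description'    => (string) $channel->description,
        'link'           => (string) $channel->link,
        'lastBuildDate'  => (string) $channel->lastBuildDate,
        'managingEditor' => (string) $channel->managingEditor,
    ];
}
```

None of these elements carries a namespace prefix, so plain property access is enough here.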
The item element
The item element contains the relevant information about a post of our blog or website. The most important children of this element are:
- pubDate: the date when our post was published
- category: the category under which our post was published (one or more elements)
- title: the title of our post
- description: the content of our post; XHTML tags are escaped by encoding the < and > characters as entities (&lt; and &gt;)
- feedburner:origLink: the original link of our post
- author: the author of the post
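The non-namespaced children of each item can be read with a simple loop. The sketch below uses a hypothetical function name, feedItems(), and assumes a feed already parsed with SimpleXML. Note that the XML parser decodes the escaped markup in description, so the string you get back contains real tags.

```php
<?php
// Hypothetical helper: turn every item of a parsed feed into a plain array.
function feedItems(SimpleXMLElement $feed): array
{
    $items = [];

    foreach ($feed->channel->item as $item) {
        // An item may carry one or more category elements.
        $categories = [];
        foreach ($item->category as $category) {
            $categories[] = (string) $category;
        }

        $items[] = [
            'title'       => (string) $item->title,
            'pubDate'     => (string) $item->pubDate,
            'author'      => (string) $item->author,
            'categories'  => $categories,
            // The parser has already turned &lt; and &gt; back into < and >.
            'description' => (string) $item->description,
        ];
    }

    return $items;
}
```

The feedburner:origLink element is deliberately left out here, because it needs namespace-aware access.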
There's only one problem here: feedburner:origLink belongs to the feedburner namespace, so we have to take this into account during parsing. In other words, we have to select an element that "lives" inside the feedburner namespace. The most obvious solution to this problem is a DOM method such as getElementsByTagNameNS() or, if you use SimpleXML or a similar extension, an approach like this:
// Select the children of the item that belong to the feedburner namespace
$link_ns = $item->children('http://rssnamespace.org/feedburner/ext/1.0');
$link = (string) $link_ns->origLink;
For more info on this solution, read this article.
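The DOM route mentioned above could look roughly like this. It is a sketch under a few assumptions: the function name originalLinks() and the constant FEEDBURNER_NS are hypothetical, and the feed is passed in as an XML string that has already been downloaded.

```php
<?php
// Namespace URI bound to the feedburner prefix on the root element.
const FEEDBURNER_NS = 'http://rssnamespace.org/feedburner/ext/1.0';

// Hypothetical helper: return the original link of every post in the feed.
function originalLinks(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('The feed is not well-formed XML');
    }

    $links = [];
    // Match by namespace URI and local name, not by the prefix used in the feed.
    foreach ($doc->getElementsByTagNameNS(FEEDBURNER_NS, 'origLink') as $node) {
        $links[] = trim($node->textContent);
    }

    return $links;
}
```

Because the lookup uses the namespace URI, it keeps working even if a feed binds that URI to a different prefix.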