jQuery: RSS reader

Fetching a remote RSS feed with jQuery is one of the most frequently asked questions about the library. jQuery can't fetch a remote feed by itself because of the browser's same-origin policy, which restricts AJAX requests to other domains. For that reason, we need a server-side script that accepts two parameters: the absolute URL of the feed and the number of items to display. In our example we'll use PHP, after making sure that the passed URL is valid and that we're actually dealing with an RSS feed. Here's the script:

Formatting a WordPress post date in the RSS format

WordPress automatically formats the post_date field of the wp_posts table in various date formats, including the RSS and Atom ones. The data type of this field is datetime, which means that the stored date has the format YYYY-MM-DD HH:MM:SS. However, sometimes we may need to format this value ourselves, for example if something goes wrong with our RSS or Atom feeds and we have to create a static feed file manually. Here's how we can do it:
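The conversion itself is small enough to sketch in JavaScript. This is only an illustration under two assumptions: the function name mysqlToRssDate() is hypothetical, and the stored value is treated as UTC (WordPress also keeps a separate post_date_gmt column, which fits this assumption better than post_date).

```javascript
// Hypothetical helper: converts a MySQL DATETIME string (YYYY-MM-DD HH:MM:SS)
// into the RFC 822 style date used by the RSS pubDate element.
// Assumption: the input value is expressed in UTC.
function mysqlToRssDate(datetime) {
  var m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(datetime);
  if (!m) {
    throw new Error("Unexpected datetime format: " + datetime);
  }

  var days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
  var months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

  // Pads a number to two digits
  function pad2(n) {
    return (n < 10 ? "0" : "") + n;
  }

  // Month is zero-based in Date.UTC()
  var d = new Date(Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6]));

  return days[d.getUTCDay()] + ", " +
    pad2(d.getUTCDate()) + " " +
    months[d.getUTCMonth()] + " " +
    d.getUTCFullYear() + " " +
    pad2(d.getUTCHours()) + ":" +
    pad2(d.getUTCMinutes()) + ":" +
    pad2(d.getUTCSeconds()) + " +0000";
}
```

For example, mysqlToRssDate("2011-03-05 14:30:00") returns "Sat, 05 Mar 2011 14:30:00 +0000".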

CSS: styling an RSS feed reader

Styling an RSS feed reader is easy with CSS. In this post I'm going to use the main RSS feed of the jQuery blog to show you some CSS techniques that you can reuse in your own projects. The feed will be fetched with jQuery from a local copy, but you can always use a server-side language to retrieve its contents. We'll see how to accomplish this as well. First, our basic markup structure:

jQuery: RSS feed rotator

Let's say that we want to create an RSS feed rotator with jQuery. To accomplish this, we need a server-side script to fetch the feed, jQuery's AJAX methods and a JavaScript timer to create the intervals between feeds. Since fetching a feed requires some time, we hide the elements while the process is running and then we reveal them one by one with a certain delay. First, let's take a look at our PHP script:
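The client-side timing can be sketched independently of the PHP script. The following is a minimal illustration, not the rotator itself: revealSequentially() and its parameters are hypothetical names, and the show callback stands in for whatever reveals an element on the page.

```javascript
// Hypothetical helper: reveals items one by one, `delay` milliseconds apart.
// `show(item, index)` is called for each item when its turn comes.
// `schedule` defaults to setTimeout; it is a parameter only so that the
// timing can be replaced (for example, in tests).
function revealSequentially(items, delay, show, schedule) {
  schedule = schedule || setTimeout;

  for (var i = 0; i < items.length; i++) {
    // The closure captures the current item and index for the timer callback
    (function (item, index) {
      schedule(function () {
        show(item, index);
      }, index * delay);
    })(items[i], i);
  }
}
```

With jQuery, the show callback could be something like function (item) { $(item).fadeIn(); }, called on elements that were hidden while the feed was loading.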

XML structure of a FeedBurner feed

A FeedBurner feed is usually located at an address based on the name you've chosen for your FeedBurner account. For example, my FeedBurner feed is located at http://feeds.feedburner.com/blogspot/onwebdev. If you use a command-line download utility (such as wget), you can download it very easily. This kind of feed is generated automatically by a server-side script, so you may run into problems if you try to fetch it from your website. In my case, I tried to parse my feed with PHP's DOM extension and with SimpleXML, but it didn't work, apparently because this approach expects a static XML file. A more common stream-based approach (like file_get_contents()) works fine, though.

In any case, you need to know in advance the structure of a FeedBurner feed if you want to succeed. In this post, we'll look into the details of this kind of feed.

Root element and namespaces

<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

The root element is rss with five namespaces attached to it. The namespaces are:

  1. atom: http://www.w3.org/2005/Atom
  2. openSearch: http://a9.com/-/spec/opensearch/1.1/
  3. georss: http://www.georss.org/georss
  4. thr: http://purl.org/syndication/thread/1.0
  5. feedburner: http://rssnamespace.org/feedburner/ext/1.0

Many of the elements that we'll find within our feed belong to one of these namespaces, so it must be clear from the start that we have to take them into account while parsing our feed. Whether we choose a server-side approach or a client-side one, we're likely to run into problems if we don't know the namespace structure of our feed well.
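One way to keep these namespaces at hand while parsing is a simple lookup table. The sketch below copies the five prefixes and URIs from the root element shown above; the names FEED_NAMESPACES and resolveQName() are hypothetical and only serve the illustration.

```javascript
// Prefix-to-URI map copied from the root element of the feed
var FEED_NAMESPACES = {
  atom: "http://www.w3.org/2005/Atom",
  openSearch: "http://a9.com/-/spec/opensearch/1.1/",
  georss: "http://www.georss.org/georss",
  thr: "http://purl.org/syndication/thread/1.0",
  feedburner: "http://rssnamespace.org/feedburner/ext/1.0"
};

// Hypothetical helper: splits a qualified name such as "feedburner:origLink"
// into its namespace URI and local name. Names without a prefix (plain RSS
// elements such as "title") get a null namespace URI.
function resolveQName(qname) {
  var parts = qname.split(":");

  if (parts.length === 1) {
    return { namespaceURI: null, localName: qname };
  }

  var prefix = parts[0];
  if (!Object.prototype.hasOwnProperty.call(FEED_NAMESPACES, prefix)) {
    throw new Error("Unknown namespace prefix: " + prefix);
  }

  return { namespaceURI: FEED_NAMESPACES[prefix], localName: parts[1] };
}
```

The returned pair has the same shape as the two arguments expected by DOM methods such as getElementsByTagNameNS().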

The channel element

The channel element contains both item elements and some additional information that we may use to add some description to our feed. Basically, the elements directly contained within the channel element that are relevant for our purpose are:

  1. title: the title of the whole feed
  2. description: a brief description of our feed
  3. link: the URI of our blog or website
  4. lastBuildDate: the date of the latest update to our feed
  5. managingEditor: the author of the feed

But the most important element of the channel element is surely item. It's discussed below.

The item element

Each item element contains the relevant information about a post of our blog or website. The most important children of this element are:

  1. pubDate: the date when our post was published
  2. category: the category under which our post was published (one or more elements)
  3. title: the title of our post
  4. description: the content of our post; XHTML tags are inserted by encoding the < and > characters as the entities &lt; and &gt;
  5. feedburner:origLink: the original link of our post
  6. author: the author of the post

We have only one problem here: the feedburner:origLink element belongs to the feedburner namespace, so we have to take this into account during parsing. In other words, we have to select an element that "lives" inside the feedburner namespace. The most obvious solution to this problem is a DOM method such as getElementsByTagNameNS() or, if you use SimpleXML, an approach like this:

$link_ns = $item->children('http://rssnamespace.org/feedburner/ext/1.0');
$link = $link_ns->origLink;

For more info on this solution, read this article.
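On the client side, the same lookup can be sketched with getElementsByTagNameNS(). This is only a minimal illustration: getOrigLink() is a hypothetical name, and the item argument is assumed to be a DOM Element for one item of the parsed feed.

```javascript
// Namespace URI of the feedburner prefix, as declared on the root element
var FEEDBURNER_NS = "http://rssnamespace.org/feedburner/ext/1.0";

// Hypothetical helper: returns the text of feedburner:origLink for an item,
// or null when the item has no such element.
// Assumption: `item` is a DOM Element taken from the parsed feed.
function getOrigLink(item) {
  var nodes = item.getElementsByTagNameNS(FEEDBURNER_NS, "origLink");
  return nodes.length > 0 ? nodes[0].textContent : null;
}
```

Note that the second argument is the local name only, without the feedburner: prefix.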

Parsing RSS feeds with the DOM and JavaScript

Parsing RSS feeds with the traditional DOM approach is not the simplest way to perform this task. However, if you want to get a finer control over the whole process, this may be a feasible way. Here's the basic code to achieve this goal:

function XMLDoc() {
 var me = this;
 var req = null;

 // Standards-based browsers
 if (window.XMLHttpRequest) {
  req = new XMLHttpRequest();
 }
 // Older versions of Internet Explorer
 else if (window.ActiveXObject) {
  try {
   req = new ActiveXObject("MSXML2.XMLHttp.6.0");
  }
  catch (e) {
   try {
    req = new ActiveXObject("MSXML2.XMLHttp.3.0");
   }
   catch (e2) {
    req = null;
   }
  }
 }

 this.request = req;

 // Opens the resource asynchronously; handler(me) runs on every state change
 this.loadXMLDoc = function (url, handler) {
  if (this.request) {
   this.request.open("GET", url, true);
   this.request.onreadystatechange = function () {
    handler(me);
   };
   this.request.setRequestHeader("Content-Type", "text/xml");
   this.request.send(null);
  }
 };
}

function initXML () {
 var newrequest = new XMLDoc();
 newrequest.loadXMLDoc("rss.xml", getRSS);
}

function getRSS (req) {
 req = req.request;

 // Do nothing until the response is complete and successful
 if (req.readyState != 4 || req.status != 200) {
  return;
 }

 var content = document.getElementById("content");
 var div = document.createElement("div");
 div.className = "entries";
 var h3 = document.createElement("h3");
 h3.innerHTML = "Recent entries";
 div.appendChild(h3);
 var ol = document.createElement("ol");
 div.appendChild(ol);

 var root = req.responseXML.documentElement;
 var items = root.getElementsByTagName("item");

 // Build one list item with a link for each feed item
 for (var i=0, len=items.length; i<len; i++) {
  var title = items[i].getElementsByTagName("title")[0].firstChild.nodeValue;
  var link = items[i].getElementsByTagName("link")[0].firstChild.nodeValue;
  var li = document.createElement("li");
  li.innerHTML = "<a href='" + link + "'>" + title + "</a>";
  ol.appendChild(li);
 }

 content.appendChild(div);
}

window.onload = initXML;

The first object, XMLDoc, creates an XMLHttpRequest object and stores it in its request property. Its loadXMLDoc() method opens the given resource with the HTTP verb GET and sets a Content-Type header on the request (not strictly needed for a GET request, which has no body). It also accepts a function reference that handles the request via the onreadystatechange event. The function initXML() uses an instance of XMLDoc to set the event handler to the function getRSS().

The getRSS() function retrieves the XML document from the request held by the XMLDoc object, loops through all item elements starting at the root of the document (using responseXML.documentElement as reference) and then extracts the values of the link and title elements to create XHTML links.
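Two details of getRSS() deserve some care: firstChild is null when an element is empty, and feed values are concatenated into innerHTML without escaping. The following sketch shows one way to handle both. The helper names childText(), escapeHtml() and buildEntryLink() are hypothetical and not part of the code above.

```javascript
// Hypothetical helper: returns the text of the first `tagName` descendant
// of `item`, or an empty string when the element is missing or empty.
function childText(item, tagName) {
  var node = item.getElementsByTagName(tagName)[0];
  return node && node.firstChild ? node.firstChild.nodeValue : "";
}

// Escapes the characters that have a special meaning in HTML markup
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Builds the markup for one entry from escaped feed values
function buildEntryLink(link, title) {
  return '<a href="' + escapeHtml(link) + '">' + escapeHtml(title) + "</a>";
}
```

Inside the loop, the assignment to li.innerHTML could then use buildEntryLink(childText(items[i], "link"), childText(items[i], "title")).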

Although this approach lets you see all the details of an Ajax connection, it's better to use the Ajax functionality of a JavaScript library (such as jQuery or Prototype) in order to simplify the code and avoid redundancy. Further, if you want to use this DOM approach, you should cache your data and use local variable references instead of global ones. For example, instead of writing:

var content = document.getElementById("content");
var div = document.createElement("div");

you should write:

var doc = document;
var content = doc.getElementById("content");
var div = doc.createElement("div");

By doing so, you avoid repeated global lookups, which can improve the performance of your script, especially in older browsers.

XSLT and RSS: test results

I started my tests by styling a static RSS file with CSS after transforming it with XSLT and linking to it from the page through the link element. In this case browsers apply their own template to the RSS file and display it accordingly, so there's no way to circumvent the browser's default formatting for an RSS document (a static one in this case). I need more time to try server-side XSLT processing for displaying an RSS document generated on the fly. Time will tell.

XSLT and RSS: planning tests

As many of you know, browsers that support RSS show your feeds using their own formatting template. That's cool, but what happens if we want to transform an RSS feed with XSLT? We could write something like this:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="style.xsl" type="text/xsl"?>

What happens then? That's what I'm going to test. Stay tuned!

The image element in RSS

In theory, the image element in RSS should be used to insert graphics into an RSS feed. Just in theory! In fact, according to some tests made during the writing of an RSS guide for Html.it, this element works only in Firefox. What's more, Firefox accepts it only when it's a direct child of the channel element. That is, it doesn't work in an item element. Developers are then forced to use the string &lt;img /&gt; to insert an image into their RSS feed. Oh well...