Parsing a Google Sitemap with jQuery

A Google Sitemap has the following form:

<?xml version="1.0" encoding="UTF-8"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

<url>
  <loc>http://www.css-zibaldone.com/</loc>
  <priority>1.00</priority>
  <lastmod>2008-08-10T06:03:54+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>

<!-- more url elements -->

</urlset>

Here we're interested in the loc and lastmod elements, which contain the URI of each document and the date of its last modification, respectively. First, let's set up some basic styles:

body {
    margin: 0 auto;
    width: 60%;
    padding: 2em 0;
    background: #fff;
    color: #333;
    font: 76% Arial, sans-serif;
}

a:link, a:visited {color: #080;}

h1 {
    font: normal 1.6em "Trebuchet MS", Trebuchet, sans-serif;
    color: #666;
    margin: 5px 0;
    padding-bottom: 3px;
    border-bottom: 3px solid #999;
    text-transform: uppercase;
}

#sitemap {
    margin: 1em 0;
    padding: 5px;
    border: 2px solid #c4df9b;
    background: #edf5e1;
    list-style: none;
    font-size: 1.1em;
}

#sitemap li {
    margin-bottom: 6px;
}

#sitemap li div.lastmod {
    font-family: Verdana, sans-serif;
    font-size: small;
    height: 100%;
    margin-bottom: 4px;
    padding: 3px 0;
    border-top: 1px dashed #666;
    border-bottom: 1px dashed #666;
    font-style: italic;
}

Then we add jQuery. To accomplish our task, we first need a helper function to format the dates provided in the sitemap. It is as follows:

function formatDate(timestamp) {

   var dateParts = timestamp.split('-');
   
   var year = dateParts[0];
   var rawMonth = dateParts[1];
   var month;
   var day = dateParts[2];
   
   switch(rawMonth) {
   
       case '01':
         month = 'January';
         break;
       case '02':
         month = 'February';
         break;
       case '03':
         month = 'March';
         break;
       case '04':
         month = 'April';
         break;
       case '05':
         month = 'May';
         break;
       case '06':
         month = 'June';
         break;
       case '07':
         month = 'July';
         break;
       case '08':
         month = 'August';
         break;
       case '09':
         month = 'September';
         break;
       case '10':
         month = 'October';
         break;
       case '11':
         month = 'November';
         break;
       case '12':
         month = 'December';
         break;
       default:
         // Unrecognized month: fall back to the raw value instead of undefined
         month = rawMonth;
         break;
   }
  
   return month + ' ' + day + ',' + ' ' + year;  
}
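
As an aside, the month lookup can also be written with an array instead of a switch statement. The following is only a minimal sketch under a few assumptions: the names formatDateCompact and MONTH_NAMES are hypothetical and are not part of the original script, and returning the input unchanged when it is not a valid yyyy-mm-dd string is just one possible way to handle bad data. Unlike formatDate(), this variant also drops the leading zero of the day.

```javascript
// Hypothetical alternative to formatDate(); the names below are
// illustrative and do not come from the original script.
var MONTH_NAMES = [
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December'
];

function formatDateCompact(timestamp) {
    // Expect exactly yyyy-mm-dd; anything else is returned unchanged.
    var match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(timestamp);
    if (!match) {
        return timestamp;
    }

    // Months are 1-based in the timestamp, 0-based in the array.
    var monthIndex = parseInt(match[2], 10) - 1;
    if (monthIndex < 0 || monthIndex > 11) {
        return timestamp;
    }

    // parseInt drops the leading zero of the day ('05' becomes 5).
    var day = parseInt(match[3], 10);

    return MONTH_NAMES[monthIndex] + ' ' + day + ', ' + match[1];
}
```

With the sample sitemap entry, formatDateCompact('2008-08-10') gives 'August 10, 2008'.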

The formatDate() function takes a date in the format yyyy-mm-dd and splits it into three parts, which it then returns in the format 'Month dd, yyyy'. Now it's time to use jQuery to parse our sitemap:

$(document).ready(function() {

    // Create the list that will hold the sitemap entries
    $('<ul id="sitemap"></ul>').insertAfter('h1');

    $.get('sitemap.xml', function(xml) {

        $(xml).find('loc').each(function() {

            var rawUrl = $(this).text();
            var rawLastMod = $(this).parent().find('lastmod').text();
            var $li = $('<li></li>');

            // Drop 'index.html' from the URL, if present
            var url = rawUrl.replace('index.html', '');

            // Keep only the yyyy-mm-dd part of the timestamp
            var timestamp = rawLastMod.replace(/T.+/g, '');
            var lastMod = formatDate(timestamp);

            $li.html('<a href="' + url + '">' + url + '</a>' +
                '<div class="lastmod">Last Modified: ' + lastMod + '</div>');

            $li.appendTo('#sitemap');
        });
    });
});

We use the $.get() method to fetch and parse our sitemap file. While looping over the loc elements, we check whether each URL contains the string 'index.html'. If it does, we remove it. We also make sure that the date passed to the formatDate() function is in the format 'yyyy-mm-dd'. To do this, we strip everything in the lastmod text from the 'T' onward. You can see the final result here.
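
The two clean-up steps described above can also be pulled out into small functions that are easy to try on their own, without jQuery or a sitemap file. This is only a sketch: the names stripIndexHtml and dateFromLastmod are hypothetical. Note one deliberate difference from the script: stripIndexHtml() removes 'index.html' only at the end of the URL, whereas the script removes its first occurrence wherever it appears.

```javascript
// Hypothetical helpers; the names are illustrative and not part of
// the original script.

function stripIndexHtml(url) {
    // Remove 'index.html' only when it is the last part of the URL.
    return url.replace(/index\.html$/, '');
}

function dateFromLastmod(lastmod) {
    // Keep only the yyyy-mm-dd part that precedes the 'T' separator.
    // A value without a 'T' is returned as it is.
    return lastmod.split('T')[0];
}
```

For example, dateFromLastmod('2008-08-10T06:03:54+00:00') gives '2008-08-10', which is the format formatDate() expects.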
