jQuery: parsing HTML pages

In this post I'm going to show you how to parse an HTML page using jQuery's AJAX methods. Because the browser's same-origin policy normally blocks AJAX requests to another domain, we need a basic PHP proxy that fetches the contents of the remote page. The proxy is shown below.

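<?php
// Fetch the remote page server-side.
// file_get_contents() needs allow_url_fopen enabled to read URLs.
$file = file_get_contents('http://dev.css-zibaldone.com/onwebdev/post/');

if ($file === false) {
    // The remote page could not be fetched: report a gateway error.
    http_response_code(502);
    exit;
}

header('Content-Type: text/html');
echo $file;
?>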

As you can see, the page fetched is actually the list of all the examples posted on this blog. Before adding jQuery, we need a simple markup structure that will contain our elements:

<div id="page">
 <ol></ol>
</div>

Now we can add jQuery:

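$(document).ready(function() {
 // Base URL of the remote listing; the code assumes its links are relative to it.
 var base = 'http://dev.css-zibaldone.com/onwebdev/post/';

 $.ajax({
  url: 'page.php',
  type: 'GET',
  dataType: 'html',
  success: function(html) {
   var content = '';

   // $(html) parses the response string into detached DOM elements.
   $(html).find('a').each(function() {
    var $a = $(this);
    var href = $a.attr('href');
    var text = $a.text();

    content += '<li><a href="' + base + href + '">' + text + '</a></li>';
   });

   // Insert all list items in a single DOM update.
   $('#page ol').html(content);
  }
 });
});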
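The snippet above concatenates each href and link text straight into the markup. That works for simple relative links, but it would break on absolute URLs, on anchors without an href, or on text containing characters such as < or &. Below is a minimal sketch of a safer way to build the same list items. The names escapeHtml and buildListItems are hypothetical helpers, not part of the original post, and the sketch assumes an environment that provides the standard URL constructor (modern browsers or Node.js).

```javascript
// Hypothetical helper: escapes the characters that are significant
// in HTML text and in double-quoted attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: builds the <li> markup for an array of
// { href, text } objects. `base` is the URL used to resolve relative hrefs.
function buildListItems(links, base) {
  return links
    // Skip anchors that have no href (for example named anchors).
    .filter(function (link) { return Boolean(link.href); })
    .map(function (link) {
      // new URL(href, base) resolves relative paths against the base
      // and leaves absolute URLs pointing where they already point.
      var absolute = new URL(link.href, base).href;
      return '<li><a href="' + escapeHtml(absolute) + '">' +
        escapeHtml(link.text) + '</a></li>';
    })
    .join('');
}
```

Inside the each() loop you would then collect { href: href, text: text } objects into an array instead of concatenating strings, and pass that array to buildListItems() together with the base URL before calling html() on the list.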

The key option in the ajax() call is dataType, which tells jQuery what kind of response to expect (in this case HTML). With dataType set to 'html', the success callback receives the response as a plain string. It is the $(html) call that parses this string into DOM elements, thus allowing for normal DOM traversing operations such as find(). You can see this demo here.
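One detail worth knowing when traversing a parsed response: find() only searches the descendants of the matched elements, so an anchor that sits at the top level of the parsed fragment is not returned by $(html).find('a'). A common workaround is to wrap the parsed nodes in a container first. The sketch below assumes jQuery 1.8 or later, where $.parseHTML() is available. The function name extractLinks is hypothetical, and jQuery is passed in as a parameter only to keep the function self-contained.

```javascript
// Hypothetical helper: returns an array of { href, text } objects
// for every anchor found in an HTML string.
// `$` is the jQuery function (assumed to be version 1.8 or later).
function extractLinks($, html) {
  var links = [];

  // $.parseHTML() turns the string into an array of DOM nodes.
  // Appending them to a detached <div> makes every node a descendant,
  // so find() also sees anchors that were at the top level.
  var $container = $('<div>').append($.parseHTML(html));

  $container.find('a').each(function () {
    var $a = $(this);
    links.push({ href: $a.attr('href'), text: $a.text() });
  });

  return links;
}
```

In the success callback this could be called as extractLinks($, html), and the resulting array used to build the list.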

Posted by Gabriele Romanato.
