PHP Google sitemap generator for static websites

A couple of months ago a client of mine asked me to create a sitemap for his website. Since Google sitemaps are generally considered good SEO practice for getting a site indexed, I decided to create one using Google's own tools. Unfortunately, I wasn't able to use the Python script provided by Google, and other online tools had some restrictions. Since this was a static website with only a few pages, I wrote a PHP class to generate the sitemap. Here it is:

SiteMap.class.php

<?php
class SiteMap {
    private $_re;
    protected $_seen = array();

    /** @param String a PCRE */
    public function setPCRE($regexp) {
        $this->_re = $regexp;
    }
    
            
    /** @param String monthly, daily, weekly, etc. It's optional for a sitemap
        @return String The <changefreq/> element, or an empty string **/
    public function setChangeFrequency($freq) {
        // Always return a string, even when no valid frequency is given
        if (is_string($freq) && $freq !== '') {
            return '<changefreq>' . $freq . '</changefreq>' . "\n";
        }
        return '';
    }
    
    /** @param String A number between 0.0 and 1.0. It's optional for a sitemap
        @return String The <priority/> element, or an empty string */
    public function setPriority($priority) {
        // Accept '0.5' as well as 0.5, and reject values outside the valid range
        if (is_numeric($priority) && $priority >= 0 && $priority <= 1) {
            return '<priority>' . $priority . '</priority>' . "\n";
        }
        return '';
    }
    
    
    /** @param String The attributes of the <urlset/> element (its namespace declarations)
        @return String The opening tag of the root element */
    public function setNS($ns) {
        // Fall back to a bare root element when no attributes are given
        if (is_string($ns) && $ns !== '') {
            return '<urlset ' . $ns . '>' . "\n";
        }
        return '<urlset>' . "\n";
    }
    
    /** @return Bool True if the directory separator is \ (Windows), false if it is / */
    public function isWindows() {
        return DIRECTORY_SEPARATOR === '\\';
    }
    
    /** @param String The name of a directory
        @return Array A list of array($uri, $lastmod) pairs for the documents that match the PCRE set with SiteMap::setPCRE() **/
    public function searchDir($dir) {
        $pages = array();
        $this->_seen[realpath($dir)] = true;
        try {
            $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
            foreach ($files as $file) {
                $path = $file->getPathname();
                if (!$file->isFile() || !$file->isReadable() || isset($this->_seen[$path])) {
                    continue;
                }
                $this->_seen[$path] = true;
                if (!preg_match($this->_re, $path)) {
                    continue;
                }
                // Strip the document root to turn the filesystem path into a URL path
                $uri = substr($path, strlen($_SERVER['DOCUMENT_ROOT']));
                if ($this->isWindows()) {
                    $uri = str_replace(DIRECTORY_SEPARATOR, '/', $uri);
                }
                // <lastmod> must use the W3C Datetime format (YYYY-MM-DD)
                $lastmod = date('Y-m-d', filemtime($path));
                $pages[] = array($uri, $lastmod);
            }
        } catch (Exception $e) {
            // The directory could not be read: return the pages found so far
        }

        return $pages;
    }

}
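
As a minimal sketch of the directory-walking idea on its own, the standalone function below lists the files under a folder whose path matches a pattern and pairs each URL path with its modification date. The listPages() name and the throwaway demo directory are assumptions made for illustration; they are not part of the SiteMap class.

```php
<?php
// Hypothetical helper (not part of the SiteMap class): list the files under
// $root whose path matches $pattern, as array($uri, $lastmod) pairs.
function listPages($root, $pattern) {
    $pages = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        $path = $file->getPathname();
        if ($file->isFile() && preg_match($pattern, $path)) {
            // Drop the root prefix and normalise separators to get a URL path
            $uri = str_replace(DIRECTORY_SEPARATOR, '/', substr($path, strlen($root)));
            $pages[] = array($uri, date('Y-m-d', $file->getMTime()));
        }
    }
    sort($pages); // directory order is not guaranteed, so sort for stable output
    return $pages;
}

// Build a throwaway directory tree to try it on
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'sitemap_demo_' . uniqid();
mkdir($root . DIRECTORY_SEPARATOR . 'docs', 0777, true);
file_put_contents($root . DIRECTORY_SEPARATOR . 'index.html', '<p>Home</p>');
file_put_contents($root . DIRECTORY_SEPARATOR . 'docs' . DIRECTORY_SEPARATOR . 'about.html', '<p>About</p>');
file_put_contents($root . DIRECTORY_SEPARATOR . 'style.css', 'p {}');

$pages = listPages($root, '/\.html$/');
print_r($pages);
```

Only the two .html files are listed. Each entry is a pair of URL path and date, like the entries that searchDir() collects.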

Basically, this class uses PHP's recursive directory iterators to walk through the static HTML files of a site. It selects the files whose path matches a regular expression, works out each file's URL and last modification date, and returns them as an array (through the searchDir() method). Here's a basic usage:

Use of the SiteMap class

<?php
header('Content-Type: text/xml');
require_once('SiteMap.class.php');


$sitemap = new SiteMap();
$sitemap->setPCRE('/\.html$/');
$changefreq = $sitemap->setChangeFrequency('daily');
$priority = $sitemap->setPriority('1.0');


$matching_pages = array();
$search_dirs = array('/');  // Insert '/' if you want to start from the root

foreach ($search_dirs as $dir) {
    // Join the document root and the directory with a single slash on every platform
    $matching_pages = array_merge($matching_pages, $sitemap->searchDir($_SERVER['DOCUMENT_ROOT'] . '/' . ltrim($dir, '/\\')));
}


echo '<?xml version="1.0" encoding="UTF-8"?>' ."\n";     


if(count($matching_pages) > 0){

    echo $sitemap->setNS('xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"');
    
    foreach ($matching_pages as $v) {
        // Point to the directory URL instead of its index document
        $loc = preg_replace('/index\.(html|php)$/', '', $v[0]);

        // $changefreq and $priority already end with a newline
        printf("<url>\n<loc>http://%s%s</loc>\n<lastmod>%s</lastmod>\n%s%s</url>\n", $_SERVER['HTTP_HOST'], $loc, $v[1], $changefreq, $priority);
    }
    
    echo '</urlset>';
} else {
    echo '<error>No documents found.</error>';
    
} 
?>
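
Sitemap values must be entity-escaped, so a file name or query string containing & would make the output invalid if printed as is. The sketch below shows one possible way to build a single entry with escaping. urlEntry() is a hypothetical helper, not something the class provides, and www.example.com is a placeholder host.

```php
<?php
// Hypothetical helper: build one <url> entry, escaping every value for XML
// and leaving out the optional elements when they are empty.
function urlEntry($loc, $lastmod, $changefreq = '', $priority = '') {
    $flags = ENT_QUOTES | ENT_XML1;
    $xml  = "<url>\n";
    $xml .= '<loc>' . htmlspecialchars($loc, $flags, 'UTF-8') . "</loc>\n";
    $xml .= '<lastmod>' . htmlspecialchars($lastmod, $flags, 'UTF-8') . "</lastmod>\n";
    if ($changefreq !== '') {
        $xml .= '<changefreq>' . htmlspecialchars($changefreq, $flags, 'UTF-8') . "</changefreq>\n";
    }
    if ($priority !== '') {
        $xml .= '<priority>' . htmlspecialchars($priority, $flags, 'UTF-8') . "</priority>\n";
    }
    return $xml . "</url>\n";
}

// Placeholder URL with an ampersand in the query string
$entry = urlEntry('http://www.example.com/search.html?a=1&b=2', '2011-05-30', 'daily', '1.0');
echo $entry;
```

In the printed entry the ampersand comes out as &amp;amp; which keeps the document well-formed.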

The above script generates the sitemap. Note, however, that this class is best suited to small or medium websites. It walks every file under the document root and collects all the matching pages in memory on each request, so on bigger sites you may need to raise the script's execution time limit and keep an eye on the memory limit of your PHP configuration (both set in php.ini).
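
For a larger site, one option is to lift the time limit and write the entries to a file as they are found, rather than building the whole array first. The following is only a sketch under those assumptions. eachPage() and writeSitemap() are hypothetical names, the generator needs PHP 5.5 or later, and the demo runs on a throwaway directory with a placeholder host.

```php
<?php
// Assumption: removing the time limit is acceptable for a script that runs rarely
set_time_limit(0);

// Hypothetical generator: yields one array($uri, $lastmod) pair at a time
// instead of collecting every page in memory first.
function eachPage($root, $pattern) {
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        $path = $file->getPathname();
        if ($file->isFile() && preg_match($pattern, $path)) {
            $uri = str_replace(DIRECTORY_SEPARATOR, '/', substr($path, strlen($root)));
            yield array($uri, date('Y-m-d', $file->getMTime()));
        }
    }
}

// Hypothetical writer: streams the entries to $target so that the sitemap
// can then be served as a static file. Returns the number of URLs written.
function writeSitemap($root, $pattern, $baseUrl, $target) {
    $out = fopen($target, 'w');
    fwrite($out, '<?xml version="1.0" encoding="UTF-8"?>' . "\n");
    fwrite($out, '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n");
    $count = 0;
    foreach (eachPage($root, $pattern) as $page) {
        $loc = htmlspecialchars($baseUrl . $page[0], ENT_QUOTES | ENT_XML1, 'UTF-8');
        fwrite($out, sprintf("<url>\n<loc>%s</loc>\n<lastmod>%s</lastmod>\n</url>\n", $loc, $page[1]));
        $count++;
    }
    fwrite($out, "</urlset>\n");
    fclose($out);
    return $count;
}

// Demo on a throwaway directory; on a real site $root would be the document root
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'sitemap_big_' . uniqid();
mkdir($root);
file_put_contents($root . DIRECTORY_SEPARATOR . 'a.html', 'A');
file_put_contents($root . DIRECTORY_SEPARATOR . 'b.html', 'B');
$target = $root . DIRECTORY_SEPARATOR . 'sitemap.xml';
$count = writeSitemap($root, '/\.html$/', 'http://www.example.com', $target);
echo $count . " URLs written\n";
```

Because only one entry is held at a time, memory use stays roughly constant however many pages the site has. The price is that the sitemap has to be regenerated whenever pages change.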
