Tuesday 9 June 2009

Auto-submit an updated sitemap.xml to Google Webmaster Tools with PHP code

Theory

To resubmit your Sitemap using an HTTP request:

Issue your request to the following URL:
www.google.com/webmasters/tools/ping?sitemap=sitemap_url
For example, if your Sitemap is located at http://www.example.com/sitemap.gz, your URL will become:
www.google.com/webmasters/tools/ping?sitemap=http://www.example.com/sitemap.gz
URL encode everything after the /ping?sitemap=:

www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.gz
Issue the HTTP request using wget, curl, or another mechanism of your choosing.
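
The encoding step above can be sketched in PHP as follows. This is only a minimal illustration: build_ping_url() is a hypothetical helper name of my own, and the explicit http:// scheme in front of the ping address is an assumption (the steps above list the address without a scheme).

```php
<?php
// Hypothetical helper: builds the ping URL for a given sitemap location.
function build_ping_url(string $sitemapUrl): string
{
    // Assumption: plain http:// scheme; the steps above omit the scheme.
    $endpoint = 'http://www.google.com/webmasters/tools/ping?sitemap=';

    // urlencode() percent-encodes ":" and "/" so the sitemap address
    // is passed as a single query-string value.
    return $endpoint . urlencode($sitemapUrl);
}

echo build_ping_url('http://www.example.com/sitemap.gz'), "\n";
// prints http://www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.gz
```

Only the part after sitemap= goes through urlencode(); encoding the whole ping URL would also escape the ? and = that the endpoint needs.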

A successful request will return an HTTP 200 response code; if you receive a different response, you should resubmit your request. The HTTP 200 response code only indicates that Google has received your Sitemap, not that the Sitemap itself or the URLs contained in it were valid. To obtain status information about your Sitemap, resubmit it using your Webmaster Tools account. We recommend that you resubmit a Sitemap no more than once per hour. An easy way to do this is to set up an automated job to generate and submit Sitemaps on a regular basis.

Code

I got the code from here
http://nadeausoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl#curlinit

The quick code

<?php
// Fetch a URL with cURL and return the transfer info, error state, and body.
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page as a string
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // accept all supported encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect (seconds)
        CURLOPT_TIMEOUT        => 120,      // timeout for the whole request (seconds)
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );
    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );         // includes 'http_code'
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

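// The sitemap address must be URL-encoded, so use urlencode() (not urldecode()).
$result = get_web_page( 'http://www.google.com/webmasters/tools/ping?sitemap=' . urlencode( 'http://site.com/sitemap.xml' ) );
if ( $result['errno'] != 0 )
    echo "error, request failed: " . $result['errmsg'];
elseif ( $result['http_code'] != 200 )
    echo "error, no service (HTTP " . $result['http_code'] . ")";
$page = $result['content'];
echo $page;
?>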
Output

"Sitemap Notification Received

Your Sitemap has been successfully added to our list of Sitemaps to crawl. If this is the first time you are notifying Google about this Sitemap, please add it via http://www.google.com/webmasters/tools/ so you can track its status. Please note that we do not add all submitted URLs to our index, and we cannot make any predictions or guarantees about when or if they will appear."
 

After this, set up a cron job that runs the script. Please don't ping Google more than once per hour, otherwise they will be angry.
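
One way to make sure the once-per-hour limit holds, even if the cron job is misconfigured or the script is run by hand, is a small timestamp guard. This is only a sketch under my own assumptions: may_ping_now() is a hypothetical helper, and the stamp file location and the paths in the crontab comment are placeholders to adapt.

```php
<?php
// Hypothetical guard: returns true when at least $minInterval seconds have
// passed since the last recorded ping, and records the new ping time.
function may_ping_now(string $stampFile, int $minInterval = 3600, ?int $now = null): bool
{
    $now  = $now ?? time();
    // A missing or empty stamp file counts as "never pinged".
    $last = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;
    if ($now - $last < $minInterval) {
        return false; // too soon, skip this run
    }
    file_put_contents($stampFile, (string) $now);
    return true;
}

// Example crontab entry (assumed paths), running at the top of every hour:
// 0 * * * * /usr/bin/php /path/to/ping_sitemap.php

$stamp = sys_get_temp_dir() . '/sitemap_ping.stamp'; // assumed location
if (may_ping_now($stamp)) {
    // call get_web_page() with the ping URL here
}
```

Note that the guard records the time before the ping is sent, so a failed request still counts against the hour; move the file_put_contents() call after a successful ping if you prefer to retry sooner.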