
Sitemaps

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Sade (talk | contribs) at 18:46, 2 November 2008 (Added Category:XML-based standards.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. A Sitemap is an XML file that lists the URLs for a site. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs in the site. This allows search engines to crawl the site more intelligently. Sitemaps are a URL inclusion protocol and complement robots.txt, a URL exclusion protocol.

Sitemaps are particularly beneficial on websites:

  • where some areas of the website are not available through the browsable interface, or
  • where webmasters use rich Ajax or Flash content that is not normally processed by search engines.

The webmaster can generate a Sitemap containing all accessible URLs on the site and submit it to search engines. Since Google, MSN, Yahoo! and Ask now use the same protocol, a single Sitemap gives the biggest search engines up-to-date information about the site's pages.

Sitemaps supplement, and do not replace, the existing crawl-based mechanisms that search engines already use to discover URLs. Submitting a Sitemap to a search engine only helps that engine's crawlers do a better job of crawling the site. Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results.[citation needed]

History of Sitemaps

  • Google first introduced Sitemaps 0.84 in June 2005 so web developers could publish lists of links from across their sites.

The Sitemaps protocol is based on ideas[1] from "Crawler-friendly Web Servers".[2]

XML Sitemap format

The Sitemap protocol format consists of XML tags, and the file must be UTF-8 encoded. Sitemaps can also be a plain text list of URLs, one per line, and either form can be compressed in .gz format.
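Besides XML, the protocol accepts a plain-text list of URLs and gzip compression. A minimal Python sketch of both, assuming one URL per line and UTF-8 encoding; the helper name `text_sitemap_gz` and the example.org URLs are hypothetical:

```python
import gzip

def text_sitemap_gz(urls):
    """Build a plain-text Sitemap (one URL per line) and gzip it.

    The text is encoded as UTF-8 before compression, matching the
    encoding requirement for Sitemap files.
    """
    text = "\n".join(urls) + "\n"
    return gzip.compress(text.encode("utf-8"))

# Two hypothetical URLs.
data = text_sitemap_gz([
    "http://www.example.org/",
    "http://www.example.org/about.html",
])

# Decompressing recovers the original list, one URL per line.
print(gzip.decompress(data).decode("utf-8").splitlines())
# ['http://www.example.org/', 'http://www.example.org/about.html']
```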

Sample

A sample Sitemap that contains just one URL and uses all optional tags is shown below.

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
			    http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
	<url>
		<loc>http://w3c-at.de</loc>
		<lastmod>2006-11-18</lastmod>
		<changefreq>daily</changefreq>
		<priority>0.8</priority>
	</url>
</urlset>
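A file like the sample can also be generated programmatically. The Python sketch below rebuilds the sample's `<urlset>` with the standard library's `xml.etree.ElementTree` (the `xml_declaration` argument needs Python 3.8 or later). The function name `build_sitemap` and its dict-based input are illustrative assumptions, and the sketch omits the optional `xsi:schemaLocation` attributes:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return UTF-8 Sitemap XML (bytes) for a list of dicts.

    Each dict must have a "loc" key; "lastmod", "changefreq" and
    "priority" are optional and written only when present.
    """
    # Setting xmlns as a plain attribute puts every child element in the
    # Sitemap namespace without ElementTree adding an "ns0:" prefix.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Child tags are written in the same order as in the sample.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                # ElementTree escapes &, < and > in text content.
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

xml_bytes = build_sitemap([
    {"loc": "http://w3c-at.de", "lastmod": "2006-11-18",
     "changefreq": "daily", "priority": 0.8},
])
print(xml_bytes.decode("utf-8"))
```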

Search engine submission

If a Sitemap is submitted directly to a search engine (pinged), the engine returns status information and any processing errors. Submission details vary between search engines. The location of the Sitemap can also be included in the robots.txt file by adding the following line:

Sitemap: <sitemap_location>

The <sitemap_location> should be the complete URL of the Sitemap, such as http://www.example.org/sitemap.xml. This directive is independent of the user-agent line, so it does not matter where it is placed in the file. If the website has several Sitemaps, this URL can simply point to the main Sitemap index file.
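A crawler can read the Sitemap directive with Python's standard `urllib.robotparser` module, whose `site_maps()` method (Python 3.8 or later) returns the listed Sitemap URLs. The robots.txt content below is a hypothetical example; it places the Sitemap line before the User-agent group to illustrate that placement does not matter:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Sitemap line comes before any User-agent
# group, which is allowed because the directive is independent of it.
ROBOTS_TXT = """\
Sitemap: http://www.example.org/sitemap.xml

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of Sitemap URLs, or None if the file
# contains no Sitemap lines.
print(parser.site_maps())
# ['http://www.example.org/sitemap.xml']
```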

The following table lists the Sitemap submission URLs for several major search engines:

Search engine | Submission URL | Help page
Google | http://www.google.com/webmasters/tools/ping?sitemap= | How do I resubmit my Sitemap once it has changed?
Yahoo! | http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=SitemapWriter&url= ; http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap= | Does Yahoo! support Sitemaps?
Ask.com | http://submissions.ask.com/ping?sitemap= | Q: Does Ask.com support sitemaps?
Live Search | http://webmaster.live.com/ping.aspx?siteMap= | Webmaster Tools (beta)
Yandex | | Sitemaps files
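Each submission URL in the table ends with a query parameter that awaits the Sitemap's location. A minimal sketch of building a ping URL, assuming the Sitemap URL is percent-encoded so it travels as a single parameter value; the helper name `ping_url` is hypothetical, and the sketch only builds the URL without sending a request, since these 2008-era endpoints may no longer respond:

```python
from urllib.parse import quote

def ping_url(submission_url, sitemap_url):
    """Append a percent-encoded Sitemap URL to a submission (ping) URL."""
    # safe="" makes quote() encode ":" and "/" too, so the whole Sitemap
    # URL is carried as a single query-parameter value.
    return submission_url + quote(sitemap_url, safe="")

print(ping_url("http://submissions.ask.com/ping?sitemap=",
               "http://www.example.org/sitemap.xml"))
# http://submissions.ask.com/ping?sitemap=http%3A%2F%2Fwww.example.org%2Fsitemap.xml
```

Because the helper simply appends, it also works for endpoints whose parameter is spelled differently, such as Live Search's `siteMap=`.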

Sitemap limits

Each Sitemap file is limited to 50,000 URLs and 10 megabytes (uncompressed). Sitemaps can be compressed using gzip to reduce bandwidth consumption. Multiple Sitemap files are supported, with a Sitemap index file serving as an entry point for up to 1,000 Sitemaps.
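To stay within the URL limit, a large site can split its URL list across several Sitemap files and list them in an index. A minimal Python sketch follows. It assumes the `<sitemapindex>`, `<sitemap>` and `<loc>` element names of the sitemaps.org index format, uses hypothetical helper and file names, and does not check the size limit:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000   # per-file URL limit
MAX_SITEMAPS_PER_INDEX = 1_000  # index-file limit

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive groups of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_index(sitemap_locations):
    """Return a Sitemap index (UTF-8 XML bytes) listing each Sitemap file."""
    if len(sitemap_locations) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("more Sitemaps than one index file may list")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)

# 120,000 hypothetical page URLs need three Sitemap files.
urls = [f"http://www.example.org/page{n}.html" for n in range(120_000)]
chunks = chunk_urls(urls)
print([len(c) for c in chunks])  # [50000, 50000, 20000]

# Hypothetical names for the gzipped Sitemap files, one per chunk.
locations = [f"http://www.example.org/sitemap{i}.xml.gz"
             for i in range(1, len(chunks) + 1)]
index_xml = build_index(locations)
```

The index URL, rather than each individual Sitemap, is then what would be submitted or listed in robots.txt.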

As with all XML files, any data values (including URLs) must use entity escape codes for the following characters: ampersand (&) as &amp;, single quote (') as &apos;, double quote (") as &quot;, less than (<) as &lt; and greater than (>) as &gt;.
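When Sitemap XML is assembled from text templates rather than an XML library, each value has to be escaped by hand. Python's `xml.sax.saxutils.escape` handles &, < and > by default and accepts a dictionary of extra replacements, which the sketch below uses for the two quote characters. The wrapper name `escape_sitemap_value` is hypothetical:

```python
from xml.sax.saxutils import escape

# escape() already replaces &, < and >; this mapping adds the two
# quote characters so all five characters listed above are covered.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def escape_sitemap_value(value):
    """Entity-escape a data value (such as a URL) for a Sitemap file."""
    return escape(value, QUOTE_ENTITIES)

print(escape_sitemap_value("http://www.example.org/view?widget=3&count>2"))
# http://www.example.org/view?widget=3&amp;count&gt;2
```

Because `escape()` replaces the ampersand before applying the extra mapping, the "&" inside "&apos;" and "&quot;" is not escaped a second time.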

Notes

  1. M.L. Nelson, J.A. Smith, del Campo, H. Van de Sompel, X. Liu (2006). "Efficient, Automated Web Resource Harvesting" (PDF). WIDM'06.
  2. O. Brandman, J. Cho, Hector Garcia-Molina, and Narayanan Shivakumar (2000). "Crawler-friendly web servers". Proceedings of ACM SIGMETRICS Performance Evaluation Review, Volume 28, Issue 2. doi:10.1145/362883.362894.

See also