The Sitemaps protocol allows a webmaster to inform search engines about the URLs on a website that are available for crawling. A Sitemap is an XML file that lists the URLs for a site. It can include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs in the site. This allows search engines to crawl the site more intelligently. Sitemaps are a URL inclusion protocol and complement robots.txt, a URL exclusion protocol.

Sitemaps are particularly beneficial on websites where:

  • Some areas of the website are not available through the browsable interface
  • Webmasters use rich Ajax, Silverlight, or Flash content that is not normally processed by search engines
  • The site is very large and there is a chance that web crawlers will overlook some of its content
  • The website has a huge number of pages
  • The website has few external links

History

Google first introduced Sitemaps 0.84 in June 2005 so web developers could publish lists of links from across their sites. Google, MSN, and Yahoo announced support for the Sitemaps protocol in November 2006. The schema was changed to “Sitemap 0.90”, but no other changes were made.

In April 2007, Ask.com and IBM announced support for Sitemaps. Google, Yahoo and Microsoft also announced auto-discovery of sitemaps through robots.txt. In May 2007, the state governments of Arizona, California, Utah and Virginia announced they would use Sitemaps on their web sites.

The Sitemaps protocol is based on ideas [1] from “Crawler-friendly Web Servers,” [2] with improvements including auto-discovery through robots.txt and the ability to specify the priority and change frequency of pages.

File format

The Sitemap Protocol format consists of XML tags. The file itself must be UTF-8 encoded. Sitemaps can also be a plain text list of URLs. They can also be compressed in .gz format.

A sample Sitemap that contains just one URL and uses all optional tags is shown below.

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
 <url>
  <loc>http://example.com/</loc>
  <lastmod>2006-11-18</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.8</priority>
 </url>
</urlset>
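
A file like the sample above can also be produced programmatically. The following is a minimal sketch in Python using only the standard library; the function name build_sitemap and the dict-based entry format are illustrative assumptions, not part of the protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return a Sitemap XML document (as a str) for a list of URL entries.

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys (hypothetical format).
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                # ElementTree escapes &, < and > in element text itself.
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

sample = build_sitemap([{
    "loc": "http://example.com/",
    "lastmod": "2006-11-18",
    "changefreq": "daily",
    "priority": 0.8,
}])
```

When written to disk, the returned string should be encoded as UTF-8, as the format requires.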

The Sitemap XML protocol is also extended to provide a way of listing multiple Sitemaps in a ‘Sitemap index’ file. The maximum Sitemap size of 50 MiB or 50,000 URLs [3] means this is necessary for large sites.

An example of a Sitemap index referencing one separate sitemap follows.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <sitemap>
  <loc>http://www.example.com/sitemap1.xml.gz</loc>
  <lastmod>2014-10-01T18:23:17+00:00</lastmod>
 </sitemap>
</sitemapindex>
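
For a large site, the URL list can be split into groups that respect the 50,000-URL limit, with one index file pointing at the resulting Sitemaps. The sketch below shows one way to do this in Python; the helper names chunk_urls and build_sitemap_index are hypothetical, and only the URL count is checked, not the size in bytes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return a Sitemap index document listing the given Sitemap URLs."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod:
            # The same lastmod is applied to every entry in this sketch.
            ET.SubElement(sitemap, "lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Each group returned by chunk_urls would be written to its own Sitemap file, and the public URLs of those files passed to build_sitemap_index.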

Element definitions

The definitions for the elements are shown below: [3]

Element | Required? | Description
<urlset> | Yes | The document-level element for the Sitemap. The rest of the document after the ‘<?xml version>’ declaration must be contained in this.
<url> | Yes | Parent element for each entry.
<sitemapindex> | Yes | The document-level element for the Sitemap index. The rest of the document after the ‘<?xml version>’ declaration must be contained in this.
<sitemap> | Yes | Parent element for each entry in the index.
<loc> | Yes | Provides the full URL of the page or sitemap, including the protocol (e.g. http, https) and a trailing slash, if required by the site’s hosting server. This value must be shorter than 2,048 characters. Note that ampersands in the URL need to be escaped as &amp;.
<lastmod> | No | The date that the file was last modified, in ISO 8601 format. This can be the full date and time or, if desired, just the date in YYYY-MM-DD format.
<changefreq> | No | How frequently the page may change. Valid values are always, hourly, daily, weekly, monthly, yearly and never. “Always” is used to denote documents that change each time they are accessed. “Never” is used to denote archived URLs (i.e. files that will not be changed again). This is used only as a guide for crawlers, and is not used to determine how frequently pages are indexed. Does not apply to <sitemap> elements.
<priority> | No | The priority of that URL relative to other URLs on the site. This allows webmasters to suggest to crawlers which pages are considered more important. The valid range is from 0.0 to 1.0, with 1.0 being the most important. The default value is 0.5. Rating all pages on a site with a high priority does not affect search listings, as it is only used to suggest to the crawlers how important pages in the site are relative to one another. Does not apply to <sitemap> elements.

Support for the elements that are not required can vary from one search engine to another. [3]
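
The constraints in the element definitions can be checked before a Sitemap is published. The following Python sketch validates a single <url> entry against the rules stated above (required loc, length limit, allowed changefreq values, priority range). The function name, the dict format and the message wording are assumptions made for illustration, and lastmod is not checked.

```python
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}

def validate_url_entry(entry):
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []

    loc = entry.get("loc", "")
    if not loc:
        problems.append("loc is required")
    elif len(loc) >= 2048:
        problems.append("loc must be shorter than 2,048 characters")

    changefreq = entry.get("changefreq")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append(f"invalid changefreq: {changefreq!r}")

    priority = entry.get("priority")
    if priority is not None:
        try:
            value = float(priority)
        except (TypeError, ValueError):
            problems.append("priority must be a number")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append("priority must be between 0.0 and 1.0")

    return problems
```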

Other formats

Text file

The Sitemaps protocol allows the Sitemap to be a simple list of URLs in a text file. The file specifications of XML Sitemaps apply to text Sitemaps as well: the file must be UTF-8 encoded, and cannot be larger than 50 MiB or contain more than 50,000 URLs, [4] but can be compressed as a gzip file. [3]
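
As a rough illustration, a text Sitemap can be assembled and gzip-compressed with Python's standard library. The function name text_sitemap_gz is hypothetical, one URL per line is assumed, and the sketch checks only the URL count, not the byte size of the file.

```python
import gzip

MAX_URLS = 50_000

def text_sitemap_gz(urls):
    """Return a gzip-compressed, UTF-8 encoded text Sitemap as bytes."""
    if len(urls) > MAX_URLS:
        raise ValueError("a Sitemap may not contain more than 50,000 URLs")
    # One URL per line, terminated by a newline.
    text = "\n".join(urls) + "\n"
    return gzip.compress(text.encode("utf-8"))
```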

Syndication feed

A syndication feed is a permitted method of submitting URLs to crawlers; this is advised mainly for sites that already have syndication feeds. One stated drawback is that this method might only provide crawlers with more recently created URLs, but other URLs can still be found during normal crawling. [3]

It can be beneficial to use a syndication feed as a delta update (containing only the newest content) to supplement a complete sitemap.

Search engine submission

If Sitemaps are submitted directly to a search engine (pinged), it will return status information and any processing errors. The details involved with submission will vary with the different search engines. The location of the sitemap can also be included in the robots.txt file by adding the following line:

Sitemap: <sitemap_location>

The <sitemap_location> should be the complete URL to the sitemap, such as:

http://www.example.org/sitemap.xml

This directive is independent of the user-agent line, so it does not matter where it is placed in the file. If the website has multiple sitemaps, multiple “Sitemap:” records may be included in robots.txt, or the URL may simply point to the main sitemap index file.
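
A crawler-side view of this mechanism can be sketched as follows: scan robots.txt for every “Sitemap:” record, regardless of position. The function find_sitemaps is a hypothetical helper, and treating the field name as case-insensitive and stripping “#” comments are assumptions of this sketch.

```python
def find_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (assumes sitemap URLs contain no "#").
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # The record is matched wherever it appears in the file.
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps

robots = """User-agent: *
Disallow: /private/
Sitemap: http://www.example.org/sitemap.xml
"""
found = find_sitemaps(robots)
```

Python 3.8 and later also expose this information through urllib.robotparser.RobotFileParser.site_maps().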

The following table lists the sitemap submission URLs for several major search engines:

Search engine | Submission URL | Help page | Market
Baidu | http://zhanzhang.baidu.com/dashboard/index | Baidu Webmaster Dashboard | China, Hong Kong, Singapore
Bing (and Yahoo!) | http://www.bing.com/webmaster/ping.aspx?siteMap= | Bing Webmaster Tools | Global
Google | http://www.google.com/webmasters/tools/ping?sitemap= | Submitting a Sitemap | Global
Yandex | http://webmaster.yandex.com/site/map.xml | Sitemaps files | Russia, Ukraine, Belarus, Kazakhstan, Turkey

Sitemap URLs submitted using the sitemap submission URLs need to be URL-encoded, for example by replacing : (colon) with %3A and / (slash) with %2F. [3]
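
The encoding step can be illustrated with Python's urllib.parse.quote. The helper build_ping_url is a hypothetical name; it only builds the string and makes no network request, and the endpoints listed in the table may have changed since it was compiled.

```python
from urllib.parse import quote

def build_ping_url(submission_url, sitemap_url):
    """Append a URL-encoded sitemap location to a submission URL."""
    # safe="" makes quote() encode ":" and "/" as well,
    # giving %3A and %2F as described above.
    return submission_url + quote(sitemap_url, safe="")

ping = build_ping_url(
    "http://www.bing.com/webmaster/ping.aspx?siteMap=",
    "http://www.example.org/sitemap.xml",
)
```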

Limitations for search engine indexing

Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover URLs. Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results. Specific examples are provided below.

  • Google: Webmaster Support on Sitemaps: “Using a sitemap doesn’t guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling. However, in most cases, your site will benefit from having a sitemap, and you’ll never be penalized for having one.” [5]
  • Bing: Bing uses the standard sitemaps.org protocol and is very similar to the one mentioned below.
  • Yahoo: After the search deal between Yahoo! Inc. and Microsoft, Yahoo! Site Explorer merged with Bing Webmaster Tools.

Sitemap limits

Sitemap files have a limit of 50,000 URLs and 50 MiB per sitemap. Sitemaps can be compressed using gzip, reducing bandwidth consumption. Multiple sitemap files are supported, with a Sitemap index file serving as an entry point. Sitemap index files may not list more than 50,000 Sitemaps, must be no larger than 50 MiB (52,428,800 bytes), and can be compressed. You can have more than one Sitemap index file. [3]

As with all XML files, any data values (including URLs) must use entity escape codes for the characters ampersand (&), single quote (‘), double quote (“), less than (<), and greater than (>).
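
In Python, this escaping can be done with xml.sax.saxutils.escape, which handles &, < and > by default and accepts extra replacements for the two quote characters. The wrapper name escape_xml_value is an assumption for illustration.

```python
from xml.sax.saxutils import escape

# escape() covers &, < and > itself; quotes are passed as extra entities.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def escape_xml_value(value):
    """Escape the five special XML characters in a data value."""
    return escape(value, EXTRA_ENTITIES)

escaped = escape_xml_value("http://www.example.com/view?id=1&lang=en")
```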

Multilingual and multinational Sitemaps

In December 2011, Google announced annotations for sites that want to target users in many languages and, optionally, countries. A few months later Google announced, on its official blog, [6] that it was adding support for specifying the rel="alternate" and hreflang annotations in Sitemaps. Compared with HTML link elements, which until then were the only option, the Sitemaps option offered several advantages, including a smaller page size and easier deployment for some websites.

For example, if a site targets English-language users through http://www.example.com/en and Greek-language users through http://www.example.com/gr, until then the only option was to add the hreflang annotation either in the HTTP header or as HTML link elements on both URLs, like this:

 <link rel="alternate" hreflang="en" href="http://www.example.com/en">
 <link rel="alternate" hreflang="gr" href="http://www.example.com/gr">

But now, one can use the following equivalent markup in Sitemaps:

<url>
 <loc>http://www.example.com/en</loc>
 <xhtml:link
  rel="alternate"
  hreflang="gr"
  href="http://www.example.com/gr" />
 <xhtml:link
  rel="alternate"
  hreflang="en"
  href="http://www.example.com/en" />
</url>
<url>
 <loc>http://www.example.com/gr</loc>
 <xhtml:link
  rel="alternate"
  hreflang="gr"
  href="http://www.example.com/gr" />
 <xhtml:link
  rel="alternate"
  hreflang="en"
  href="http://www.example.com/en" />
</url>
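
Because every <url> entry repeats the full set of alternates, this markup lends itself to generation. The Python sketch below builds such entries from a mapping of hreflang codes to URLs. The function name is hypothetical, and the XHTML namespace URI bound to the xhtml prefix is an assumption of this sketch, since the fragment above does not show its declaration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: the namespace bound to the "xhtml" prefix.
XHTML_NS = "http://www.w3.org/1999/xhtml"

def build_multilingual_sitemap(alternates):
    """Build a Sitemap in which every URL lists all language versions.

    `alternates` maps an hreflang code to the URL for that language.
    """
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc in alternates.values():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Each <url> lists every alternate, including itself.
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")

doc = build_multilingual_sitemap({
    "en": "http://www.example.com/en",
    "gr": "http://www.example.com/gr",
})
```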

See also

  • Biositemap
  • Metadata
  • Resources of a Resource
  • Yahoo! Site Explorer
  • Google Webmaster Tools

References

  1. ML Nelson; JA Smith; del Campo; H. Van de Sompel; X. Liu (2006). “Efficient, Automated Web Resource Harvesting” (PDF). WIDM’06.
  2. Brandman O., Cho J., Hector Garcia-Molina, and Narayanan Shivakumar (2000). “Crawler-friendly web servers”. Proceedings of ACM SIGMETRICS Performance Evaluation Review, Volume 28, Issue 2. doi:10.1145/362883.362894.
  3. “Sitemaps XML format”. Sitemaps.org. 2016-11-21. Retrieved 2016-12-01.
  4. https://support.google.com/webmasters/bin/answer.py?hl=en&answer=183668
  5. “About Google Sitemaps”. Google.com. 2016-12-01. Retrieved 2016-12-01.
  6. “Multilingual and multinational website annotations in Sitemaps”. Google Webmaster Central Blog. Pierre Far. May 24, 2012.