Talk:Sitemaps

From Wikipedia, the free encyclopedia
WikiProject Google (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Google, a collaborative effort to improve the coverage of Google and related topics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.

A previously unlabeled conversation

Merlinvicki: took the signature off; signatures are needed on Talk pages but not article pages.

Jwestbrook 22:51, 19 October 2005 (UTC)

past link to article on Merlinvicki

Jwestbrook 23:34, 24 October 2005 (UTC)

oops

J\/\/estbrook       18:16, 6 November 2005 (UTC)

Addition of links tab

Does anyone know the exact date that the links tab became active in Google sitemaps? Siralexf 17:15, 9 February 2007 (UTC)

Multiple links provided with Google result

I've noticed that in the last few months, some sites listed in Google results include sublinks. For example, this search for slashdot returns a link to the main Slashdot site along with links to Games - Login - Apple - Science beneath the description. Is this one of the benefits of submitting a site map to Google? If so, it would be worth mentioning in the article. mennonot 09:39, 19 November 2005 (UTC)

Not related; Matt Cutts explained it was a Google search results improvement: [1] Ivan Bajlo 15:09, 27 December 2005 (UTC)

Wiki Sitemaps?!

Does MediaWiki have an extension that creates a Google Sitemap automatically? Or is it built with an integrated sitemap? F16

Yes, there is a script to make sitemaps in your wiki's /maintenance directory. Jidanni (talk) 04:25, 15 March 2008 (UTC)

Sitemap generation tools

I'm removing a bunch of links to sites that claim to generate sitemaps... by spidering a web site. Can someone please explain how this is any different from letting the search engines spider your website themselves? Seems pretty pointless and shady to me. --Imroy 20:53, 31 August 2006 (UTC)


True, but some tools (not exactly the formerly listed ones) do provide some added value, for example editing attributes like change-frequency or priority. Crawling the site is then just a way to create the initial URL list. However, I don't think that listing every sitemaps tool is a good idea; providing links to lists of tools like those at code.google.com or sitemapstools.com is enough. That said, I do think that linking to a sitemap validator is a good thing. I provide such a free tool (along with tons of sitemaps info, FAQs, a Vanessa Fox interview ...) on my site at smart-it-consulting.com and somebody linked to it a couple of months ago. Unfortunately, this link is gone too. --Sebastian September/21/2006
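For readers unfamiliar with those attributes, here is a minimal Sitemap entry per the sitemaps.org protocol; the URL and values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-09-21</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

A crawler-based tool can fill in &lt;loc&gt; automatically, but &lt;changefreq&gt; and &lt;priority&gt; are exactly the optional fields a webmaster would edit by hand, which is the added value described above.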


Why is ROR listed? All major search engines support RSS, but none (!) of them states support for the added ROR fields. If you don't understand what I mean, check this article: http://www.micro-sys.dk/developer/articles/website-sitemap-kinds-comparison.php You can see that Google and Yahoo mention a lot of formats, but none of them is ROR.

--Tom November/10/2007


Spidering a website is the only reliable way to create a sitemap, particularly for larger, dynamic websites. When search engines crawl your site, they do not produce a sitemap for you. The entire point of "Google Sitemaps", as well as Yahoo's sitemap program, is that webmasters are asked to submit a sitemap. The search engines want sitemaps, which is why this page exists here. Besides this, a sitemap service can share its findings with the webmaster... which the search engines do not do very well, if at all. Not all pages on the web are coded very well, and despite the myriad of articles that explain how to write good code, for many it's easier to get a list of coding and HTTP protocol errors that are specific to their website (pages, server responses, HTTP status errors, etc.). What is shady about it? --MaxPowers 08:23, 25 January 2007 (UTC)


Particularly for larger, dynamic websites, the sitemaps should be generated from the underlying database. If dynamic sites use spidering tools to create the sitemap, then precisely the URLs that are not visible to SE crawlers will most probably not get included. Makes sense? --Sebastian February/06/2007

Removed another link. Please see WP:EL for more information, specifically, what is not accepted:
Links mainly intended to promote a website. —The preceding unsigned comment was added by Mporcheron (talkcontribs) 23:03, 25 January 2007 (UTC).


Spidering allows a realistic view of any size of website and can be used to uncover errors on the page due to template or CMS errors. One typical example is on WordPress blogs where commenters do not leave a website address and the link is listed as href="http://" when the page is displayed to browsers and SE spiders. This is technically a broken link and is one example of how a spidering service can benefit a webmaster by sharing its findings. Which URLs would not normally be visible, but need to be included in a sitemap? It stands to reason that if a site wants the SEs to see a page, it should be visible and should have at least some pages linking to it if that page is to do anything within any search engine. All SEs will filter out orphaned pages, including Google.

The sitemap programs (not software 'programs') offered by the search engines allow webmasters to share URLs that are not generally spidered, such as multi-level navigation through categories and sub-categories, but if 'normal' navigation is broken to the point that spidering is "impossible", then it is generally a poor navigational structure to begin with. Some spidering services offer a means to get around this anyway using scripted images, but this is probably irrelevant for this discussion.

The biggest problem with db-based systems is that they are very specific to a particular application and do not cover other areas of comprehensive websites (forum, blog, cart, general CMS, static pages, etc. all on one site). I would agree that db-based sitemap generators could be more efficient as they don't require a full page to load, but that efficiency comes at the price of sacrificing completeness in many cases and accuracy from a spider's point of view in all cases. MaxPowers 05:39, 8 February 2007 (UTC)
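To make the spidering approach under discussion concrete, here is a minimal sketch in Python of the link-collection step a spidering tool performs on each fetched page; the function names and example host are illustrative, not taken from any of the tools discussed, and a real generator would fetch each discovered page, recurse, and then emit the Sitemap XML:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects same-host link targets from the anchor tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Keep only URLs on the same host; a real spider would
                # queue each of these for fetching and recurse.
                if urlparse(absolute).netloc == self.host:
                    self.links.add(absolute)


def links_from_page(base_url, html):
    """Return the sorted same-host URLs linked from one fetched page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return sorted(collector.links)
```

This is also where a spidering service gathers the per-page findings (broken links, unexpected HTTP status codes) it can report back to the webmaster.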

Robots.txt "Sitemap:" declaration.

The text both at sitemaps.org and here says:

"The <sitemap_location> should be the complete URL to the Sitemap, ..."

Note that "should" is not "must," and as other directives (namely, "Disallow:") use relative URLs, not absolute ones, the language used in the definition of the declaration implies that a relative URL for a site map declaration (only in the "/robots.txt" file) is valid and may be used. If the intent of the definition were to require only fully specified URLs, the language used to specify the declaration syntax needs to be changed. I have noted that some people think that only a fully specified URL can be used in "robots.txt" for a site map declaration; such a conclusion appears erroneous based on the diction used.

I assume that verbs such as "should, must and may" have their usual meanings as in the popular Internet "request for comments" document series.

- D. Stussy, Los Angeles, CA, USA - 08:30, 31 May 2007 (UTC)
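To make the two readings concrete, a robots.txt could carry either form (example.com is a placeholder); note that the sitemaps.org examples themselves only show the absolute form:

```
# Absolute form, as shown in the sitemaps.org examples:
Sitemap: http://www.example.com/sitemap.xml

# Relative form, arguably permitted by the "should" wording quoted above:
Sitemap: /sitemap.xml
```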

Of course you can put it in there that way. It won't break robots.txt. However: I want sitemap-aware bots to figure out where my sitemap is, so I'll give them what they're expecting: a full URL. 198.49.180.40 19:17, 8 June 2007 (UTC)

There's no reason to believe that a robot can't compute an absolute URL from a relative URI, as it must do so with other relative URIs from other HTML resources it fetches along its indexing (or scanning) journey. In fact, somewhere I have a patch for HTDIG (3.1.6) to do exactly that - accept a relative URI for a sitemap from "/robots.txt". (It adds the sitemap to the stack and uses an external XML processor add-on to process the map or sitemapindex - as it may be either.) - D. Stussy, 01:13, 9 July 2010 (UTC)
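Resolving such a relative URI is indeed a one-liner in most HTTP client libraries; a sketch in Python (the host name is a placeholder):

```python
from urllib.parse import urljoin

# A crawler that found "Sitemap: /sitemap.xml" in robots.txt can resolve
# it against the robots.txt URL, exactly as it resolves relative links
# found in the HTML pages it fetches.
robots_url = "http://www.example.com/robots.txt"
sitemap_url = urljoin(robots_url, "/sitemap.xml")
print(sitemap_url)  # http://www.example.com/sitemap.xml
```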

Additional: Defining the sitemap as a relative URI is useful especially in virtual hosting, where multiple web sites under different domains/hostnames are served by the same physical host and may share a globally defined "/robots.txt" file. One entry (e.g. "Sitemap: /sitemap.xml") could point to a site map for every domain on that host. Of course, the individual site maps would be present in each domain's document root and would have different content (or not exist). Such a construct provides for separation of domains and avoids having to make multiple entries in one file that could tell malicious people what other domains are served via the same host, while allowing the host administrator to set a uniform robots policy. - D. Stussy, 17 June 2011.

Submit site map URL

www.google.com/webmasters/tools/ping?sitemap= or http://google.com/webmasters/sitemaps/ping?sitemap= ?

See http://www.google.com/support/webmasters/bin/answer.py?answer=34609 —Preceding unsigned comment added by 87.119.120.23 (talk) 13:10, 22 February 2008 (UTC)
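Whichever host name is current, the sitemap URL is passed URL-encoded as the `sitemap` query parameter; a Python sketch (the sitemap URL is a placeholder):

```python
from urllib.parse import quote_plus

sitemap_url = "http://www.example.com/sitemap.xml"
# The sitemap URL must be URL-encoded before being appended as a
# query-string parameter to the ping endpoint.
ping_url = ("http://www.google.com/webmasters/tools/ping?sitemap="
            + quote_plus(sitemap_url))
print(ping_url)
```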

Submission Sitemap Externals

I forgot to log in when I added those external links. They contain the "official" method from each search engine for submitting a valid XML sitemap. Neither of the pages attempts to sell a product, and both keep a valid neutral point of view. Please comment here on any change suggestions.

This can help clean up the article's how-to information and let an external, neutral source supply the official submission methods for each search engine that supports the Sitemaps feature.

SDSandecki (talk) 06:21, 25 February 2008 (UTC)

Plain text OK too

Mention that sitemaps can also be in plain text format: sitemap.txt, and sitemap.txt.gz. See the Google webmaster tips if you don't believe me. Jidanni (talk) 04:27, 15 March 2008 (UTC)
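For reference, such a text sitemap is simply one fully qualified URL per line, optionally gzip-compressed as sitemap.txt.gz (the URLs here are placeholders):

```
http://www.example.com/
http://www.example.com/about.html
http://www.example.com/archive/2008/03/
```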

Sitemap and sitemaps

The two articles apparently are speaking of the same thing. But there is some confusion about what a sitemap is. Quote from sitemaps.org (the protocol site):
"Sitemap protocol format consists of XML tags"
For me, sitemap = sitemaps. Maybe we should disambiguate "sitemap" as architecture and as protocol. Acaciz (talk) 14:29, 24 April 2009 (UTC)

Can we remove the "how-to" warning?

I read the page today and did not see any how-to content. I think it is time to remove the {howto|article} warning template. I am going to put a message on Ddxc's talk page (s/he was the originator of the how-to template's use). The template has been there since 23 December 2007.

BrotherE (talk) 19:11, 7 September 2009 (UTC)

I agree. Just read the article and was informed, not trained.

To be honest, **there is** how-to content; the article duplicates the tutorial on sitemaps.org. If this part is removed, as it must be, very little content will remain. I still think that it must be merged with Sitemap. Macaldo (talk) 11:44, 17 November 2009 (UTC)


Sitemap term has two meanings

XML Sitemaps, used to direct search engine crawling, and site maps that help with user navigation/architecture.

—Preceding unsigned comment added by DShantz (talkcontribs) 00:50, 30 March 2010 (UTC)

Sitemap location statement is wrong and spec is unclear

"As the Sitemap needs to be in the same directory as the URLs listed". This is wrong: it needs to be at the level of, or above, the URLs listed. http://www.sitemaps.org/protocol.html#location gives more detail and also explains how robots.txt can grant permission for foo.com's sitemap to be hosted on bar.com, but it doesn't make clear how that affects the path part of the URL, nor whether this permission passes down through sitemap indexes to other sitemaps. -- Ralph Corderoy (talk) 18:37, 3 January 2013 (UTC)