Googlebot

Original author(s): Google
Type: Web crawler
Website: Googlebot FAQ

Googlebot is the web crawler software used by Google. It collects documents from the web to build a searchable index for the Google Search engine. The name refers to two different types of web crawler: a desktop crawler, which simulates a desktop user, and a mobile crawler, which simulates a mobile user.[1]

A website will probably be crawled by both Googlebot Desktop and Googlebot Mobile. The subtype of Googlebot can be identified by looking at the user agent string in the request. However, both crawler types obey the same product token (user agent token) in robots.txt, so a developer cannot selectively target either Googlebot Mobile or Googlebot Desktop using robots.txt.
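As a minimal sketch of telling the two subtypes apart, the function below inspects a request's user agent string. The function name is hypothetical, the sample strings are illustrative (with the Chrome version left as a placeholder), and the rule that the mobile crawler's string carries a "Mobile" token is an assumption rather than something stated above.

```python
from typing import Optional

# Illustrative sample strings; W.X.Y.Z stands in for a Chrome version number.
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def googlebot_subtype(user_agent: str) -> Optional[str]:
    """Return 'mobile', 'desktop', or None if the string does not name Googlebot."""
    if "Googlebot" not in user_agent:
        return None
    # Assumption: only the mobile crawler's string contains a "Mobile" token.
    return "mobile" if "Mobile" in user_agent else "desktop"


if __name__ == "__main__":
    print(googlebot_subtype(DESKTOP_UA))
    print(googlebot_subtype(MOBILE_UA))
```

A user agent string is supplied by the client and can be spoofed, so a check like this only classifies requests that are already trusted to come from Googlebot.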

If a webmaster wishes to restrict the information on their site available to Googlebot, or to another well-behaved spider, they can do so with the appropriate directives in a robots.txt file,[2] or by adding the meta tag <meta name="Googlebot" content="nofollow" /> to the web page.[3] Googlebot requests to web servers are identifiable by a user-agent string containing "Googlebot" and a host address containing "googlebot.com".[4]
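The two mechanisms in this paragraph can be sketched with Python's standard library. The robots.txt body, the /private/ path, and both function names below are hypothetical. The host check tests for a "googlebot.com" suffix, which is an assumption that is somewhat stricter than "containing", and it takes an already resolved host name rather than performing a DNS lookup itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the path is made up for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def googlebot_may_fetch(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Check a URL against robots.txt rules using the "Googlebot" product token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


def looks_like_googlebot(user_agent: str, reverse_dns_host: str) -> bool:
    """Apply the two signals from the text: user agent and host name."""
    host = reverse_dns_host.lower().rstrip(".")
    # Assumption: require the host to be googlebot.com or a subdomain of it.
    host_ok = host == "googlebot.com" or host.endswith(".googlebot.com")
    return "Googlebot" in user_agent and host_ok


if __name__ == "__main__":
    print(googlebot_may_fetch("https://example.com/private/report.html"))
    print(googlebot_may_fetch("https://example.com/index.html"))
```

Because the rule is keyed on the single "Googlebot" token, the same answer applies to the desktop and the mobile crawler.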

Googlebot follows HREF links and SRC links.[2] There is increasing evidence that Googlebot can also execute JavaScript and parse content generated by Ajax calls.[5][6] There are many theories about how advanced Googlebot's JavaScript processing is, with some attributing to it only a minimal ability derived from custom interpreters.[7][8][9] Googlebot uses a web rendering service (WRS) based on the Chromium rendering engine (version 74 as of 7 May 2019).[10] Googlebot discovers pages by harvesting all the links on every page it finds and then following those links to other web pages. To be crawled and indexed, a new web page must either be linked to from other known pages on the web or be manually submitted by the webmaster.
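The link-harvesting step can be illustrated with a short sketch that collects every href and src attribute from a page and resolves it against the page's URL. This is a generic illustration built on Python's standard library, not Googlebot's actual implementation; the class name, function name, and sample page are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collect href and src attribute values as absolute URLs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names before calling this method.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def harvest_links(html: str, base_url: str) -> List[str]:
    """Return the links found in html, in document order."""
    harvester = LinkHarvester(base_url)
    harvester.feed(html)
    harvester.close()
    return harvester.links


if __name__ == "__main__":
    page = '<a href="/about">About</a><img src="logo.png">'
    for link in harvest_links(page, "https://example.com/dir/page.html"):
        print(link)
```

A crawler built on this idea would add each harvested URL it has not seen before to a queue of pages to visit, which is why a page with no inbound links is never reached unless it is submitted manually.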

Webmasters with low-bandwidth web hosting plans[citation needed] have often noted that Googlebot consumes an enormous amount of bandwidth.[citation needed] This can cause websites to exceed their bandwidth limit and be taken down temporarily. It is especially troublesome for mirror sites, which host many gigabytes of data. Google provides Search Console, which allows website owners to throttle the crawl rate.[11]

How often Googlebot crawls a site depends on the site's crawl budget, an estimate of how often the website is updated. The crawl budget is determined by how many incoming links the site has and how frequently it is updated.[citation needed]

Technically, Googlebot's development team (the Crawling and Indexing team) uses several internally defined terms to cover what "crawl budget" stands for.[12]

Evergreen Googlebot

In May 2019, Google announced that Googlebot runs the latest Chromium rendering engine. From then on, Googlebot regularly updates its rendering engine to stay up to date.

New Features Supported

Compared to the previous version, the new version of Googlebot supports more than 1,000 additional features. The most important ones cited by the official Google documentation are:

  • ES6 and newer JavaScript features
  • IntersectionObserver for lazy-loading
  • Web Components v1 APIs[13]

References

  1. "Googlebot". Google. 2019-03-11. Retrieved 2019-03-11.
  2. "Google Search Console". Google.com.
  3. "Google Search Console". search.google.com. Retrieved 2019-03-11.
  4. Exact Googlebot client info can be found in Google-cached copies of pages which display such data to visitors. For example, see [1]
  5. "Googlebot makes POST requests via AJAX".
  6. "Google, the Jig is Up! Googlebot is actually a browser..."
  7. "Googlebot's Javascript Interpreter: A Diagnostic".
  8. "Googlebot is Chrome".
  9. "How Googlebot crawls JavaScript".
  10. "The new evergreen Googlebot". Official Google Webmaster Central Blog. Retrieved 2019-06-07.
  11. "Google - Webmasters". Google.com. Retrieved 2012-12-15.
  12. "What Crawl Budget Means for Googlebot". Official Google Webmaster Central Blog. Retrieved 2018-07-04.
  13. "The new evergreen Googlebot". Official Google Webmaster Central Blog. Retrieved 2019-06-17.

External links