Wikipedia:Google searches and numbers
This proposal has become dormant through lack of discussion by the community.
Google searches and their result counts can help identify sources on a subject when assessing its notability (see WP:notability).
One of the biggest fallacies in determining the notability of a subject, which is part of determining whether a topic should have its own Wikipedia article, is the view that the results of a Google search alone can be used to assess notability. A Google search using the title or keywords of an article or subject has become known as a "Google test". It may be easy to view a subject as being notable solely because a Google search produces a huge number of hits, not notable because the search produces very few hits, or a hoax because it produces none at all. While such searches are indeed a very useful starting point, they do not in themselves determine notability or the lack thereof.
An obscure 1700s philosophical theory that is referenced in a number of widely respected older paper books may not show up on a Google search. But the absence of Google hits does not mean that this theory is non-notable or a hoax. In fact, this theory may be notable under Wikipedia's rules, as it is described in multiple reliable sources. On the other hand, a reality TV contestant's name may generate a thousand Google hits (fan chat pages and blog posts regarding his or her sex life), but none of these may be reliable sources.
When performing a plain web search, it is possible that a lot of hits will turn up. Most probably, the majority of these will not count as reliable sources. Google News, Google Books, and Google Scholar provide results that are more likely to be reliable sources, but only if the hits can be verified, and confirmed as reliable, by reading the articles or books themselves. While not all of them can be viewed on the Google site itself, and many of them are previews, the search can at least show that the sources exist.
Nearly everyone who uses a computer or cell phone uses search engines, or even metasearch engines, at some point. There are many of them, such as Bing and Yahoo Search, and the most popular is Google Search, with an estimated 5.4 billion searches every day. Google uses algorithms to adjust search engine results pages (SERPs) based on individual preferences. Unless personalization is turned off, a search will not produce raw results but results tailored to that user's preferences.
Google search engines
Aside from the Google search platforms listed above (Google News, Google Books, and Google Scholar), there are Google Trends, the Google Maps Pack (Google's Local 3-Pack business listings), and the Google Arts & Culture project.
Why Google results can be misleading
There are various reasons why relying on the raw numbers of a Google search ("there were 204,000 search results") may be misleading when establishing notability. Raw result counts are often inflated by many variables that produce large hit totals.
While Wikipedia strives to present knowledge to the world free of charge, Google does not follow philanthropic business principles but relies on advertising. The 2014 estimate of Google's database size, 10 exabytes (one exabyte = one billion gigabytes), has likely been far surpassed; a 2019 estimate put its size at around 61.5 billion pages. It was estimated in 2013 that there were "2.5 quintillion bytes of data created every day", a figure that has likely been far surpassed as well.
Almost all Google search results share one main theme: advertising. Google advertising generated $110.8 billion in revenue in 2017. This came from various services such as AdWords (a proprietary advertising service run as an auction system), which is part of almost all of Google’s web properties, the AdSense program, Ad Manager, and Google Ad Manager 360. Many businesses depend wholly on advertising for their income. With Google Ads, a business bids on chosen words (keywords) to have its listing placed higher in the search results ranking. The two main types of "Google Search Features" are content type and enhancements. A main factor in business ranking is search engine optimization (SEO), and most businesses with a web presence use SEO to some degree. If personalization is not bypassed, the results are highly tailored to the individual and are therefore misleading as far as Wikipedia is concerned.
One way to reduce inflated or "personalized search results" is to add "&pws=0" to the end of the search URL. This "turns off" personalization based on search history, habits, present geographical location, and other personal factors. There are other URL modifiers that can be used as well.
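The "&pws=0" modifier can also be appended programmatically. The following is a minimal sketch in Python, assuming the standard `https://www.google.com/search` address with the usual `q` query parameter; the function name `build_search_url` is hypothetical, and whether Google currently honors `pws=0` is an assumption taken from the text above, not something the code can confirm.

```python
from urllib.parse import urlencode


def build_search_url(query: str, personalized: bool = False) -> str:
    """Build a Google search URL (hypothetical helper for illustration).

    When `personalized` is False, the "pws=0" modifier described above
    is appended after the query.
    """
    params = {"q": query}
    if not personalized:
        # Assumption: pws=0 asks Google to skip personalized results.
        params["pws"] = "0"
    return "https://www.google.com/search?" + urlencode(params)


url = build_search_url('"oil painting"')
print(url)
```

An editor could paste the printed URL into a browser and, when quoting result counts in a deletion discussion, state that the search was run with personalization turned off.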
Google searches are not references
It has become a practice in deletion discussions to quote a Google search or Google News search and say "look at all the results, there's your references" or "Two thousand Google hits, must be notable!" However, Google provides everything that can be found online, the vast majority of which is by no means a reliable source. Google News reprints large swathes of material which may or may not be reliable, may or may not be relevant to the subject of the article, and may or may not still be there by the time the AfD closes. (Note that a full citation of a news article found online, with the author, title, newspaper name, etc., is still valid even if the website is discontinued. A bare URL that no longer works, however, may render an online source useless.)
Therefore, if sources related to a topic under discussion for deletion are found using Google, great! But cite the exact reference or source you've found, rather than making a vague wave at the Google search numbers and saying that this large number proves the article's subject is notable, verifiable, and worth climbing the Reichstag over. The converse is also true: do not argue in AfDs that "Zero Google hits, must be non-notable."
Wikipedia is not a dictionary
Wikipedia is not a dictionary. A dictionary focuses on words or phrases, exactly as they are titled, and generally without deviating from that title. Wikipedia is an encyclopedia, whose purpose is to describe a person, group, place, object, event, or concept. Any of these may be known by one or more titles or groups of words, and any such title may have more than one meaning. While every Wikipedia article has a title, it is not the title that defines the subject, but the information contained within.
Search engines like Google focus on words or phrases, such as the title of an article that one would likely enter into one. For example, someone who wanted information on oil painting might enter the two words "oil painting" into a search engine (in quotes). This will likely produce plenty of web sites bearing the words "oil painting" in succession. As this is such a well-known concept, it is likely that many of these hits will be about oil painting. But the query may also produce a site that contains the words "She was eating a salad topped with olive oil, painting a picture of a tree, and listening to music." This sentence has the words "oil painting" in succession, and therefore would turn up in such a Google query. But it has nothing to do with oil painting.
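The false match described above can be reproduced with a toy phrase matcher. This is a simplified sketch in Python, assuming a matcher that lowercases text and ignores punctuation between words; it is not Google's actual algorithm, and the function name `contains_phrase` is hypothetical.

```python
import re


def contains_phrase(text: str, phrase: str) -> bool:
    """Return True if the words of `phrase` occur consecutively in `text`.

    Case and punctuation are ignored, which is the assumed behavior
    that lets unrelated sentences count as hits.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not target:
        return False
    n = len(target)
    # Slide a window of n words across the text and compare.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


sentence = ("She was eating a salad topped with olive oil, painting a "
            "picture of a tree, and listening to music.")
print(contains_phrase(sentence, "oil painting"))  # True, despite the comma
```

The match succeeds even though the sentence is about a salad, which is why a hit count for a quoted phrase says little about how many pages actually cover the subject.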
If you were to enter the phrase "was running laps" into a search engine, you would get a number of hits that contain these words in that exact succession. The sentence fragment may appear on a site that reads something like "He was running laps at the local track." But this does not mean there should be an article titled Was running laps.
A Google search of the common word if produces several billion hits. On Wikipedia, the title If does not define the word if. Rather, it leads to a disambiguation page displaying a long list of subjects, including many songs, that happen to be titled "if" or to have the initials IF. Still, the meaning of the common word if is restricted to a dictionary entry, and can only be written about on Wiktionary.
Many terms have multiple meanings
Many words, phrases, and other combinations of words have more than one meaning. For example, to most people the term "4:30" refers to a time on the clock or to biblical verses. But writing an article on either of these under this exact title is not suitable. The title 4:30 is the name of a film. Not all Google hits for 4:30 will be sites pertaining to the film. Nevertheless, 4:30 is used on Wikipedia solely for the film.
The term Astro Boy has many uses. It is mostly known as a TV series, but there is also a disambiguation page listing other uses for this title. If a Google search of the term is performed, it is unclear how many results pertain to which meaning.
Not all websites are reliable sources
A Google search may produce hundreds, thousands, even millions of hits bearing the exact title of the article or other pages on the subject derived from keywords. But only sites qualifying as reliable sources can be used to render a subject notable and to verify the accuracy of information. Most others do not qualify as permissible external links, let alone references.
Many websites, and often most of them, fail to qualify as reliable sources. There are many websites aimed at selling a product or service. Wikipedia is not an advertising space, and such sites linked from an article would violate Wikipedia's advertising policy. Others include blogs, self-published sources, clones of Wikipedia, and other non-neutral or unverifiable sources of information.
The best way to find actual reliable sources is not by a plain Google search, but with Google News, Books, and Scholar. Even so, this does not mean that any number of hits establishes notability, or that all sources found in the search are reliable, either for that article or for any article. Still, sources meeting the criteria are easier to find this way.
Not all sources provide in-depth coverage
Even if you do find one or more sources considered "reliable" by some standard, it does not automatically mean that they are good enough to support a particular subject. For example, if you wanted to write an article on a street, you may find plenty of news articles that trivially mention that street. Sure, googling will bring them up, and they may very well help establish notability for another subject. But with their trivial mentions, they do not bring notability to the street.
There is nothing wrong with pointing to a list of "hits" when giving reasons why an article deserves to be saved in a deletion discussion. This is actually a good idea when looking for others to help save an article, but Google search results alone are not grounds for protecting an article from deletion.
Three best sources
A better approach than simply listing Google search hits is to find the three best sources that are reliable and provide independent, in-depth coverage, and then to present these in the discussion or add them to the article.
Listing Google search results
After reading this, one may think that listing the results of a Google search in a deletion debate is a bad thing. That is not true at all. Listing them may actually be helpful in saving an article from deletion. It may be advisable to state that the search was non-personalized, that is, a raw search. While the Google results will not usually make or break the case, they may help others make the improvements necessary to save an article from deletion, or merely agree on what should be done.
The editor who provides the listing of Google results may not be able to make the necessary improvements themselves. Doing so is not required. But others who see these results may be able to take care of this, or at least mention that these more specific sources do exist, even if they do not add the sources themselves (see WP:HASREFS).
- Reasons Why Your Google Search Results Are Different. Retrieved 2019-12-08.
- Size of the World Wide Web (The Internet) (Google).
- 2.5 quintillion bytes of data created every day. How does CPG & Retail manage it?
- How Google Makes Money (GOOG). Retrieved 2019-12-09.
- Disable Personalized Google Search for SEO: The “&pws=0” Parameter. Retrieved 2019-12-08.
- Turning off Personalized Search – Get Raw Search Engine Results. Retrieved 2019-12-08.
- Google Search Cheat Sheet. Retrieved 2019-12-08.