Wikipedia:Search Engine NOCACHE by default proposal

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 128.61.56.41 (talk) at 10:39, 18 February 2009 (→‎Discussion is good, and here is what I want to discuss: Reformatted previous comment and added response). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

As of mid-February 2009, Wikipedia allows all search engines to "cache" its pages. That is, if a search engine like Google happens to crawl a page while it contains inappropriate or "bad" content, including WP:BLP violations, that content may remain visible on the Internet for an indeterminate amount of time. However, we have the ability to set Wikipedia to be "NOCACHE" in our robots.txt file. The major benefit of this is that search engines would only report the current state of an article (or any page) at any given time.
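As a rough illustration of what a "NOCACHE" setting amounts to in practice: search engines commonly honor a `noarchive` robots directive delivered per page, either as a `<meta name="robots">` tag or as an `X-Robots-Tag` HTTP response header. The sketch below assumes that mechanism rather than robots.txt. The helper names are hypothetical and are not part of MediaWiki.

```python
from html import escape


def robots_meta_tag(directives):
    """Build a robots <meta> tag from a list of directives (hypothetical helper)."""
    content = ", ".join(directives)
    return '<meta name="robots" content="{}">'.format(escape(content, quote=True))


def robots_header(directives):
    """Build the equivalent X-Robots-Tag HTTP header as a (name, value) pair."""
    return ("X-Robots-Tag", ", ".join(directives))


# "noarchive" asks a search engine to keep indexing the page
# but not to offer a cached copy of it.
print(robots_meta_tag(["noarchive"]))
print(robots_header(["noarchive"]))
```

Either form would have to be emitted on every page the proposal is meant to cover, so a site-wide default would most likely live in the page-rendering or web-server configuration.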

At least once, a slightly prominent BLP article was vandalized with racial epithets, which the world's search engines then cached.[1] A vandal replaced the entire BLP article with three epithets.[2] As a result, according to Wikipedia as it appeared on search engines, we were now referring to the BLP subject as "NIGGA".[3] The edit was reversed less than two minutes later, but the damage was done.[4]

That was one of the most-watched BLP articles we've ever had; what chance do the hundreds of thousands of lesser-known BLP articles have? The idea behind this proposal is to protect not just BLPs, but the integrity of our articles themselves, from being cached with bad information, even temporarily.

References

  1. ^ Oswald, Ed (2009-02-17). "Google Search for Barack Obama Reveals Racial Epithets". Technologizer. Retrieved 2009-02-18.
  2. ^ 04:44, February 17, 2009 edit to Barack Obama.
  3. ^ The edit in question, which was cached by Google, was made at 04:44, February 17, 2009. It was reversed at 04:46, February 17, 2009, less than two minutes later, but the damage was done and saved for the world to see, on the article of one of the most well-known living people on Earth, for an unknown length of time.
  4. ^ 04:46, February 17, 2009 edit to Barack Obama.

Discussion is good, and here is what I want to discuss

  • I do not think NOCACHE should be used at all. Search engines are possibly the most common way that people locate Wikipedia content, and this would be made more difficult if search engines were prevented from caching Wikipedia content. Wikipedia cannot be blamed for vandalism that occurs to articles, including those about living persons, or for that vandalism being picked up by search engines. The risk of vandalism is inherent in the nature of Wikipedia, and I believe most people who access the website realize this. — Cheers, JackLee talk 08:36, 18 February 2009 (UTC)
  • I'm not sure about this. Ideally we need to implement a versioning system, which would keep the typical anon vandalism out of the stable version, and only the stable version would ever go to a search engine. That would make this proposal irrelevant. But, if the choice is this (NOCACHE) or nothing, I'd support this. --Rob (talk) 08:16, 18 February 2009 (UTC)
  • In a perfect world, this plus flagged revs on BLPs would nuke just about anything new from getting in. Just a note too, the user that made the edit? Registered and editing since 2006. rootology (C)(T) 08:18, 18 February 2009 (UTC)
  • In that regard, doesn't that essentially negate Rootology's suggestion of flagged revisions? As that user's revision would not have been flagged, it would have gone on anyway -- just another example of why flagged revisions falls short of even the lowest of expectations, in my biased opinion. 128.61.56.41 (talk) 10:39, 18 February 2009 (UTC)
  • Just a question: We use Google cache to review deletions, especially those of us who aren't admins. I know there are other resources out there (Deletionpedia, Wikibin, ...) but my experience of them is that they are rather sketchy, especially with regard to recent deletions. How are we going to replace it? MER-C 08:47, 18 February 2009 (UTC)
  • What will the effects of this be on Wikipedia's appearance in the search results? If Google isn't allowed to cache Wikipedia articles, it won't be able to present the two-line excerpts in the results, making search results less useful and less attractive. --Carnildo (talk) 08:38, 18 February 2009 (UTC)
  • Per User:Carnildo's question above, is it really practical to ask Google not to cache Wikipedia? Wikipedia is a significant part of all of Google's traffic, and Google is one of Wikipedia's primary search methods / referring sites. Further, Google seems to be mirroring some Wikipedia stuff and there are lots of mirror sites. And beyond that, aren't there other caches at work? Wikipedia has a cache, everyone's browser has a cache, perhaps there is ISP-level caching, Akamai-type stuff, etc. If the problem is random vandalism, maybe we just have to live with it. If the problem is sophisticated vandals gaming the caches, maybe the answer is to get sites like Google to improve their caching and cache-flushing systems for rapidly changing content... making the whole site's BLP articles uncacheable to deal with occasional vandals may be throwing the baby out with the bathwater. I'm not arguing either way, just wondering if it's a technical problem with a more precise technical solution. Wikidemon (talk) 09:04, 18 February 2009 (UTC)
  • Per Wikidemon, the result of this proposal may well be to diminish Wikipedia to save 2 minutes of misfortune. I wonder, though, if it might be possible to only display revisions which have been around for 3 days without a revert. That is, instead of having search results change instantly, show only the version of the page with non-controversial edits. Just a thought. 128.61.56.41 (talk) 10:29, 18 February 2009 (UTC) I suppose I just re-iterated Rob's suggestion above. Whoops. 128.61.56.41 (talk) 10:39, 18 February 2009 (UTC)
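
The idea raised in the discussion above, of exposing only a revision that has gone three days without a revert, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how MediaWiki or flagged revisions actually work: the `Revision` record, its `reverted` flag, and the `latest_stable_revision` function are all hypothetical, and the revision IDs and the 10 February timestamp are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Revision:
    rev_id: int            # hypothetical identifier
    timestamp: datetime    # when the revision was saved
    reverted: bool = False # assumed to be set once a later edit undoes it


def latest_stable_revision(revisions, now, min_age=timedelta(days=3)):
    """Return the newest revision that is at least min_age old and was
    never reverted, or None if no revision qualifies."""
    candidates = [
        r for r in revisions
        if not r.reverted and now - r.timestamp >= min_age
    ]
    return max(candidates, key=lambda r: r.timestamp, default=None)


history = [
    Revision(1, datetime(2009, 2, 10, 12, 0)),                # older, unreverted edit
    Revision(2, datetime(2009, 2, 17, 4, 44), reverted=True), # the vandal edit
    Revision(3, datetime(2009, 2, 17, 4, 46)),                # the revert, still too new
]
stable = latest_stable_revision(history, now=datetime(2009, 2, 18, 10, 39))
print(stable.rev_id)  # prints 1
```

With this rule the vandal edit would never have been the version offered to a crawler, at the cost of search engines lagging at least three days behind legitimate changes.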