Wikipedia:Advanced disambiguation issues


This essay describes advanced issues in Wikipedia's approach to disambiguation of the overloaded or uncertain terms used when writing or finding articles.

Initial focus on identical spelling

Early on, many Wikipedia editors framed disambiguation as a problem of separating only titles with identical spelling, rather than similar spellings. The guidelines in MOS:DAB came to limit not only the form, but also the substance, of what a disambiguation page would be allowed to list. The result has been a heavy bias toward identical spellings, at the expense of handling similar spellings, pronunciations, and so on.

Expanding to disambiguate similar words/images

In reality, people have many other concerns about ambiguous, or vague, concepts (not just identical titles):

  • disambiguation should handle similar spellings (not just identical)
  • disambiguation should handle similar pronunciations (not just spelling)
  • disambiguation should handle similar words (not just identical)
  • disambiguation should handle extra words (not just the same)
  • disambiguation should handle similar concepts (not just words)
  • disambiguation should handle similar images (not just text).

Because of the rigidly enforced view that Wikipedia's disambiguation is strictly about identical words, I have used the broader concept of "crosslink pages", which can connect to articles that have similar pronunciations, similar words, similar phrases, extra words beyond those specified, or similar images about those words. Crosslink pages are thus another type of fork-page, used in Wikipedia to help readers select related articles.

Target-linking beyond disambiguation pages

The second most common use of a title should also be linked in the first article's hat-note, not just "See zzz (disambiguation)". Many editors link same-name titles with a top hat-note that points to only one recommended page, so for article "zzz", the hat-note links only to "zzz (disambiguation)". However, that is often a case of mass-linking to a disambiguation page, whereas target-linking to a specific page would often be quicker for the interested reader. By the 80/20 rule, perhaps 80% of interest in a particular phrase or title is answered by 20% of the articles sharing that title. For example, for the word "blanket", over 80% of people want either a blanket bed-cover or Blanket Jackson (third child of Michael Jackson); less than 20% are seeking some other meaning of the word. So, to avoid mass-linking, use target-linking to the most likely alternative titles.

In the weeks following the death of the singer, the pageviews for the word "blanket" jumped very high and then settled around 850 per day, about 4 times the prior average (in 2009) of 210 requests per day. If the additional 640 daily requests are attributed to Blanket Jackson, then interest in him exceeded 75% of all interest in the word "blanket" (640 of 850). As custody hearings and inheritance issues were predicted to last for years, there would be no quick return to the day when Blanket was mainly a bed-cover. Thus a target-link to "Blanket Jackson" allows over 75% of all readers to quickly reach the article they most wanted: the person, not the bed-cover.
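The arithmetic behind the 75% figure can be checked with a short sketch. It uses only the two pageview numbers quoted above; the assumption that interest in the bed-cover stayed at its prior level, and the function name, are illustrative and not part of the original essay.

```python
# Daily pageviews for "blanket", as quoted in the essay.
BASELINE_PER_DAY = 210   # prior average in 2009
ELEVATED_PER_DAY = 850   # level the traffic settled at after the jump


def new_interest_share(baseline, elevated):
    """Share of current traffic attributable to the new topic.

    Assumes (hypothetically) that the older interest stayed at the
    baseline level, so all of the increase belongs to the new topic.
    """
    return (elevated - baseline) / elevated


growth = ELEVATED_PER_DAY / BASELINE_PER_DAY
share = new_interest_share(BASELINE_PER_DAY, ELEVATED_PER_DAY)
print(f"growth: {growth:.2f}x, new-topic share: {share:.1%}")
```

Under that assumption the increase is about fourfold and the new topic accounts for just over three quarters of the requests, which matches the figures in the text.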

In practice, target-linking requires editing only a small number of articles, because only the top 20% of articles are read often enough to justify changing their hat-note links. This is a second occurrence of the 80/20 rule:

  • 20% of all same-title articles answer "80%" of reader interest;
  • 20% of all same-title articles can be target-linked to quicken "80%" of reader requests.

Thus target-linking can cut visits to the disambiguation page by (roughly) 5x, since about 80% of readers seeking another meaning can skip it, and so reduce that page-view traffic by about 80% for same-titles (which are becoming very common).
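One way to read the 5x and 80% figures is as a count of disambiguation-page loads. The sketch below is a toy model, not a measurement: the reader count, the function name, and the assumption that every target-linked reader skips the disambiguation page are all hypothetical.

```python
def disambiguation_loads(readers, target_share_percent):
    """Return (before, after) loads of the disambiguation page.

    before: with mass-linking, every reader seeking another meaning
            passes through the disambiguation page.
    after:  with target-linking, readers whose topic is linked
            directly in the hat-note skip that page.
    """
    before = readers
    after = readers * (100 - target_share_percent) // 100
    return before, after


# Hypothetical: 1000 readers, 80% of whom want a target-linked article.
before, after = disambiguation_loads(readers=1000, target_share_percent=80)
reduction = (before - after) / before
factor = before / after
print(f"{before} -> {after} loads, {reduction:.0%} fewer, {factor:.0f}x reduction")
```

In this model the saving applies only to the intermediate disambiguation page; each reader still loads the first article and the article actually wanted.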

As Wikipedia expands to cover a vast range of topics, many thousands more titles will require disambiguation, because so many new articles will reuse existing titles. Many films use the same title as a book (which could be written after the film), and many books use a catchy common phrase as the title, such as the books Merger Mania or Star Quality. Hence, optimizing pages to use target-linking could reduce disambiguation-page traffic by about 80% for many subjects.

Precedents outside of Wikipedia

Online dictionaries have, for years, listed alternative choices when a user enters a misspelled word. Rather than focus only on identical words, those dictionaries treat similar spellings as needing disambiguation, and do not presume that a misspelled word is the same as some other correctly spelled word. Thus the word "Tomatoe" might generate a choice of similar words such as tomatoes, tomatose, tometone, or Tomateo.
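A minimal sketch of this kind of similar-spelling lookup, using the standard-library difflib module in Python. The title list, the function name, and the similarity cutoff are assumptions chosen for illustration; they do not describe how any particular dictionary or Wikipedia itself works.

```python
import difflib

# Hypothetical list of known titles; a real site would use its own index.
KNOWN_TITLES = ["tomato", "tomatoes", "potato", "blanket"]


def suggest_titles(query, titles=KNOWN_TITLES, limit=3, cutoff=0.8):
    """Return up to `limit` titles whose spelling is close to `query`.

    `cutoff` is difflib's similarity threshold (0 to 1); the value 0.8
    is an arbitrary choice for this example.
    """
    return difflib.get_close_matches(query.lower(), titles, n=limit, cutoff=cutoff)


suggestions = suggest_titles("Tomatoe")
print(suggestions)
```

This matches on spelling only. Similar pronunciations, concepts, or images, as proposed earlier in this essay, would need other techniques.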

See also