Wikipedia:Overlink crisis

From Wikipedia, the free encyclopedia

This essay addresses the overlink crisis and problems of overlinking Wikipedia pages with excessive wikilinks, especially in navboxes or infoboxes. For instructions on reducing or de-linking navboxes, see below: Convert large navboxes to navpage-link.

Overlinking in Wikipedia pages (or other hyperlinked text) means having too many internal wikilinks or hyperlinks to external webpages.[1]


Aspects of overlinking

Overlinking typically takes one of the following forms:[1][2]

  • a large proportion of the words in each sentence being rendered as links (like this one);
  • using links that have little related content, such as linking on specific years like 1995, or unnecessary linking of common words used in the common way, for which the reader can be expected to understand the word's full meaning in context, without any hyperlink help;
  • excessively repeating a link for any single term (other than for date formats) in the same article. "Excessive" is usually more than one link for the same term in the same paragraph, since in this case one or more duplicate links will almost certainly appear needlessly on the viewer's screen.

Overlink crisis

During 2007–2009 (and 2010–2011),[citation needed] many thousands of articles were modified to use various navigation boxes (navboxes) or infobox templates to link to related sets of other articles. However, the usage expanded:

  • the links in navboxes were gradually increased to link 100, 200, 500, 1000, 1500 (or more) articles, in each of thousands of other articles; and
  • multiple navboxes were placed on almost any article remotely related to the subject.

For example, by June 2009, the article "Morocco" had gained 12 separate navboxes, stacked at the bottom of the page, adding over 800 more wikilinks and doubling the size of the formatted article to 292 KB of HTML.

If a navbox linked 50 major cities and was used in each of those 50 city articles, the total generated was only 50*50 = 2,500 wikilinks. If the cities were increased to 200, however, the total exploded to 200*200 = 40,000 wikilinks. For larger navboxes the problem expands rapidly:

  • For navboxes containing 500 wikilinks to be used in 600 articles, the total becomes: 500*600= 300,000 total wikilinks.
  • For navboxes containing 650 wikilinks to be used in 2500 articles, the total becomes: 650*2500= 1,625,000 (1.6 million) total wikilinks.
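
The arithmetic above can be reproduced with a minimal sketch. The function name `total_wikilinks` is hypothetical and chosen only for illustration; it assumes every article that carries the navbox receives all of the navbox's links.

```python
def total_wikilinks(links_per_navbox: int, articles_using_navbox: int) -> int:
    """Page-links generated when one navbox is placed in many articles.

    Assumption: each article using the navbox gains every link in it.
    """
    return links_per_navbox * articles_using_navbox


# The cases quoted in this essay: (links in the navbox, articles using it)
cases = [(50, 50), (200, 200), (500, 600), (650, 2500)]

for links, articles in cases:
    total = total_wikilinks(links, articles)
    print(f"{links} links x {articles} articles = {total:,} wikilinks")
```

When a navbox is used in each of the n articles it links, the total is n×n, which is why quadrupling the cities from 50 to 200 multiplies the total by 16.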

Boxifying articles

Rather than limiting a navbox to the major related topics, some navboxes have become the condensed key contents of an entire article, in a "boxified" form to be appended to another article. Such navboxes are the opposite of the wikilink concept: details should be kept separate by linking to another article via a single wikilink, rather than repeating portions of that article in the current article. For example, mentioning that a singer often performed in a famous concert hall requires just one link to that singer's name, not an entire navbox linking that singer's albums, singles, co-singers, songwriters, tours, and TV specials.

Solution: avoid or limit navboxes/infoboxes

The greatest source of overlinking is in large navboxes or infoboxes used in hundreds of thousands of articles. There are several general ways to limit the impact of those boxes:

  • If possible, avoid navboxes completely in articles that are only remotely related to the topic.
  • Link just a few related pages as see-also links, rather than use a large navbox.
  • Use a set of smaller navboxes to cover a topic, and only link to each smaller navbox where directly related, such as cities or counties, but rarely linking both.
  • Emphasize that any major overall navbox should be kept limited in size, to perhaps no more than 200 total wikilinks, recommending smaller navboxes to link specialized sub-topics, not all joined into a single massive navbox.
  • Remove common-word links from navboxes or infoboxes: avoid linking "city" or "county" or "km" or other common words. By 2009, thousands of common words had been explained with Wikipedia articles, such as: meter, foot, cm, inch, yardstick, township, river, sea, table, chair, good, bad, up, down, sideways, etc. Readers can enter any word, such as "km", in the wiki-search menu to look it up.

Convert large navboxes to navpage-link

The quickest way to reduce Wikipedia overlinks is to patch each large navbox template to display a one-line navbox with a navpage-link. Unfortunately, show/hide options do not suppress the actual page-links. Instead, add 5 lines of template coding as a truncated, include-only navbox, then tag the original navbox code with "noinclude" to suppress all the detailed wikilinks, so the new one-line navbox sits atop the suppressed original navbox. Small navboxes should not be changed.
Add 5 lines to the top of each large navbox template:
<includeonly>
{{navbox
| name  = XXX
| title = [[XXX]] &nbsp; [[Template:XXXnavbox|[full navpage] ]]
}}</includeonly><noinclude>
The "XXX" refers to the specific name or title of the navbox.
Near the bottom of the navbox template, put a 6th line "</noinclude>" to indicate skipping all those prior wikilinks when the navbox is appended onto an article page. Only a one-line navbox will then appear, in each affected article, showing the option for "[full navpage]" to display the full navpage.
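
The conversion steps above can be sketched as a small script that wraps existing navbox wikitext. This is only an illustration: the helper name `to_navpage_link` and the sample template text are hypothetical, and the output should be previewed before saving any real template.

```python
def to_navpage_link(name: str, original_navbox: str) -> str:
    """Wrap a large navbox so that articles show only a one-line navbox.

    The 5 header lines are transcluded (includeonly); the original
    navbox is kept on the template page but skipped in articles
    (noinclude). `name` stands in for the "XXX" placeholder.
    """
    header = (
        "<includeonly>\n"
        "{{navbox\n"
        f"| name  = {name}\n"
        f"| title = [[{name}]] &nbsp; [[Template:{name}navbox|[full navpage] ]]\n"
        "}}</includeonly><noinclude>\n"
    )
    # The 6th line closes the noinclude section after the original code.
    return header + original_navbox.rstrip("\n") + "\n</noinclude>\n"


# Hypothetical original template text, for illustration only.
original = "{{navbox\n| name  = XXX\n| list1 = [[A]] [[B]] [[C]]\n}}\n"
print(to_navpage_link("XXX", original))
```
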
Any wikilinks inside a template, but not displayed, will not be propagated into the Wikipedia page-link database(s). For example, the boxified Google navbox can be accessed as a one-line box:

If a navbox formerly displayed 430 wikilinks, when used in 200 articles, then 430*200= 86,000 wikilinks will be dropped, a few days after the template is saved, when Wikipedia updates the page-link database(s) for articles back-linked to that saved template. The update is not seen by users, so few readers are aware of the millions of overlinked page-links.

Those are some major ways to limit the growing overlink crisis.

Navbox versus navpage

Although editors generally try to keep navboxes focused on the most general articles about a topic, limiting navboxes has often been a slippery slope: many navboxes have strayed off-topic and expanded to more than 100 wikilinks each. A large navbox would be better as an entire navpage (rather than just a navbox) with more space to address, perhaps, 200–300 equally important subtopics. A common example is the 254 counties of Texas, which had been included in a massive Texas navbox template but were placed instead as 254 wikilinks on a separate navpage, "Template:Texas_counties", setting an important precedent for a standalone navigation-page. The Texas-counties navpage alone avoided 254*900 = 228,600 wikilinks in the first 900 articles about Texas towns. Rather than dragging a large navbox along the bottom of each article, the navpage provides a central menu into subtopics, by a right-click opening into a new window or by browser-backing to the prior navpage display.

Wikipedia can store overlinks

There is no technical difficulty or potential performance problem with overlinking. Each link takes up a couple of bytes of wikitext in the article storage itself, plus a couple dozen bytes to record the link in the database. While the number of links may increase quadratically, on the order of n×n wikilinks for a given count n of articles, this size is negligible compared to other factors. By comparison, the number of article revisions increases more or less exponentially[clarification needed] with the number of articles, for instance (see Wikipedia:Modelling Wikipedia's growth#Edits per article), and each revision takes up far more storage than a link. If anyone claims that there's some technical problem with having lots of links, please point them firmly toward the page "Wikipedia:Don't worry about performance".

However, some users may find the information overload or clutter of too many wikilinks to be an aesthetic or usability problem.

Analogy of indexing indexes

There are several analogies that help show how overlinking has drastically expanded the total wikilinks:

  • Thinking of each navbox as a small index to related subjects (with the total wikilinks as an index of "What links here"), the wikilinks for each navbox become an "index of indexes" because each navbox is a mini-index of the whole. The total wikilinks are so numerous because they are effectively the index of indexes.
  • If each chapter in the Bible ended with a mini-concordance to related chapters (like a navbox), then the cross-referencing of all chapters would generate a "concordance of concordances" as a massive tome.

Similar analogies illustrate the n-squared problem: if Wikipedia only contained the contents of a single Bible, the massive concordance of concordances would be manageable; however, the cross-referencing of hundreds of thousands of pages has generated a Tower of Babel in wikilinks, with articles overrun by wikilinks to numerous tangent articles.

Why a crisis exists

The extra millions of wikilinks generated by navboxes might at first seem acceptable. However, the situation is a crisis because total wikilinks, formerly at the level of 50 page-links per list, have been growing to 200*200 = 40,000 crossed page-links, which is 40,000 / 50 = 800 times as many page-links as a page that formerly listed 50 wikilinks. The problem is not simply twice the number of links, or 10 times the links used before; the total is effectively becoming 800 times as large.

The problem can be seen when a navbox of 250 box-links, used in 2,000 articles, is reduced to just 10 wikilinks: after a few minutes, the Wikipedia servers will pause as the page-link database(s) are updated to unlink 96% (240/250) of the 500,000 total page-links among those 2,001 articles (which had been cross-linked as 250*2000 = 500,000 page-links). The situation is a crisis because it is a self-generated resource drain, on a massive scale.
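
The figures in the paragraph above can be checked with a short sketch. The function name `trim_effect` is hypothetical, and the calculation assumes every article carries the full navbox both before and after the trim.

```python
def trim_effect(old_links: int, new_links: int, articles: int):
    """Return (before, after, removed, fraction_removed) page-link counts
    when a navbox used in `articles` pages is cut from old_links to new_links."""
    before = old_links * articles
    after = new_links * articles
    removed = before - after
    return before, after, removed, removed / before


# The essay's case: 250 box-links cut to 10, in 2,000 articles.
before, after, removed, fraction = trim_effect(250, 10, 2000)
print(f"before: {before:,}  after: {after:,}  removed: {removed:,} ({fraction:.0%})")
```
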

Articles can still be displayed because, technically, an article can contain more than 4,000 wikilinks. However, when the text mentions only 150 related articles, linking to 4,000 articles is a massive increase in wikilinks.

Example: Tramadol used 3,400 navbox wikilinks

As more navboxes have been created, and older navboxes have doubled in size, many thousands of articles have become mostly navbox wikilinks. For example, by April 2010, the article "Tramadol" contained 8 bottom navboxes, for a vast array of related medicines, cross-linking to over 3,400 other articles. The main, upper text of the article contained only 200 wikilinks, so those bottom navboxes contained 3400/200 = 17 times as many wikilinks as the actual article text. To reduce all those extra wikilinks, a navpages-box was placed at the bottom of the article, as follows:


Related navpages:

The navpages-box (above) connected to 7 of the navboxes, by template-name only, rather than transclusion of those total ~3,200 wikilinks into the article text. If any of those 7 navboxes were altered, then the article "Tramadol" would not need to be reformatted for them, and the bottom links would directly display the most-recent version of each navbox, as a separate navpage in a browser window.

Wikipedia delaying auto-reformat of navboxes

In February 2008, Wikipedia would quickly reformat all affected articles, with a delay of only minutes after modifying a shared navbox template, but by December 2008, that delay became days. Back in early 2008, a navbox used in 400 articles could be edited/saved, and all 400 articles would be reformatted, typically within 4 minutes. However, in early 2009, a navbox used in only 20 articles might be delayed days before appearing updated in those 20 articles which used that navbox.

Note that if 6,500 articles were linked to a navpage version of a navbox, then those articles would already be current if that navbox were changed. Changing a navpage-navbox does not affect those articles, because those details are not copied inside each of those 6,500 articles, only the template-name wikilink is copied. Note that internally, the Wikipedia servers might still think that those 6,500 articles need to be scheduled for reformatting, because, technically, the transcluded template was modified, even though the modified parts are skipped by each article. However, from a functional standpoint, those articles would look and act the same before/after reformatting.

If the one-line navpage box were stored as a separate template, which merely wikilinked to the original navbox, then none of those 6,500 articles would be scheduled for reformatting. Note, however, that to skip the continual re-formatting of 6,500 articles is a performance concern (see WP:PERF), so that is not a reason to persuade a person to split the navpage link as a separate template from the navbox template. A person can be advised of the performance difference, but should not be required (according to WP:PERF) simply because an activity is 6,500 times faster. However, in the real world, feel free to consider performance as important.

References

  1. ^ a b "PCMag.com Encyclopedia". PC Magazine. Retrieved 2007-01-19.
  2. ^ Dvorak, John C. (April 2002). "Missing Links". PC Magazine. Archived from the original on 2007-12-23.