User:Certes/misdirected links
Article titles can be ambiguous. For example, Mercury can mean a chemical element, a planet or a Roman god. Prince is usually a royal title but may refer to the musician. Care is needed to ensure that each link to such topics leads to the right article. This essay discusses finding and correcting links which take the reader to the wrong destination.
Finding pairs
The first task is to identify pairs of titles where links to the first may be intended for the second. Because of the way titles are disambiguated, many misdirected links take the reader to a page at the base name when a qualified name was intended: to Mercury instead of Mercury (element), or to Apple instead of Apple Inc.
In some cases, the base name is occupied by a disambiguation page or by a redirect to a disambiguation page. Such links are easy to find: the original editor is often notified by User:DPL bot, and they appear on reports such as Disambiguation pages with links. Any incoming links are normally errors and tend to be fixed quickly, sometimes with semi-automated tools such as DisamAssist and Dablinks. Techniques for finding and fixing such problems are already dealt with by WP:WikiProject Disambiguation and will not be discussed further here.
In other cases, the base name is occupied by an article (its primary topic) or by a primary redirect to an article. Most links to that page are correct, making errors harder to find. For example, most links to Prince really are about the royal title, and these false positives need to be eliminated before changing the remaining minority of misdirected links to Prince (musician).
The use of qualified base names suggests a method of finding pairs. There are thousands of pairs of articles (or redirects to articles) where one title is the first word(s) of the other, but the list can be refined. We can automatically remove pairs where the short title has few incoming links, on the grounds that a short list of links will contain few errors. We can also disregard pairs where the long title has few links, as its topic is not widely referred to. We can then manually remove obvious false positives. For example, South and South Africa both have many incoming links, but it is unlikely that an editor referring to "South Africa" would accidentally link to "South".
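The automatic part of this filtering might be sketched as follows. This is only an illustration: the function name `candidate_pairs`, the thresholds `min_short` and `min_long`, and every link count are invented here, and the essay does not say which cut-offs were actually used. Manual removal of false positives such as South and South Africa still has to follow.

```python
from typing import Dict, List, Tuple


def candidate_pairs(link_counts: Dict[str, int],
                    min_short: int = 100,
                    min_long: int = 20) -> List[Tuple[str, str]]:
    """Return (short, long) pairs where short is the leading word(s) of long.

    Both thresholds are assumptions chosen for this example.
    """
    titles = sorted(link_counts)
    pairs = []
    for short in titles:
        # A short list of incoming links will contain few errors: skip it.
        if link_counts[short] < min_short:
            continue
        for long in titles:
            # The long title must continue the short one after a space.
            if not long.startswith(short + " "):
                continue
            # A rarely linked long title is not widely referred to: skip it.
            if link_counts[long] >= min_long:
                pairs.append((short, long))
    return pairs


# Hypothetical incoming-link counts, invented for illustration only.
counts = {
    "Prince": 5000,
    "Prince (musician)": 3000,
    "Apple": 4000,
    "Apple Inc.": 9000,
    "Obscure": 3,
    "Obscure (band)": 500,
    "Mercury": 2000,
    "Mercury (crater)": 2,
}
print(candidate_pairs(counts))
```

With these made-up counts, only the Apple and Prince pairs survive both thresholds.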
As fixing proceeds, the articles' content may suggest further pairs. For example, links to John Lewis which should lead to John Lewis & Partners can occur in lists of British shops which also link to Iceland rather than Iceland (supermarket).
Finding links
Having picked a likely pair, we then need to find links which may be in error. Wikipedia's search does this job well. There are two main ways to find links to a shorter title such as Slough (an article about a town) which should lead to a longer one such as Slough (hydrology). Firstly we can look for articles about the longer title:
Slough linksto:Slough hydrology insource:/\[\[ *[Ss]lough *\]\]/
Slough linksto:Slough swamp insource:/\[\[ *[Ss]lough *\]\]/
Secondly, we can look for articles not about the shorter title:
Slough linksto:Slough -Berkshire -England -town insource:/\[\[ *[Ss]lough *\]\]/
The first two terms on each line are almost equivalent to the last one and should hardly affect the output. They are included simply to speed up the search, as insource: alone is very inefficient. Beware that linksto: alone might find many pages which do not link directly but transclude a navigation template with a link. The insource: expression should normally match both sentence and lower case, e.g. [Ss]lough. If only one of the articles is a proper noun then it may also be sensible to do a wider search for just one case. For example,
Slough linksto:Slough insource:/\[\[ *slough *\]\]/
is unlikely to produce false positives, because intentional links to Slough will have a capital S. If there are too many results (say more than 1000), it may be best to limit the search to the intersection of the two sets:
Slough linksto:Slough hydrology -Berkshire -England -town insource:/\[\[ *[Ss]lough *\]\]/
Slough linksto:Slough swamp -Berkshire -England -town insource:/\[\[ *[Ss]lough *\]\]/
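The searches above share one shape, so they can be assembled mechanically. The helper below is a hypothetical sketch (the name `build_query` and its parameters are not part of any Wikipedia tool) and assumes a single-word base name containing no regular-expression metacharacters.

```python
def build_query(base, include=(), exclude=()):
    """Assemble a search string in the style of the Slough examples.

    Assumes `base` is one word with no regex metacharacters.
    """
    first, rest = base[0], base[1:]
    # Match the link with either case of the initial letter, e.g. [Ss]lough.
    link_regex = rf"\[\[ *[{first.upper()}{first.lower()}]{rest} *\]\]"
    # The leading terms only speed the search up; insource: does the real work.
    terms = [base, f"linksto:{base}"]
    terms += list(include)                      # words about the longer title
    terms += [f"-{word}" for word in exclude]   # words about the shorter title
    terms.append(f"insource:/{link_regex}/")
    return " ".join(terms)


print(build_query("Slough", include=["hydrology"]))
print(build_query("Slough", exclude=["Berkshire", "England", "town"]))
print(build_query("Slough", include=["swamp"],
                  exclude=["Berkshire", "England", "town"]))
```

Passing both `include` and `exclude` gives the intersection form suggested for searches with too many results.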
It is more efficient to do all searches first and take the union of the outputs (concatenate, then sort eliminating duplicates). In practice, however, further searches for new terms may suggest themselves once fixing is underway.
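Taking the union is a one-line operation once each search's output has been saved as a list of page titles. A minimal sketch, with an assumed function name and invented result lists:

```python
def union_of_outputs(*outputs):
    """Concatenate several lists of page titles, then sort and drop duplicates."""
    combined = []
    for titles in outputs:
        combined.extend(titles)
    return sorted(set(combined))


# Hypothetical search outputs, invented for illustration only.
hydrology_hits = ["Marsh", "Bayou", "Wetland"]
swamp_hits = ["Bayou", "Everglades"]
print(union_of_outputs(hydrology_hits, swamp_hits))
```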
The morelikethis: feature can also be useful. The following search finds several hundred links, most of which are correct, but sorts them so that the few which require attention appear near the start:
linksto:prince insource:/\[\[ *Prince *\]\]/ -insource:/\[\[ *Prince *\]\] ([A-Z]|of)/ morelikethis:"Prince (musician)"
Fixing errors
Errors can be fixed manually, but it is helpful to use a tool such as AWB or JWB. (Both require permission. AWB has more functions but requires Microsoft Windows or a particular Wine setup.) Changes must be made with consideration, as a typical success rate is 50%, i.e. half of the flagged links are false positives which should not be changed.
Typical regular expressions to change piped and unpiped links are:
\[\[ *(Prince) *\| → [[$1 (musician)|
\[\[ *(Prince) *\]\] → [[$1 (musician)|$1]]
$1 here is JWB's notation for the text which matched the first round brackets, i.e. the base name. Use the g flag to change all occurrences. Consider using the i flag to catch lower case initials, though in this case it is best left off as a quick way of skipping links to the generic term "prince".
We can combine pairs with similar qualifiers, especially if they are likely to occur in the same articles. For example, several terms have specialised meanings in taxonomy:
\[\[ *(family|synonym|tribe) *\| → [[$1 (biology)|
and
\[\[ *(family|synonym|tribe) *\]\] → [[$1 (biology)|$1]]
In this case, the g and i flags are important.
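To try such rules outside JWB, they can be approximated with Python's `re` module. This is a sketch, not JWB's own behaviour: `retarget` is a hypothetical helper, Python writes the captured group as `\1` where JWB writes `$1`, `re.sub` replaces every match (the effect of the g flag), and `re.IGNORECASE` stands in for the i flag. It assumes the qualifier is plain text without backslashes. The function changes every match, so each result still needs checking by eye.

```python
import re


def retarget(wikitext, bases, qualifier, ignore_case=False):
    """Point links to any title in `bases` at 'title (qualifier)'."""
    alternatives = "|".join(re.escape(base) for base in bases)
    flags = re.IGNORECASE if ignore_case else 0
    piped = re.compile(r"\[\[ *(" + alternatives + r") *\|", flags)
    unpiped = re.compile(r"\[\[ *(" + alternatives + r") *\]\]", flags)
    # Piped links keep their label; only the target gains the qualifier.
    wikitext = piped.sub(rf"[[\1 ({qualifier})|", wikitext)
    # Unpiped links gain a pipe so that the displayed text is unchanged.
    return unpiped.sub(rf"[[\1 ({qualifier})|\1]]", wikitext)


print(retarget("[[Prince]] and [[Prince|the artist]] but not a [[prince]].",
               ["Prince"], "musician"))
print(retarget("The [[Family]] contains one [[tribe]]; see [[synonym|synonymy]].",
               ["family", "synonym", "tribe"], "biology", ignore_case=True))
```

In the first call the lower-case link to the generic term is left alone because case is respected; in the second, ignoring case catches both "Family" and "tribe".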
Examples
This table lists some base names and qualified names which lead to articles on different topics. Some wikilinks to the base name may be intended for the qualified name. These examples have already been fixed, but some errors may have been missed and new ones may appear.
Single-letter titles, especially C and V, deserve a special mention. [[C]] may intend C (programming language). [[C#Anything]] links to a (possibly absent) section of the article about the letter C, so wikitext such as [[C#]] often leads the reader astray. They may intend C Sharp (programming language), C♯ (musical note) or a musical scale such as C-sharp major or C-sharp minor. [[V]] can be a typo for Control-V, meaning that the intended target is whatever title was in the editor's clipboard at the time. Short titles can also indicate gratuitous [[over]]linkin[[g]].
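Links of this kind can be picked out of a page's wikitext with a short pattern. The sketch below is an assumption about how one might scan for them, not a description of any existing report; `single_letter_links` is a hypothetical name, and only unaccented Latin letters are considered.

```python
import re

# One letter, an optional #section, then the end of the target (]] or a pipe).
SINGLE_LETTER = re.compile(r"\[\[ *([A-Za-z])(#[^\]|]*)? *[\]|]")


def single_letter_links(wikitext):
    """Return (letter, section) for each link whose target is a single letter."""
    return SINGLE_LETTER.findall(wikitext)


sample = "Written in [[C#]] and [[C]]; gratuitous [[over]]linkin[[g]]."
print(single_letter_links(sample))
```

An empty section string means the link had no `#` part; a bare `#` marks wikitext such as `[[C#]]`.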
Links with the note "lowercase" can be detected by checking the case of the wikilink. For example, links to [[hamlet]] normally relate to a village rather than the play. "Uppercase" works similarly – links to [[Acre]] usually denote a place rather than the unit – but may occur correctly in a title or to begin a sentence.
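The case check can be sketched as below. The helper name `initial_cases` is hypothetical, and the code only classifies links; deciding whether an upper-case link starts a sentence or sits in a title still needs a human reader.

```python
import re


def initial_cases(wikitext, title):
    """Classify each link to `title` by the case of its initial letter."""
    first, rest = title[0], title[1:]
    pattern = re.compile(
        r"\[\[ *([" + first.upper() + first.lower() + r"])"
        + re.escape(rest) + r" *[\]|]"
    )
    return ["lowercase" if match.group(1).islower() else "uppercase"
            for match in pattern.finditer(wikitext)]


text = ("The [[hamlet]] lies near the theatre where ''[[Hamlet]]'' "
        "was staged; see [[Hamlet|the play]].")
print(initial_cases(text, "Hamlet"))
```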
Links marked GBB are on User:GoingBatty/Backlinks and are probably checked for new incoming links. Other links are on User:Certes/Backlinks and are checked for new incoming links daily (as of January 2021), except those which produce many false positives and few useful leads.
Places
Many primary topics share a name with similarly named places. In many cases, there are multiple alternatives; only the most widely linked are listed here.
Place searches
"Alexandria" linksto:"Alexandria" insource:/\[\[ *Alexandria *\]\].?.?.?.?.?.?(Virginia|VA|Louisi|LA|United S|US)/
"Athens" linksto:"Athens" insource:/\[\[ *Athens *\]\].?.?.?.?.?.?(Georgia|GA|United S|US)/
"Batman" linksto:"Batman" insource:/\[\[ *Batman *\]\].?.?.?.?.?.?(Turk)/
"Battle" linksto:"Battle" insource:/\[\[ *Battle *\]\].?.?.?.?.?.?(East|Sussex|United K|UK|Engl)/
"Bethlehem" linksto:"Bethlehem" insource:/\[\[ *Bethlehem *\]\].?.?.?.?.?.?(Pennsylvania|PA|United S|US)/
"Birmingham" linksto:"Birmingham" insource:/\[\[ *Birmingham *\]\].?.?.?.?.?.?(Alabama|AL|United S|US)/
"Boston" linksto:"Boston" insource:/\[\[ *Boston *\]\].?.?.?.?.?.?(Linc|United K|UK|Engl)/
"Boulder" linksto:"Boulder" insource:/\[\[ *Boulder *\]\].?.?.?.?.?.?(Colorado|CO|United S|US)/
"Brampton" linksto:"Brampton" insource:/\[\[ *Brampton *\]\].?.?.?.?.?.?(Cumb|Carlisle|Camb)/
"Calvados" linksto:"Calvados" insource:/\[\[ *Calvados *\]\].?.?.?.?.?.?([Dd][eé]p|France|rench)/
"Cambridge" linksto:"Cambridge" insource:/\[\[ *Cambridge *\]\].?.?.?.?.?.?(Massachusetts|MA|United S|US|New E)/
"Canterbury" linksto:"Canterbury" insource:/\[\[ *Canterbury *\]\].?.?.?.?.?.?(New Zealand|NZ)/
"Chester" linksto:"Chester" insource:/\[\[ *Chester *\]\].?.?.?.?.?.?(Pennsylvania|PA|United S|US)/
"Christchurch" linksto:"Christchurch" insource:/\[\[ *Christchurch *\]\].?.?.?.?.?.?(Dorset|Hampshire|Hants|United K|UK|Engl)/
"Cicero" linksto:"Cicero" insource:/\[\[ *Cicero *\]\].?.?.?.?.?.?(Illinois|IL|United S|US)/
"Dollar" linksto:"Dollar" insource:/\[\[ *Dollar *\]\].?.?.?.?.?.?(Clack|Scot|United K|UK)/
"Durango" linksto:"Durango" insource:/\[\[ *Durango *\]\].?.?.?.?.?.?(Biscay|Basque|Spain|Colorado|CO|United S|US)/
"Edmonton" linksto:"Edmonton" insource:/\[\[ *Edmonton *\]\].?.?.?.?.?.?(London|Greater|orth L|United K|UK|Engl)/
"Esplanade" linksto:"Esplanade" insource:/\[\[ *Esplanade *\]\].?.?.?.?.?.?(Kolkata|Calcutta|West B|Bengal|India)/
"Eye" linksto:"Eye" insource:/\[\[ *Eye *\]\].?.?.?.?.?.?(Suffolk|Engl|United K|UK)/
"Flint" linksto:"Flint" insource:/\[\[ *Flint *\]\].?.?.?.?.?.?(Flints|Sir y Fflint|Fflint|Wales|United K|UK)/
"Gladstone" linksto:"Gladstone" insource:/\[\[ *Gladstone *\]\].?.?.?.?.?.?(Queensland|QLD|Australia)/
"Gloucester" linksto:"Gloucester" insource:/\[\[ *Gloucester *\]\].?.?.?.?.?.?(Massachusetts|MA|United S|US)/
"Greenwich" linksto:"Greenwich" insource:/\[\[ *Greenwich *\]\].?.?.?.?.?.?(Connecticut|CT|United S|US)/
"Guna" linksto:"Guna" insource:/\[\[ *Guna *\]\].?.?.?.?.?.?(India|Madhya|Ethiopia|istrict|unction)/
"Hanover" linksto:"Hanover" insource:/\[\[ *Hanover *\]\].?.?.?.?.?.?(New Hampshire|NH|United S|US)/
"Hollywood" linksto:"Hollywood" insource:/\[\[ *Hollywood *\]\].?.?.?.?.?.?(Florida|FL)/
"Horsham" linksto:"Horsham" insource:/\[\[ *Horsham *\]\].?.?.?.?.?.?(Victoria|V|Australia)/
"Hyderabad" linksto:"Hyderabad" insource:/\[\[ *Hyderabad *\]\].?.?.?.?.?.?(Sindh|Pak)/
"Ipswich" linksto:"Ipswich" insource:/\[\[ *Ipswich *\]\].?.?.?.?.?.?(Queensland|Q|Australia)/
"Kansas City" linksto:"Kansas City" insource:/\[\[ *Kansas City *\]\].?.?.?.?.?.?(Missouri|MO)/
"Kansas City" linksto:"Kansas City" insource:/\[\[ *Kansas City *\]\].?.?.?.?.?.?(Kansas|KS)/
"Leek" linksto:"Leek" insource:/\[\[ *Leek *\]\].?.?.?.?.?.?(Staff|Engl|United K|UK)/
"Liverpool" linksto:"Liverpool" insource:/\[\[ *Liverpool *\]\].?.?.?.?.?.?(New South Wales|NSW|Australia)/
"London" linksto:"London" insource:/\[\[ *London *\]\].?.?.?.?.?.?(Ontario|ON|Canad)/
"Loni" linksto:"Loni" insource:/\[\[ *Loni *\]\].?.?.?.?.?.?(Ahmednagar|Maharashtra|India|Bijapur|Karnataka|Ghaziabad|Uttar Pradesh|Punjab|Pakistan)/
"Luxembourg" linksto:"Luxembourg" insource:/\[\[ *Luxembourg *\]\].?.?.?.?.?.?(, \[*Lux|[Cc]ity)/
"Manchester" linksto:"Manchester" insource:/\[\[ *Manchester *\]\].?.?.?.?.?.?(New Hampshire|NH|United S|US)/
"Mansfield" linksto:"Mansfield" insource:/\[\[ *Mansfield *\]\].?.?.?.?.?.?(Ohio|OH|United S|US)/
"March" linksto:"March" insource:/\[\[ *March *\]\].?.?.?.?.?.?(Camb|Engl|United K|UK)/
"Melbourne" linksto:"Melbourne" insource:/\[\[ *Melbourne *\]\].?.?.?.?.?.?(Derby|Engl|United K|UK)/
"Mold" linksto:"Mold" insource:/\[\[ *Mold *\]\].?.?.?.?.?.?(Flints|Sir y Fflint|Fflint|Wales|United K|UK)/
"Naples" linksto:"Naples" insource:/\[\[ *Naples *\]\].?.?.?.?.?.?(Florida|FL|United S|US)/
"New Britain" linksto:"New Britain" insource:/\[\[ *New Britain *\]\].?.?.?.?.?.?(Connecticut|CT|United S|US)/
"New Brunswick" linksto:"New Brunswick" insource:/\[\[ *New Brunswick *\]\].?.?.?.?.?.?(New Jersey|NJ|United S|US)/
"Newfoundland" linksto:"Newfoundland" insource:/\[\[ *Newfoundland *\]\].?.?.?.?.?.?(sland)/
"Norfolk" linksto:"Norfolk" insource:/\[\[ *Norfolk *\]\].?.?.?.?.?.?(Virginia|VA|United S|US)/
"Northampton" linksto:"Northampton" insource:/\[\[ *Northampton *\]\].?.?.?.?.?.?(Massachusetts|MA|United S|US)/
"Norwich" linksto:"Norwich" insource:/\[\[ *Norwich *\]\].?.?.?.?.?.?(Connecticut|CT|United S|US)/
"Odessa" linksto:"Odessa" insource:/\[\[ *Odessa *\]\].?.?.?.?.?.?(Texas|TX|United S|[^R]US[^S])/
"Ore" linksto:"Ore" insource:/\[\[ *Ore *\]\].?.?.?.?.?.?(East|Sussex|Engl|United K|UK)/
"Oxford" linksto:"Oxford" insource:/\[\[ *Oxford *\]\].?.?.?.?.?.?(Ohio|OH|United S|US)/
"Pali" linksto:"Pali" insource:/\[\[ *Pali *\]\].?.?.?.?.?.?(Rajasthan|Rajasthan|India)/
"Perth" linksto:"Perth" insource:/\[\[ *Perth *\]\].?.?.?.?.?.?(Scotland|Perth(s| and K| \& K)|United K|UK)/
"Piedmont" linksto:"Piedmont" insource:/\[\[ *Piedmont *\]\].?.?.?.?.?.?(United S|US)/
"Portsmouth" linksto:"Portsmouth" insource:/\[\[ *Portsmouth *\]\].?.?.?.?.?.?(Virginia|VA|United S|US)/
"Pueblo" linksto:"Pueblo" insource:/\[\[ *Pueblo *\]\].?.?.?.?.?.?(Colorado|CO|United S|US)/
"Punjab" linksto:"Punjab" insource:/\[\[ *Punjab *\]\].?.?.?.?.?.?(India|Pak)/
"Reading" linksto:"Reading" insource:/\[\[ *Reading *\]\].?.?.?.?.?.?(Berk|Engl|United K|UK)/
"Rye" linksto:"Rye" insource:/\[\[ *Rye *\]\].?.?.?.?.?.?(East|Sussex|Engl|United K|UK)/
"Sandwich" linksto:"Sandwich" insource:/\[\[ *Sandwich *\]\].?.?.?.?.?.?(Kent|Engl|United K|UK)/
"Petersburg" linksto:"St. Petersburg" insource:/\[\[ *S[aint.]* Petersburg *\]\].?.?.?.?.?.?(Florida|FL|United S|[^R]US[^S])/
"Surrey" linksto:"Surrey" insource:/\[\[ *Surrey *\]\].?.?.?.?.?.?(British Columbia|BC|Canad)/
"Sydney" linksto:"Sydney" insource:/\[\[ *Sydney *\]\].?.?.?.?.?.?(Nova Scotia|NS[^W]|Canad)/
"Troy" linksto:"Troy" insource:/\[\[ *Troy *\]\].?.?.?.?.?.?(Michigan|MI|New York|NY|United S|US)/
"Warwick" linksto:"Warwick" insource:/\[\[ *Warwick *\]\].?.?.?.?.?.?(Queensland|Q|Australia)/
"Wellington" linksto:"Wellington" insource:/\[\[ *Wellington *\]\].?.?.?.?.?.?(Somerset|Shrop|Salop)/
"Wellington" linksto:"Wellington" insource:/Duke of \[\[ *Wellington *\]\]/
"York" linksto:"York" insource:/\[\[ *York *\]\].?.?.?.?.?.?(Pennsylvania|PA|United S|US)/
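Most of the place searches follow one template: the link, up to six arbitrary characters, then a word hinting at the other place. The sketch below generates such a search and tries the same pattern on sample wikitext with Python's `re` module. `place_query` is a hypothetical helper, the sample sentences are invented, and Python's regex dialect is only assumed to agree with the search engine's for patterns this simple.

```python
import re


def place_query(name, hints, window=6):
    """Return (search string, regex) for links to `name` followed by a hint."""
    # ".?" repeated `window` times allows up to that many characters
    # between the closing brackets and the hint.
    pattern = (rf"\[\[ *{name} *\]\]" + ".?" * window
               + "(" + "|".join(hints) + ")")
    query = f'"{name}" linksto:"{name}" insource:/{pattern}/'
    return query, pattern


query, pattern = place_query("Athens", ["Georgia", "GA", "United S", "US"])
print(query)

# Invented sample sentences: the first should be flagged, the second not.
flagged = "born in [[Athens]], [[Georgia (U.S. state)|Georgia]]"
ignored = "born in [[Athens]], Greece"
print(bool(re.search(pattern, flagged)), bool(re.search(pattern, ignored)))
```

The six-character window is just wide enough to cross punctuation and the opening brackets of a following link.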
Sports teams
Many primary topics share a name with similarly named sports teams. In many cases, there are multiple alternatives; only the most widely linked are listed here. All can usefully be limited to uppercase search: links to bears can be assumed to refer to mammals, etc. Teams where the likely bad target is a dab such as Jets are not listed; nor are redirects to the team such as Lakers, even where other meanings exist. Short names marked * are shared by multiple teams.
Also beware of stray positions such as back and wing. Links to towns and cities (Watford 1:2 Liverpool) also occur but are harder to detect.
Surnames
Surname pages, despite being a list of topics which the article title might mean, are articles rather than disambiguation pages, so incoming links are not reported as errors. Here are the 100 surnames which required the most fixes in April 2020. The list is sorted by link count but can be sorted alphabetically by clicking the header.
These commonly linked surname articles are generally linked correctly, as they also describe the family or another homonymous topic:
- Abashidze, Baig, Bhatt→Bhat, Boncompagni, Bowes-Lyon, Chaudhary, Chowdhury, de Burgh, de Graeff, Desai, Dhillon, Doyen, Drost, van Eyck, Kardashian, Khwaja, Khawaja, Liu, McGovern, Mortimer, Murong, Naidu, Niazi, Ó Cléirigh, O'Rourke, O'Sullivan, Oswal, Patel, Pawar→Pawar (surname), Piccolomini, Qureshi, Reventlow, Sandhu, Sharma, Tyagi, Ungern-Sternberg, Wright.
Other productive changes outwith the top 100:
Works of art
Titles of works of art (broadly construed) often appear in italics. Qualified titles with many incoming links which do not have a dab or work of art at the base name may attract misdirected links. For example, ''[[Abraham Lincoln]]'' may refer to Abraham Lincoln (1930 film). These can be found thus:
- Run a Quarry query to list suspicious cases, one initial at a time.
- Search for links to the page at the base name which are in italics and may be intended for the qualified name, e.g.
linksto:"Abraham Lincoln" insource:/[^']''\[\[Abraham Lincoln\]/
(the [^'] preventing bold text from matching).
A similar procedure can identify songs, etc. expected to appear in quotes.
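The italic-link expression can be checked against sample wikitext before running the search. This sketch uses Python's `re` module, with a hypothetical helper name and invented sample sentences. Because `[^']` must match a character, an italic link at the very start of the text would not be found.

```python
import re


def italic_link_pattern(title):
    """Compile a pattern for an italicised link to `title`.

    The leading [^'] rejects a third apostrophe, so bold text does not match.
    """
    return re.compile(r"[^']''\[\[" + re.escape(title) + r"\]")


pattern = italic_link_pattern("Abraham Lincoln")
italic = "the film ''[[Abraham Lincoln]]'' (1930)"
bold = "'''[[Abraham Lincoln]]''' was the 16th president"
print(bool(pattern.search(italic)), bool(pattern.search(bold)))
```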
Ephemera
User:HostBot/Top 1000 report shows Wikipedia's most visited pages, some of which enjoy temporary popularity. As of December 2020, the following entries from the top 150 have an unrelated topic at the base name:
Base name | Topic | Likely target(s) | Comment |
---|---|---|---|
| | Redirect to Startup company | Start-Up (South Korean TV series) | Retargeted to dab |
The Crown | The state in the Commonwealth | The Crown (TV series) | Fixed |
The Undoing | Album by Steffany Gretzinger | The Undoing (miniseries) | Page move pending |
Virgin River | Colorado River tributary | Virgin River (TV series) | No bad links |
See also the historic monthly top 100s in Topviews.
Current and future work
Set index articles
Many set index articles have incoming links which could be improved. Roughly A–F checked and fixed; currently checking the widely linked Ministry of Finance.
Given names with dabs
Articles with template {{given name}} having a corresponding X (disambiguation) page (not a redirect). This is proving fruitful: 100+ fixes for A alone.
Former dabs
Category:Former disambiguation pages converted to set index articles, some of which overlap with other groups here. Roughly A–B checked and fixed.