Wikipedia:Link rot
Like almost all large websites, Wikipedia also suffers from the phenomenon known as link rot, where external links go stale after a period of time. As of the September 13, 2005 database dump, Wikipedia contained 845,416 external links, many of which are no longer functioning.
Such dead links are unwanted, and should be fixed on a regular basis. You can either try to find the current location of the document using a Google inurl search, or use the {{dlw}} template to point to the Internet Archive version of the document. Please do not simply remove dead links; they contain valuable information.
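When pointing a dead link at the Internet Archive, the archived copy is normally reachable through a Wayback Machine URL of the form `https://web.archive.org/web/<timestamp>/<original URL>`. The sketch below builds such a URL. It assumes that public URL pattern only and does not reflect how the {{dlw}} template builds its link internally; the function name `wayback_url` is hypothetical.

```python
def wayback_url(url: str, timestamp: str = "*") -> str:
    """Build a Wayback Machine URL for a dead external link (hypothetical helper).

    timestamp is a (possibly partial) YYYYMMDDhhmmss string, which the archive
    resolves to the nearest capture; "*" asks for the list of all captures.
    """
    if timestamp != "*" and not timestamp.isdigit():
        raise ValueError("timestamp must be digits (YYYYMMDDhhmmss prefix) or '*'")
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Example: the capture nearest to the date of the database dump.
print(wayback_url("http://www.example.com/page.html", "20050913"))
```

Using a date close to when the link was added (or to the dump date) makes it more likely that the archived copy matches what the article originally cited.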
This page is intended to be a clearinghouse for all such external links. If you make corrections to the source article to fix a broken link, please indicate so below to prevent a duplication of effort.
Although the sections below contain a short description of the status code in question, please see the list of HTTP status codes for a more complete description.
Status codes
200
The 200 status code indicates that the link is correctly formed, and retrievable. Although such links do not need correction, they are included here for completeness. Wikipedia currently contains 704,147 of these links. Due to the sheer number of links that correctly resolve, these are not available for download.
300
Indicates that the server offers several representations of the content (for example, in different languages or formats) and asked the client, here the bot, to choose one. Although such links are most likely correct, they should probably be double-checked. Wikipedia currently contains 36 of these links.
301
Indicates that the content has been moved permanently, and that the link inside Wikipedia should probably be updated to reflect the new location. Wikipedia currently contains 21,538 of these links.
302, 303, 307
Indicates that the content has been temporarily moved, and that the client should continue to use the original link. Although these links should be correct in theory, they are often used by link farms, and should probably be checked. Wikipedia currently contains 46,767 status 302 links, 198 status 303 links, and 6 status 307 links.
400
Indicates that the site in question could not understand the bot's request. These should diminish with future revisions of the bot, but it may still be useful to test them (low priority). Wikipedia currently contains 1,205 of these links.
401
The page required authorization, which the bot does not support. The page in question may legitimately require login information; the bot has no way of knowing this. Such links should be fixed if the page is not meant to require login information. Wikipedia currently contains 210 of these links.
402
Although 402 is reserved and not an active status code, some servers used it anyway. In theory, it indicates that the server requires payment from the client. Such links should be fixed. Wikipedia currently contains 0 of these links.
404, 410
The 404 error is the most common symptom of link rot, and it indicates that the page was not found; a 404 does not say whether the condition is permanent. The 410 status code is similar, but indicates that the page has been permanently removed. Such links must be fixed, perhaps with a link to the Internet Archive. Wikipedia currently contains 24,012 status 404 links and 31 status 410 links.
405
Indicates that the request method used by the bot is not allowed for the resource. Since regular Wikipedia links are followed with ordinary HTTP requests, as the bot's are, these links are probably broken and should be fixed. Wikipedia currently contains 0 of these links.
406
Indicates that the server could not produce a response matching what the client's request said it would accept. This can occur for a number of reasons. Should probably be fixed. Wikipedia currently contains 94 of these links.
409
Indicates that the request conflicts with the current state of the resource, an error that the client needs to resolve. Should probably be fixed. Wikipedia currently contains 0 of these links.
412
Indicates that the request failed to meet some sort of precondition. Should probably be tested. Wikipedia currently contains 0 of these links.
423
Although not part of core HTTP (it is defined by the WebDAV extension), servers use this code to indicate a "Locked" error. Should probably be fixed. Wikipedia currently contains 0 of these links.
425
Another non-standard status code. It indicates that the server denied the request because it treated it as a "mirroring" request, even though the bot was not mirroring the site's content. Should probably be tested. Wikipedia currently contains 21 of these links.
5xx
Indicates there was some sort of internal server error. This could be caused by a malformed HTTP request from the bot, among numerous other reasons. These links should be examined to determine whether the site has a permanent problem with the link in question. Wikipedia currently contains 4,269 status 500 links, 10 status 501 links, 135 status 502 links, and 193 status 503 links.
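As a rough sketch, the recommendations for the HTTP status codes above can be collected into a lookup table, for example to sort a download file by how urgently each link needs attention. The action strings are paraphrases of this page, and the names `ACTIONS` and `triage` are hypothetical; they are not part of the bot.

```python
# Suggested action per HTTP status, paraphrasing the notes on this page.
ACTIONS = {
    200: "none",
    300: "check", 302: "check", 303: "check", 307: "check",
    301: "update to new location",
    400: "test (low priority)", 412: "test", 425: "test",
    401: "fix unless the page legitimately requires a login",
    402: "fix", 405: "fix", 406: "fix", 409: "fix", 423: "fix",
    404: "fix (consider an Internet Archive link)",
    410: "fix (consider an Internet Archive link)",
}


def triage(status: int) -> str:
    """Return the suggested action for one HTTP status code."""
    if status in ACTIONS:
        return ACTIONS[status]
    if 500 <= status <= 599:
        # All 5xx codes are treated alike: look for a permanent server problem.
        return "examine for a permanent server problem"
    return "unknown: see the list of HTTP status codes"


for code in (301, 404, 503):
    print(code, triage(code))
```

A dictionary keeps each code's recommendation in one place, while the 5xx range is handled as a group because the page gives the same advice for all of them.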
NA - Unsupported protocol
Indicates that the link used a protocol, such as IRC or Gopher, that the bot is not capable of resolving. These links should be checked to confirm that the protocol is correct (e.g., a typo such as htttp://www.wikipedia.org). Wikipedia currently contains 171 of these links.
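Typos in the protocol part of a link, such as `htttp`, can often be spotted automatically by comparing the scheme against a list of known schemes. The sketch below uses Python's `difflib` for the fuzzy match. The list `KNOWN_SCHEMES` and the 0.6 similarity cutoff are assumptions, and `suggest_scheme_fix` is a hypothetical helper; any suggestion should still be checked by hand.

```python
import difflib
from typing import Optional
from urllib.parse import urlsplit

# Assumption: schemes considered legitimate in external links; extend as needed.
KNOWN_SCHEMES = ["http", "https", "ftp", "gopher", "irc", "news", "mailto"]


def suggest_scheme_fix(url: str) -> Optional[str]:
    """Return the URL with a corrected scheme if the scheme looks like a typo."""
    scheme = urlsplit(url).scheme  # urlsplit lowercases the scheme
    if scheme in KNOWN_SCHEMES:
        return None  # valid scheme, even if the bot cannot fetch it
    matches = difflib.get_close_matches(scheme, KNOWN_SCHEMES, n=1, cutoff=0.6)
    if not matches:
        return None  # no plausible correction
    # Replace only the scheme, keeping "://..." and the rest of the URL intact.
    return matches[0] + url[len(scheme):]


print(suggest_scheme_fix("htttp://www.wikipedia.org"))
```

Links whose scheme is valid but unsupported by the bot (for example Gopher) return `None` and still need a manual look.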
NA - Unknown error
Indicates that the bot had some sort of difficulty resolving the link in question. This could be caused by a number of errors: DNS lookup failures, socket timeouts, etc. The default socket timeout was set to 30 seconds, which may be too short for some very slow sites. Should probably be tested. Wikipedia currently contains 39,749 of these links.
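The categories above can be reproduced with a small checker that records the first status code a server returns (without following redirects, so that 301/302/303/307 stay visible) and reports "NA" for unsupported protocols and for connection-level failures. This is a sketch using Python's standard `urllib`, not the bot's actual code: the request method (GET), the default `urllib` User-Agent, and the names `NoRedirect` and `check_link` are assumptions. Only the 30-second timeout comes from this page. Some sites reject unfamiliar User-Agents, which can itself produce 4xx codes.

```python
import http.client
import urllib.error
import urllib.request
from typing import Tuple
from urllib.parse import urlsplit


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report 3xx responses as-is instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


_OPENER = urllib.request.build_opener(NoRedirect)


def check_link(url: str, timeout: float = 30.0) -> Tuple[str, str]:
    """Return (status, detail) for one external link.

    status is the HTTP code as a string, or "NA" if no HTTP status was obtained.
    """
    if urlsplit(url).scheme not in ("http", "https"):
        return "NA", "Unsupported protocol"
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return str(resp.status), resp.reason
    except urllib.error.HTTPError as err:
        # 3xx (not followed), 4xx and 5xx responses arrive here; for redirects,
        # the Location header gives the candidate replacement link.
        location = err.headers.get("Location") if err.headers else None
        return str(err.code), location or str(err.reason)
    except (OSError, ValueError, http.client.HTTPException) as err:
        # DNS failures, refused connections, timeouts, malformed URLs, bad replies.
        return "NA", f"Unknown error: {err}"
```

For example, `check_link("gopher://gopher.example.org/")` returns `("NA", "Unsupported protocol")` without touching the network, while an HTTP link that has moved permanently would return `"301"` together with its new location.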
Downloads
Below are links to download tab-separated text files (gzip-compressed) containing the links. Each line has the form:
Article title, tab, URL, tab, further description (as in [http://www.wikipedia.org/ Wikipedia] links), tab, error code, tab, server response. These files should probably be moved somewhere more permanent in the future.
200 (not currently available)
300 (on wiki) - 301 - 302 - 303 - 307
400 - 401 - 402 - 404 - 405 - 406 - 409 - 410 - 412 - 423 - 425
NA (Unsupported protocol) - NA (Unknown error)
The 404 errors have pages to themselves:
- a, 1782 entries
- b, 1246 entries
- c, 1699 entries
- d, 985 entries
- e, 739 entries
- f, 883 entries
- g, 773 entries
- h, 889 entries
- i, 642 entries
- j, 1366 entries
- k, 513 entries
- l, 1468 entries
- misc, 539 entries
- m, 2007 entries
- n, 951 entries
- o, 538 entries
- p, 1163 entries
- q, 80 entries (done)
- r, 894 entries
- s, 1838 entries
- t, 1299 entries
- u, 460 entries
- v, 303 entries
- w, 681 entries
- x, 41 entries (looks done to me)
- y, 119 entries
- z, 95 entries
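The tab-separated download format described above can be read with a few lines of Python. This sketch assumes the files are UTF-8 encoded and that each record sits on one line; `LinkRecord`, `parse_lines` and `read_links` are hypothetical names. Splitting at most four times keeps any tabs inside the server response in the last field.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class LinkRecord(NamedTuple):
    article: str
    url: str
    description: str
    code: str
    response: str


def parse_lines(lines: Iterable[str]) -> Iterator[LinkRecord]:
    """Yield one record per well-formed line, skipping anything else."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t", 4)  # at most 5 fields
        if len(fields) == 5:
            yield LinkRecord(*fields)


def read_links(path: str) -> Iterator[LinkRecord]:
    """Read one gzip-compressed download file (UTF-8 assumed)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from parse_lines(fh)
```

Keeping the parsing in `parse_lines`, separate from the file handling, makes it easy to test on a few sample lines or to feed it an already-decompressed file.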
Status
Please indicate your correction status in the form "123: ABC - XYZ", e.g., "404: African Academy of Sciences - anonymous remailer"
300: None
301: Agesilaus II - Cough CPR, All Too Flat, Alvito, Boto, Eight Crazy Nights, Family First Party, Fine motor skill, Gabrielle Pizzi, Mighty Mohawk Man, Zane, ZZ Top
302: None
303: None
307: None
400: None
401: None
402: Kayo Hatta. All fixed!
404: Controlled Combustion Engine, Neptunium - Netherlands, P*U*L*S*E - Plain English, Punjab Regiment (Pakistan) - Père Noël, Z M Dagar - Zadar, X2 (film), XML transformation language, Court of Session. Working on 'C', starting at the top.
405: All links work correctly, despite the 405 error. This is probably a web-server configuration issue.
406: A Special Sesame Street Christmas - Bay County, Florida. Links either worked anyway or have been fixed.
409: Court of Session - Self-hatred. All fixed!
410: 2004 U.S. election voting controversies, Ohio - Publieke Omroep
412: All links tested, work correctly.
423: Natib Qadish - The Leopard Man. All fixed!
425: None
500: None
501: None
502: None
503: A-D articles done--Adam 16:41, 3 December 2005 (UTC)
NA (Unsupported protocol): Fixed numerous links with typos. Everything else "looks" correct.
NA (Unknown error): None
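Status entries written in the "123: ABC - XYZ" convention can be collected automatically, for example to see which alphabetical ranges are already covered. The regular expression and the name `parse_status_entry` are hypothetical. The sketch splits at the first " - ", so an article title that itself contains " - " would be split incorrectly, and free-form entries such as "300: None" are ignored.

```python
import re
from typing import Optional, Tuple

# "123: First article - Last article", with optional trailing whitespace.
ENTRY_RE = re.compile(r"^(\d{3}):\s*(.+?)\s+-\s+(.+?)\s*$")


def parse_status_entry(entry: str) -> Optional[Tuple[int, str, str]]:
    """Return (status code, first article, last article), or None if no match."""
    m = ENTRY_RE.match(entry)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), m.group(3)


print(parse_status_entry("404: African Academy of Sciences - anonymous remailer"))
```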