Wikipedia / LC Name Authority matching - earlier effort

An earlier effort to match Wikipedia biographical articles with the LC name authority file used the List of people by name on Wikipedia to produce the tables linked to below. In these tables, a match with the LC name authority file was only returned if it was a unique match for the name (in which case it was labelled ExactMatch) or for the name and at least one birth/death date (in which case it was labelled ExactMatchWithDates). Between a third and a half of the people in Wikipedia appeared to be matchable to the LC Name Authority file. For example, of the 191 people with names beginning with Z, 93 (just under 50%) were found.
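The matching rule described above can be sketched roughly as follows. This is an illustration only, not the script that produced the tables: the `AuthorityRecord` class, the `normalise` helper, the order in which the two checks run, and the requirement that a dated match also be unique are all assumptions made for the example, and the sample records are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthorityRecord:
    """Hypothetical, simplified stand-in for an LC name authority record."""
    name: str                     # inverted form, e.g. "Doe, Jane"
    birth: Optional[int] = None
    death: Optional[int] = None


def normalise(name):
    """Crude normalisation (an assumption): trim, drop a trailing full stop, lowercase."""
    return name.strip().rstrip(".").lower()


def classify(name, birth, death, records):
    """Return (category, record), or (None, None) when no match is returned."""
    same_name = [r for r in records if normalise(r.name) == normalise(name)]

    # Name agrees and at least one of the birth/death dates agrees.
    dated = [
        r for r in same_name
        if (birth is not None and r.birth == birth)
        or (death is not None and r.death == death)
    ]
    if len(dated) == 1:
        return "ExactMatchWithDates", dated[0]

    # Otherwise accept only a unique match on the name alone.
    if len(same_name) == 1:
        return "ExactMatch", same_name[0]

    return None, None


# Invented sample data, for illustration only.
records = [
    AuthorityRecord("Doe, Jane", 1900, 1980),
    AuthorityRecord("Doe, Jane", 1950, None),
    AuthorityRecord("Roe, Richard."),
]
```

With this sample data, `classify("Doe, Jane", 1900, None, records)` is labelled ExactMatchWithDates, `classify("Roe, Richard", None, None, records)` is labelled ExactMatch, and `classify("Doe, Jane", None, None, records)` returns no match because the name alone is ambiguous.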

There were several caveats with these tables:

  1. The list of people on Wikipedia I used was not quite up to date: it was taken six months ago from a mirror site copy of List of people by name, after I had problems parsing Wikipedia's XML download. This list was incomplete: in fact, it only included around 15% of those categorized by date of birth.
  2. Unicode wrinkles meant that the LC recommended name form column in the tables had a little Unicode garbage in it.
  3. Within each table, names were mostly in alphabetical order, at least locally, but some were out of sequence (e.g. in the table for W).
  4. Recent changes in the OCLC name authority service meant a little loss of flexibility in date search.

Dsp13 12:09, 27 July 2006 (UTC)

A B C D E F G H I J-manually corrected K L M N O P Q R S T U V W X Y Z

Dsp13 02:43, 17 July 2006 (UTC)

Earlier effort at automatic matching: manual check of names starting with J

I manually checked the results for the case of names beginning with J. (Since there are plenty of common surnames beginning with J - Jackson, Johnson, Jones etc. - I do not think that this is an unrepresentatively easy sample.)

Out of 640 names taken from the List of people by name (see caveats above), 357 (or 56%) were automatically matched against the LC name authority file. Manual checking of the matches found was a little disappointing: 42 (or 12%) were erroneous.

I marked manual changes in the corrected table for J. That table does not, however, show the names that had to be removed manually because the people concerned were not in the LC Name Authority File at all.

The 42 erroneous matches are displayed in the table below. There are three kinds of error, requiring three corresponding kinds of manual repair:

  1. Disambiguation. The need for this arises when the name in the List of people by name does not point to an individual biographical page but only to a disambiguation page. (This is essentially an internal Wikipedia problem, rather than a problem arising from record linkage per se. Such cases could be automatically removed from the table by removing links to disambiguation pages.)
  2. Correction. The need for this arises when an LC name authority record for the individual concerned exists, but the automated matching finds another one.
  3. Removal. The need for this arises when there is no true LC name authority record for the individual concerned (although the automated matching has mistakenly found one).
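The automatic removal suggested under item 1 could look something like the sketch below. It assumes that a set of disambiguation page titles has already been obtained by some separate means; the function name `drop_disambiguation`, the `wikipedia_title` key and the way that set is built are hypothetical, not part of the original matching run.

```python
def drop_disambiguation(rows, disambiguation_titles):
    """Keep only rows whose Wikipedia title is not a known disambiguation page."""
    return [
        row for row in rows
        if row["wikipedia_title"] not in disambiguation_titles
    ]


# Three rows taken from the J results; the title set is assumed to come from elsewhere.
rows = [
    {"wikipedia_title": "John Jenkins", "naco_id": "n82-27775"},
    {"wikipedia_title": "Werner Jaeger", "naco_id": "n96-41010"},
    {"wikipedia_title": "Bobby Jones", "naco_id": "n80-13004"},
]
disambiguation_titles = {"John Jenkins", "Bobby Jones", "Jeremy Jordan"}

kept = drop_disambiguation(rows, disambiguation_titles)
```

Here only the Werner Jaeger row survives the filter; it would still need a manual Correction, since filtering addresses the first kind of error only.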

ExactMatchesWithDates are far more reliable than ExactMatches: of the 198 searches which produced ExactMatchesWithDates, only 4 (2%) were errors. Three of these are disambiguation errors; if these are left out of account, a single error remains, giving a very respectable error rate of about 0.5%.
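As a check on the arithmetic, the percentages can be recomputed from the counts quoted in the prose for names beginning with J. The figure for name-only matches is derived here by subtraction and is not stated in the text; the variable names are purely illustrative.

```python
# Counts quoted in the prose for names beginning with J.
names_checked = 640
auto_matched = 357
errors_total = 42
dated_matches = 198               # ExactMatchesWithDates
dated_errors = 4
dated_disambiguation_errors = 3


def pct(part, whole):
    """Percentage of whole represented by part, to one decimal place."""
    return round(100 * part / whole, 1)


match_rate = pct(auto_matched, names_checked)        # 55.8
error_rate = pct(errors_total, auto_matched)         # 11.8
dated_error_rate = pct(dated_errors, dated_matches)  # 2.0
dated_error_rate_excl = pct(
    dated_errors - dated_disambiguation_errors, dated_matches
)                                                    # 0.5

# Derived by subtraction (not stated in the text): matches on name alone.
undated_matches = auto_matched - dated_matches       # 159
undated_errors = errors_total - dated_errors         # 38
undated_error_rate = pct(undated_errors, undated_matches)  # 23.9
```

On these figures roughly one name-only match in four was wrong, against about one in fifty for matches confirmed by a date.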

People in Wikipedia erroneously matched with LC name authority file - names beginning with J
| Wikipedia link | Mismatched NACO external link | LC recommended name form | Titles in LC | Dates in Wikipedia | NACO match category | Manual repair needed |
|---|---|---|---|---|---|---|
| Al Jackson | n85-89162 | Jackson, Al. | 43 | 2 | ExactMatches | Removal |
| Cyril Jackson | no2005-62233 | Jackson, Cyril | 0 | 2 | ExactMatches | Removal |
| Jonathan Jackson (football) | no2005-84695 | Jackson, Jonathan, 1982- | 0 | 1 | ExactMatchesWithDates | Removal |
| Russ Jackson | n2001-108420 | Jackson, Russ, 1967- | 2 | 0 | ExactMatches | Removal |
| Werner Jaeger | n96-41010 | Jaeger, Werner | 24 | 2 | ExactMatches | Correction |
| Nemi Chand Jain | no96-3205 | Jain, Nemi Chand | 0 | 1 | ExactMatches | Removal |
| Cornelius Jansen | nb98-76099 | Jansen, Cornelius, 1822-1894 | 14 | 2 | ExactMatches | Correction |
| Frank Jarvis | nr88-1184 | Jarvis, Frank. | 9 | 2 | ExactMatches | Removal |
| John Jellicoe | nr93-8061 | Jellicoe, John | 10 | 2 | ExactMatches | Correction |
| Charles Francis Jenkins | n90-601696 | Jenkins, Charles Francis, 1865-1951 | 99 | 2 | ExactMatches | Correction |
| Charlie Jenkins | n90-605857 | Jenkins, Charles Lamont | 4 | 1 | ExactMatches | Removal |
| Conway Twitty | n50-28071 | Jenkins, Harold. | 82 | 0 | ExactMatches | Correction |
| John Jenkins | n82-27775 | Jenkins, John, 1592-1678 | 242 | 2 | ExactMatchesWithDates | Disambiguation |
| Pamela Jenkins | n95-72244 | Jenkins, Pamela | 35 | 0 | ExactMatches | Removal |
| Pat Jennings | n96-40034 | Jennings, Pat | 25 | 1 | ExactMatches | Removal |
| Otto Jespersen (comedian) | n79-96864 | Jespersen, Otto, 1860-1943 | 487 | 0 | ExactMatches | Removal |
| Jesus | n2003-60204 | Jesus, son of Ananias, d. 70 | 3 | 0 | ExactMatches | Correction |
| Maria Johansson | no92-7260 | Johansson, Maria | 11 | 1 | ExactMatches | Removal |
| John Chrysostom | nb99-29377 | Chrysostom, John, Brother, 1863-1917 | 50 | 2 | ExactMatches | Removal |
| John Bosco | no2005-40758 | Bosco, John, 1975- | 0 | 2 | ExactMatches | Corrected |
| John Chrysostom | nb99-29377 | Chrysostom, John, Brother, 1863-1917 | 50 | 2 | ExactMatches | Correction |
| Daniel Johns | n2005-92366 | Johns, Daniel | 0 | 1 | ExactMatches | Removal |
| Abigail Johnson | n2004-90670 | Johnson, Abigail | 0 | 1 | ExactMatches | Removal |
| Amy Johnson | n88-73874 | Johnson, Amy | 74 | 2 | ExactMatches | Removal |
| Avery Johnson | n2005-30317 | Johnson, Avery, 1906-1990 | 0 | 0 | ExactMatches | Removal |
| John Henry Johnson | no98-101627 | Johnson, John Henry | 34 | 1 | ExactMatches | Removal |
| Lionel Johnson | n85-310423 | Johnson, Lionel, 1924- | 12 | 0 | ExactMatches | |
| Lucius E. Johnson | n84-123216 | Johnson, Lucius E. (Lucius Elsworth), 1905- | 2 | 2 | ExactMatches | Removal |
| Paul Marshall Johnson, Jr. | n87-146872 | Johnson, Paul, 1955- | 12 | 1 | ExactMatchesWithDates | Removal |
| Stephen C. Johnson | n85-247247 | Johnson, Steven C. | 10 | 0 | ExactMatches | Correction |
| Lynn Johnston | n81-123672 | Johnston, Lynn. | 17 | 0 | ExactMatches | Correction |
| Hermann Jónasson | n97-801126 | Jonasson, Hermann | 2 | 2 | ExactMatches | Correction |
| Bobby Jones | n80-13004 | Jones, Bobby, 1902-1971 | 176 | 2 | ExactMatchesWithDates | Disambiguation |
| Dean Jones (cricketer) | n82-66616 | Jones, Dean | 214 | 0 | ExactMatches | Removal |
| Eva Jones | n81-40482 | Jones, Eva. | 25 | 1 | ExactMatches | |
| Steve Jones (rock musician) | nb2005-18105 | Jones, Steve, 1955- | 0 | 1 | ExactMatchesWithDates | Corrected |
| Thomas D. Jones | n97-83706 | Jones, Thomas David, 1952- | 13 | 1 | ExactMatches | Correction |
| Jeremy Jordan | no2001-83102 | Jordan, Jeremy, 1973- | 0 | 1 | ExactMatchesWithDates | Disambiguation |
| James R. Jordan | n85-248248 | Jordan, James R. (James Reilly), b. 1866. | 13 | 1 | ExactMatches | Removal |
| Terry Jordan | nr90-20082 | Jordan, Terry | 29 | 0 | ExactMatches | Removal |
| Joseph Cotter | n2002-158245 | Cotter, Joseph, 1956- | 0 | 0 | ExactMatches | Removal |
| Hubert Julian | no00-20189 | Julian, Hubert | 4 | 1 | ExactMatches | Correction |
| Carl Jung | no2005-86644 | Jung, Carl | 0 | 2 | ExactMatches | Correction |

Moved from User:Dsp13 by Dsp13 19:42, 9 August 2006 (UTC)