Wikipedia talk:WikiProject Red Link Recovery/Archive 1

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Before posting questions here, remember to check the FAQ in case they are answered there already.

A more intelligent search engine?

Not sure how to list this on the main page, so here's my discussion idea: How about some sort of semi-intelligent search engine on Wikipedia that will find alternative spellings or a part of a searched phrase, or suggest alternate spellings, etc? For example, "Breslov" (a group within Hasidic Judaism) is also spelled "Breslev" and "Breslav" because it is originally Hebrew, which has a different alphabet. Similarly, "Rabbi Nachman of Breslov" is the same person as "Nachman of Breslov" or "Rebbe Nachman" -- all three of which might be used by searchers. Plus, Yiddish and modern Hebrew differ in pronunciations of the same thing, hence differing transliterations. For example, a yechidus is the same thing as a yechidut. (No page on that yet -- I plan to create one.) I've been working on Hasidic Judaism pages and have had a heck of a time trying to figure out how Wikipedia is spelling Hebrew and Yiddish terms and names and whether or not an appropriate page already exists. rooster613

If there are any rules that can be derived I can certainly pull out lists of suggested changes to links. For example if "Rabbe Nachman" and "Rabbi Nachman of Breslov" are used inconsistently, I can detect red links to "Rabbi Nachman of Breslov" and suggest "Rabbe Nachman" as a link. If you have trouble picking out rules, I'm happy to run a set of examples through a pattern matcher and see if it produces anything useful. - TB 22:27, 2005 Jun 23 (UTC)

I would like to implement some kind of nearest-neighbour matching and query reformulation for Wikipedia URLs, based on what actually happens to Wikipedia search terms. The idea comes from search engines that look at user queries and match the terms against what other users typed, in addition to the correction lists we may already have. The assumption is that people type correct URLs 90-95% of the time, since they want to retrieve the right pages and won't waste time. So we can capitalize on Wikipedia's query logs: if a particular search term is already present in the logs, it is probably right; if only closely matching terms are found, we can present them to the user as suggestions (like Google's suggestions). I have posted a bug about this on the MediaWiki servers and am doing background research on it. Muthu CDT 9:47.00pm Oct 20th 2005.
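To make the nearest-match idea concrete, here is a minimal sketch in Python using the standard library's `difflib.get_close_matches`. It is only an illustration, not the matcher proposed in the bug report: the `suggest` function, the `known` list and the 0.8 similarity cutoff are assumptions, and a real version would load known terms from query logs or a title dump.

```python
from difflib import get_close_matches

# Hypothetical sample of known-good terms; a real run would read these
# from query logs or a list of existing page titles.
KNOWN_TITLES = ["Breslov", "Nachman of Breslov", "Nathan of Nemirov", "Hasidic Judaism"]


def suggest(term, known, limit=3, cutoff=0.8):
    """Return [term] if it is already known, else up to `limit` close matches."""
    if term in known:
        return [term]
    # get_close_matches ranks candidates by difflib's similarity ratio and
    # drops anything scoring below `cutoff`.
    return get_close_matches(term, known, n=limit, cutoff=cutoff)


print(suggest("Breslev", KNOWN_TITLES))
```

The cutoff is the main tuning knob: lower values offer more suggestions but also more false positives.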

There's no such spelling as "Rabbe" -- the word is "Rebbe" -- with an "RE" not an "RA." A rabbi is a scholar of Jewish law and teachings. A Rebbe is a charismatic saintly leader of a group of Hasidic Jews. These words are usually not a problem. But if you can disambiguate "Rabbi Nachman of Breslov", "Rabbi Nachman" and "Rebbe Nachman" and point them to the Nachman of Breslov page, it would be a great help. Ditto with pointing "Breslev" and "Bratzlav" to the Breslov (Hasidic dynasty). (Although Bratzlav is also a town in Germany.) Also, "Reb Nosson" and "Nosson of Nemirov" should point to Nathan of Nemirov. Thank you! rooster613

How often are the lists updated?

Just wondering if there is any point in going out of my way to remove listed links that are no longer red (a list update will fix those anyway, right?). I did remove some today, but then it occurred to me that while cleaning the list is well and good, that hour could have been better spent fixing actual red links instead. --Sherool 28 June 2005 15:50 (UTC)

I'd hope to regenerate the lists every month or two, time and database dumps depending. The current lists are based on the 15th May database dump, so they're at least 40 days out of date - the chances are that a number of entries have already been fixed in this time; these'll be the ones you're seeing. All in all it's probably not worth deleting them unless you're editing a section anyway. Do however mark up false positives - I'll filter these out of future editions of the reports to save everyone time and effort. - TB June 28, 2005 21:33 (UTC)

Regarding the numeric list

How about implementing Roman numerals in this list at some point? Granted, I haven't done any research, but I would imagine there are a number of misspelled links where people have used Roman numerals instead of regular numbers (or written numbers), or vice versa. This can apply to anything from game and movie titles (Doom II <-> Doom 2, Episode IV <-> Episode Four etc), to royalty and popes (John Paul II <-> John Paul the second etc.), or even Olympic Games. It does add a fair bit of complexity to the code though... --Sherool 1 July 2005 10:17 (UTC)

An excellent idea - I'll give it a go and see what comes out. - TB July 1, 2005 13:46 (UTC)
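As a rough illustration of the Roman numeral idea, the sketch below swaps a trailing Roman numeral for its Arabic equivalent and vice versa to produce a candidate title. It is not the project's actual script: all function names are hypothetical, it only looks at the last word of a title, and it does not handle written numbers such as "Four" or "the second".

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
ROMAN_PAIRS = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
               (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
               (5, "V"), (4, "IV"), (1, "I")]


def roman_to_int(text):
    """Convert a Roman numeral string to an integer (subtractive notation)."""
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN_VALUES[ch]
        # A smaller symbol before a larger one is subtracted (e.g. the I in IV).
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def int_to_roman(number):
    """Convert an integer in 1..3999 to a canonical Roman numeral."""
    out = []
    for value, symbol in ROMAN_PAIRS:
        while number >= value:
            out.append(symbol)
            number -= value
    return "".join(out)


def numeral_variants(title):
    """Return alternative titles with the trailing numeral swapped, if any."""
    head, _, last = title.rpartition(" ")
    if not head:
        return []
    if re.fullmatch(r"[0-9]+", last) and 0 < int(last) < 4000:
        return [f"{head} {int_to_roman(int(last))}"]
    # Accept only canonical numerals, so strings like "IIII" are ignored.
    if re.fullmatch(r"[IVXLCDM]+", last) and int_to_roman(roman_to_int(last)) == last:
        return [f"{head} {roman_to_int(last)}"]
    return []


print(numeral_variants("Doom II"))
```

A candidate would only become a suggestion if a page with that title exists. Titles ending in a single letter such as C or X would also be read as numerals, so an exception list would still be needed.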

Redirects

I realise this is almost certainly a lesson in sucking eggs but it seems to me that the fastest way to turn red links blue is to make as many appropriate redirects as possible. One good redirect can turn a whole stack of red links blue without the need to individually edit each one.

That's a very good approach. These current links won't be the last ones to use the red link instead of the (only slightly different) blue link. - Tεx τ urε 20:25, 13 July 2005 (UTC)
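For anyone wanting to batch this up, here is a small sketch of how a table of variant titles could be turned into redirect pages. The variant names and targets are the ones requested in the search engine thread above; the `plan_redirects` helper and the set of existing titles are hypothetical. `#REDIRECT [[Target]]` is the standard redirect wikitext.

```python
# Variant title -> target article, as requested in the earlier thread.
REDIRECTS = {
    "Rabbi Nachman of Breslov": "Nachman of Breslov",
    "Rabbi Nachman": "Nachman of Breslov",
    "Rebbe Nachman": "Nachman of Breslov",
    "Breslev": "Breslov (Hasidic dynasty)",
    "Bratzlav": "Breslov (Hasidic dynasty)",
    "Reb Nosson": "Nathan of Nemirov",
    "Nosson of Nemirov": "Nathan of Nemirov",
}


def redirect_wikitext(target):
    """Return the wikitext body of a redirect page pointing at `target`."""
    return f"#REDIRECT [[{target}]]"


def plan_redirects(mapping, existing_titles):
    """Return {variant: wikitext} for variants that can safely be created.

    A variant is skipped if a page with that title already exists, or if
    its target does not exist (which would create a broken redirect).
    """
    existing = set(existing_titles)
    return {
        variant: redirect_wikitext(target)
        for variant, target in mapping.items()
        if variant not in existing and target in existing
    }


# Hypothetical snapshot of titles that already exist.
existing = {"Nachman of Breslov", "Breslov (Hasidic dynasty)", "Rebbe Nachman"}
for title, text in plan_redirects(REDIRECTS, existing).items():
    print(title, "->", text)
```

Ambiguous variants (a name that is also a town, say) are better handled by hand as disambiguation pages than by a blanket redirect.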

TLAs

What are we doing with pages such as TLAs from AAA to DZZ? These pages are designed, it seems, to utilise the red links, and I can't see a method of removing them that won't cause more problems. --me_and 06:48, 14 July 2005 (UTC)

I think they should be left alone. As you say, they contain red links for a reason. Those are some of the few pages where red links are a good thing. – Quadell (talk) (sleuth) 13:55, July 14, 2005 (UTC)

French names nightmare :)

Hi. I have a suggestion: exceptions should be made for some pages related to French names. Take a look at Communes of the Gironde département. I guess this list of place names was copied from somewhere like a government site and is unlikely to contain common typing mistakes. For etymological reasons, a great many French place names end in s, and many of them contain a common word. Take the place name Coutures, for instance: it probably has very little to do with couture. As very many of the "plural suggestions" are in "Communes of the XXX département" pages, I would suggest not checking these pages next time. French names are a nightmare, even for French people like me: many times I have no idea how to pronounce them! gbog 04:54, 15 July 2005 (UTC)

I agree, in this case the suggestions are, in my experience, 0% effective. These lists just have a TON of links, and there are correspondingly a TON of little French villages without Wikipedia articles. (Note that I'm working through them anyway, because I have way too much free time ^^; So they'll probably end up on the exception list eventually) Junkyard prince 05:01, 31 July 2005 (UTC)
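A rough sketch of what skipping those pages could look like when generating plural suggestions. This is an assumption about how such a filter might work, not the project's real script: the `SKIP_SOURCE` pattern, the `plural_suggestions` function and the sample data are all hypothetical, and only a bare trailing "s" is treated as a plural.

```python
import re

# Hypothetical pattern for source pages whose red links should not be checked.
SKIP_SOURCE = re.compile(r"^Communes of the .+ d[ée]partement$")


def plural_suggestions(red_links, existing_titles):
    """Suggest singular/plural alternatives for red links.

    red_links: iterable of (source_page, red_link_target) pairs.
    Returns a list of (source_page, red_link_target, suggested_title).
    """
    existing = set(existing_titles)
    suggestions = []
    for source, target in red_links:
        if SKIP_SOURCE.match(source):
            continue  # excluded page: its links are left alone
        # Naive rule: strip or add a trailing "s".
        candidate = target[:-1] if target.endswith("s") else target + "s"
        if candidate in existing:
            suggestions.append((source, target, candidate))
    return suggestions


# Made-up sample data for illustration only.
links = [
    ("Communes of the Gironde département", "Coutures"),
    ("Fashion design", "Coutures"),
    ("Zoo", "Lion"),
]
print(plural_suggestions(links, {"Couture", "Lions"}))
```

Filtering by source page rather than by link target keeps the rule narrow: the same red link is still checked when it appears on other pages.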

tip

FYI: the "tabbed browsing" feature of the Mozilla Firefox browser makes this really fast. Stainless steel 18:48, 27 July 2005 (UTC)

Yep, tabs are neat. I prefer Opera myself though; its "notes" feature is also extremely helpful when editing. I use it to manage often-used edit summaries, templates, categories and such. Way better than copying and pasting from some external document or whatever: just right click → insert note, and pick the one you want. I'm sure there is a Firefox extension for similar functionality too though. --Sherool 00:18, 22 August 2005 (UTC)

How successful were we?

This project's current iteration seems to be coming to a completion, with most sections done, Part 6 of the Pluralisation section likely to be finished within mere minutes. How successful was this effort? What percentage of suggestions was struck through, and what percentage was not? NatusRoma 05:15, August 11, 2005 (UTC)

Indeed, it would be nice to see some stats here now that the project is finished. -- Rune Welsh ταλκ 12:56, August 21, 2005 (UTC)

I've not been the most active participant, but if memory serves I think capitalisation and the numerical lists worked out quite well, while the pluralisation list resulted in a significant number of exceptions. At least that's my impression. But it depends on how you measure success now, doesn't it? I don't see an exception as a "failure": if no relevant article exists then there is nothing to fix, and someone might eventually make an article to fill the "gap". As I understand it, the main aim was to fix links that could be made to point at existing articles, and I think we did pretty well in that regard. Though naturally, between the database dump the lists were generated from and now, there are probably a million new red links waiting to be fixed :P --Sherool 00:33, 22 August 2005 (UTC)

I've yet to compile accurate stats, but believe that more than 50,000 red links were recovered in this iteration of the project. A hearty well done to us all! The automatically generated suggestions were on average about 90% correct. Of course, this was a first pass and concentrated on all the easy wins - hopefully the list of 5000 exceptions will help keep the quality of the next iteration up. - TB 22:08, August 22, 2005 (UTC)

Any clue when we'll get the next batch?

I'm glad that we appear done for now...but anyone know when they'll be ready for us to go back at it again?

--Kell 23:52, 14 September 2005 (UTC)

Topbanana seems to be on a hiatus or something (no edits since early September). Maybe someone else could cobble together some lists; the script and exception lists are all available on subpages here if I'm not mistaken. --Sherool 18:10, 22 September 2005 (UTC)

I just tried to do it. However, it seems that the June 23 database dumps of the two dump files we need are the most recent files available from the Wikimedia download site. So we're working with the most recent information right now, it seems.

Plus, it looks like they've started using XML for one thing or another, making the directions given on the subpage of this WikiProject completely useless. Shame.

–ArmadniGeneral (talk • contribs) 20:15, 2 October 2005 (UTC)

Yes, I'm on hiatus (usual story .. got married .. business took off dramatically .. moved to a non-broadband house .. first child on the way .. etc etc). Apologies to all who are awaiting the next round of this project. If anyone has been able to get a recent database dump downloaded and available in MySQL and is willing to have a go at generating more suggestions, give me a shout and I'll try to lend what assistance I can. - TB 22:01, 17 November 2005 (UTC)