Wikipedia:Village pump (idea lab)
Charity Engine usually charges for distributed storage and processing on its grid, but strongly supports the goals of the Wikimedia Foundation and will be providing the backup for free. <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/87.112.26.249|87.112.26.249]] ([[User talk:87.112.26.249|talk]]) 01:36, 1 February 2012 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->

== Biographical metadata ==

:<small>Originally posted at a template for discussion thread, but was off-topic there so posted here.</small>
I'm making a plea here for help in improving the organisation of the listings of biographical articles. Several years ago now, I said it would be nice to be able to generate a single master database of all biographical articles on Wikipedia. That would help tremendously in updating both human name disambiguation pages (e.g. {{tl|hndis}}) and human surname [[WP:SETINDEX|set index]] pages such as [[Fisher (surname)]] (see {{tl|surname}}). For an example of the former, see the update I made [http://en.wikipedia.org/w/index.php?title=Paul_Fischer&diff=prev&oldid=471800198 here] at [[Paul Fischer]]. I had been looking for information on that [[Paul Henri Fischer]] (without knowing his middle name) and though I knew his birth and death years and found his article that way, I had to add him to the human name disambiguation page myself. The point here is that I'm not aware of any systematic effort to keep such pages updated. It is not a trivial proposition (those with long memories will remember the massive [[Wikipedia:Miscellany for deletion/List of people by name|lists of people by name that got deleted]]), but could be automated or semi-automated if the following was done:
*(1) Identify all existing biographical articles (i.e. ones about a single person's life story) and tag them accordingly. This would involve separating out the 'biographical' articles tagged by [[Wikipedia:WikiProject Biography|WikiProject Biography]] that are in fact group biographies (such as articles about music groups, families, siblings, saint pairs, and so on). Those group biographies will still contain [[Wikipedia:Biographical metadata|biographical metadata]], but need to include a 'group biography' tag. Not sure how to handle cases where a person's name is a redirect (these are not common, but are not rare either).
*(2) Ensure all such articles are accurately tagged with [[WP:DEFAULTSORT|DEFAULTSORT]] or some other 'surname' parameter (with the usual caveats about needing to be aware of guidelines in this area and correctly identifying what is the 'surname', which is not always easy and varies around the world, and how to treat people with only one name, and so on).
*(3) Generate the masterlist/database to list all biographical metadata, including all data present in the [[WP:INFOBOX|infobox]], in the categories, in the DEFAULTSORT tag, and in the [[WP:Persondata|Persondata]] template. This is the point where the data can be compared and cleaned up if necessary. But for now, the data of interest is the name.
*(4) Generate a similar database for set index and human name disambiguation pages such as [[Fisher (surname)]] and [[Paul Fisher]] (different spelling to the one above, which brings up a slight problem in that some alternative spellings are rightly bundled together on one page, and some are not - this may make machine-identification of the right set index pages harder, but not impossible). Also, some are of the form "name (disambiguation)" or "surname (surname)" or "surname (name)", and that can change over time as people move pages around, but there should be a way to address this, even if non-trivial.
*(5) From the alphabetical listing of all the biographical articles, identify lists of those with the same name and ensure the corresponding surname set index pages and human disambiguation name pages (if they exist) are updated at regular intervals, possibly by bot talk page notification with a list provided by the bot. The bot could generate suggested lists using a combination of the article title (for linking purpose), and the Persondata name, birth year, death year, and short description fields. I think a project took place at one time to keep set index name pages updated, and that might have used bots to generate lists, but I can't remember where that project was, how successful it was, and if it is still going (update: I was thinking of [http://en.wikipedia.org/w/index.php?title=Wikipedia:Suggestions_for_disambiguation_repair&oldid=221716512 this] from 2008: ''"22,743 suggested surname disambiguation pages, created [...] from the May 24, 2008 database dump"'').
*(6) Ideally, such a biographical listing of all biographical articles (now approaching 1 million) would be done dynamically by a category listing. But there is no single category for this as yet. The closest ones are the [[:Category:Living people|category for articles on living people]] (555,778 articles at present) and the listing of articles tagged by WikiProject Biography (which is a listing of the talk pages only). It is possible to generate partial set index name pages using the 'living people' category (e.g. [http://en.wikipedia.org/w/index.php?title=Category:Living_people&from=Rabe surname Rabe] (currently 14 people) can be compared with [[Rabe]], which only lists 12 people, of whom three are dead and one is a redirect), but this only puts those querying the category at the start of any dynamic 'list' of people by name and doesn't take into account biographies of historical (dead) people.
Would those reading this be able to say how feasible the above is, what work has already been done or is being done, and what would need to be done to get to the stage where we can be confident that our set index pages and human name disambiguation pages are accurate and updated at regular intervals to stay accurate? Or suggest which places I should go to to see who else might be interested in helping with this sort of thing? [[User:Carcharoth|Carcharoth]] ([[User talk:Carcharoth|talk]]) 23:52, 4 February 2012 (UTC)
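As a rough illustration of step (5), here is a toy Python sketch of how a bot might group biography records by surname and draft set-index entries. The records, field layout, and helper names are all hypothetical, standing in for whatever a real Persondata/DEFAULTSORT extraction would produce:

```python
from collections import defaultdict

# Hypothetical records in the style of the Persondata fields mentioned above:
# (article title, DEFAULTSORT key, birth year, death year, short description).
biographies = [
    ("Paul Henri Fischer", "Fischer, Paul Henri", 1835, 1893, "French zoologist"),
    ("Paul Fischer (painter)", "Fischer, Paul", 1860, 1934, "Danish painter"),
    ("Ann Fisher (grammarian)", "Fisher, Ann", 1719, 1778, "English grammarian"),
]

def group_by_surname(records):
    """Group biography records by the surname part of their sort key."""
    groups = defaultdict(list)
    for title, sort_key, born, died, desc in records:
        surname = sort_key.split(",")[0].strip()
        groups[surname].append((title, born, died, desc))
    return dict(groups)

def suggest_listing(records):
    """Render one suggested set-index entry line per person, as a bot might post."""
    return ["* [[{}]] ({}-{}), {}".format(title, born, died, desc)
            for title, born, died, desc in records]
```

A bot built along these lines could post the suggested lines to the set index page's talk page for a human to review, rather than editing directly.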

Revision as of 23:52, 4 February 2012

The idea lab section of the village pump is a place where new ideas or suggestions on general Wikipedia issues can be incubated, for later submission for consensus discussion at Village pump (proposals). Try to be creative and positive when commenting on ideas.

The aim of the Village pump (idea lab) is to encourage the preliminary incubation of new ideas in a "non-polling" environment. When you have a new idea, it is not mandatory that you post it here first. However, doing so can be useful if you only have a general conception of what you want to see implemented, and would like the community's assistance in devising the specifics. Once ideas have been developed, they can be presented to the community for consensus discussion at Wikipedia:Village pump (proposals).

The formation of this page, and the question of its purpose and existence, are the subjects of discussion on the talk page. Direct all comments on those topics there.


Automatically updated cited information

It strikes me that there is a type of info on Wikipedia that is in constant need of updating: well-referenced data, such as league tables, numbers of things in a place, amounts of stuff wherever it is, etc. Statistics, if you will.

Technology exists to grab such info directly from the cited sources on each page load. Either cURL or Ajax is an obvious solution (salivating over the idea of WebSockets and constantly updated pages) that could be simply implemented, if only there were some way to embed the scripting in the page. Well, we have templates, and they can accept parameters. So all we then need is for certain types of template to be parsed differently, with the result that wherever that template is, the source that cites the data contained in it is checked, and the up-to-date data returned. All that can happen easily and quickly while the other scores of calls to scripts and stylesheets are being made. The result would be far fewer pointless edits stating that some minor detail has changed, when we know full well that the detail will change again in a week, and again the week after (ad infinitum). Also, all the data that can be updated in this fashion will always be up to date, making Wikipedia a more reliable source.

Remember this is the idea lab. fredgandt 21:43, 5 January 2012 (UTC)[reply]

I would fully support this, and would go much further. Bottom line: the day will come (say 7-10 years at the current rate) when some of Wikipedia will be a smaller form of Wolfram Alpha. Updating text files is old technology. And that will solve some of the consistency issues. E.g. consider Tokyo Narita airport and Paris de Gaulle Airport vs World's busiest airports by passenger traffic. I have not checked now, but they are usually inconsistent. The same idea applies to "list of rivers in England", etc. For further elaboration, please see some ideas I typed a while ago but have not worked on recently. History2007 (talk) 22:17, 5 January 2012 (UTC)[reply]
Yup! Although since I had to look up "Wolfram Alpha", it's fair to say that tech is less than likely to be adopted in the time-frame you propose. I'm fairly sure Wikia is already using Node.js and WebSockets. But let's face it, not many sites are fully exploiting JavaScript yet. The web has so many treasures but seemingly little direction. "Internet Explorer" (need I say more?) (maybe). If Microsoft can't make a standards-compliant browser, there is little hope for the web at large to adopt cutting-edge tech en masse. We need to stick with the simple, explored, entrenched, understood, and widely used tech if we are going to push for new features. fredgandt 22:36, 5 January 2012 (UTC)[reply]
Actually, "Wolfram Alpha" is not a new idea at all. Question answering systems are much older than that. But please do not take the Question answering Wiki-article too seriously - I just looked it up and it is full of errors (sigh.... I tagged it for a rewrite). But what technology is used, be it one script or another, is beside the point. First there needs to be a decision to do this right. History2007 (talk) 23:01, 5 January 2012 (UTC)[reply]
Is that deja vu I'm experiencing? I think it is. Anyway, with this sort of change (massive; taking the reins away from editors and handing some responsibility over to scripts), I think a slow progression from "Is this possible?" to "Do we like it?" to "Where will we use it?" back to "Do we like it?" etc., is better than rushing at a vote. It's so heavily dependent on finding a solid technical way of implementing it, there would be no point asking for support yet. It may simply be too shaky to roll out. I think on a small scale it could be done, and probably with few to no errors. Let's say we have a small private site, developed by a small team of computer geeks. They know exactly how to write the pages, so that the templates work, the scripting is stable and the sources reliable. But here, we have an untrained army of editors; even the templates can be edited (sure, they can be protected, but let's not get ahead of ourselves), sources come and go, and editors can change the sources in the templates, rendering the results (if any) useless. The potential technical difficulties are enormous. The benefits are too, but you see why trying for support at this stage is pointless? I am very doubtful that this would be implemented within a few years, even if the support was unanimous and vast, and the technicalities were fine-tuned. It's going to be a long slow process, if it takes off at all. fredgandt 23:23, 5 January 2012 (UTC)[reply]
Or Wolfram, or a Wolfram-wannabe will grab the content from Wikipedia, set up a competing site and the game will open up. The web moves fast. And technically speaking it is totally feasible. Or Microsoft may see it as the "last hope for Bing" in staying relevant now that they have played with Wolfram's ideas. So under that scenario, Bing would grab the highest quality data from Wikipedia, and present it with a Wolfram-type engine.... It is a free economy in the end. History2007 (talk) 10:34, 6 January 2012 (UTC)[reply]

This is exactly the sort of edit that could be simply avoided. The information in the article would always be correct. Fewer DB calls made for what could be viewed as silly little edits. The edits clearly need to be made, or the article data would be wrong, but if the most up-to-date data was presented (from the reliable source that backs it up) every time the page was loaded... So much better. fredgandt 02:32, 7 January 2012 (UTC)[reply]
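Short of live page-load updates, a periodic bot could at least avoid such churn by editing only when the cited figure has actually changed. A minimal Python sketch of that check (the template and parameter names are made up, and real wikitext parsing would need something far more robust than this regex):

```python
import re

def update_template_param(wikitext, template, param, new_value):
    """Replace the value of |param= inside {{template|...}}; return None if unchanged.

    A deliberately simple sketch: assumes the parameter occurs once and its
    value contains no nested templates or pipes.
    """
    pattern = re.compile(
        r"(\{\{\s*%s\b[^}]*?\|\s*%s\s*=\s*)([^|}]*)"
        % (re.escape(template), re.escape(param)),
        re.DOTALL,
    )
    m = pattern.search(wikitext)
    if not m or m.group(2).strip() == new_value:
        return None  # nothing to edit: the "silly little edit" is skipped entirely
    return wikitext[:m.start(2)] + new_value + wikitext[m.end(2):]
```

Returning None when the value matches means the bot makes no edit and no database write at all, which is the point being made above.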

But how do you get the up-to-date info? By screen scraping? Those approaches work for a while but eventually become impossible to manage if too many sites are involved, because the screen formats change. If only a few sites are used, then that may be possible at first, but you should know that those sites will then notice a large number of "freebie calls" from Wikipedia that cost them money, and in an hour can change their format and end the game. History2007 (talk) 16:53, 7 January 2012 (UTC)[reply]
Screen scraping is usually the only way; very few sites provide an API. Some major ones do, but it's really more hassle to use that unless they offer batch requests, which most don't AFAIK. The request volume shouldn't be a problem if done sensibly; besides, these sites would be well-visited reliable sources, and as such are more than likely capable of handling high traffic. The real problem with screen scraping is obviously that every site gets changed every few years. What this bot would need is an on-wiki page that keeps track of all the syntax it uses, so that editors can update it manually. —  HELLKNOWZ  ▎TALK 14:26, 8 January 2012 (UTC)[reply]
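The on-wiki page holding the scraper syntax could be as simple as one line per source: a URL plus a regular expression that editors fix whenever a site's layout changes. A hypothetical sketch (the URLs and patterns are invented, and fetching the HTML is left out):

```python
import re

# Hypothetical content of the editor-maintained on-wiki config page:
# "source URL | regex whose first group captures the figure".
CONFIG = r"""
http://example.org/airport-stats | Passengers[:\s]+([\d,]+)
http://example.org/browser-stats | Latest stable release[:\s]+([\d.]+)
"""

def parse_rules(config_text):
    """Parse the config into (url, compiled regex) pairs, skipping malformed lines."""
    rules = []
    for line in config_text.splitlines():
        if "|" not in line:
            continue
        url, _, pattern = line.partition("|")
        rules.append((url.strip(), re.compile(pattern.strip())))
    return rules

def scrape(html, rule_regex):
    """Extract the figure, or return None so the bot can flag the broken rule
    instead of guessing when a site changes its layout."""
    m = rule_regex.search(html)
    return m.group(1) if m else None
```

When `scrape` returns None, the bot would leave the article untouched and notify a talk page that the rule needs human repair, which is one answer to the format-change problem raised above.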
A few users can manage their own screen scrapers, but in the long term uncontrolled screen scraping will result in chaos. I have seen that happen even internally at Fortune 100 companies. Data shows up wrong and panic buttons get pressed. If screen scraping populates a Wolfram-type system that can be checked for consistency, that will make more sense. History2007 (talk) 17:07, 9 January 2012 (UTC)[reply]
A related idea is increased use of template/data subpages such that articles reference a single point for their data. That takes away the inconsistency, but doesn't take away the need to keep certain data up to date. Rjwilmsi 18:13, 8 January 2012 (UTC)[reply]
I use that at {{Scoutstat BSA}}, but it only needs updating once a year for most numbers. ---— Gadget850 (Ed) talk 19:06, 8 January 2012 (UTC)[reply]
{{Extrasolar planet counts}} is frequently updated but apparently by hand. Perhaps a subpage of Wikipedia:Bot requests with a standardized request procedure is a more realistic solution currently. PrimeHunter (talk) 19:43, 8 January 2012 (UTC)[reply]
See meta:WikiData_WMDE, currently being funded by the German Wikimedia association, if I'm not mistaken. It's one of many wikidata attempts that have been made since Erik Möller's original 2004 proposal. —TheDJ (talkcontribs) 20:29, 8 January 2012 (UTC)[reply]
I looked at the WMDE document. I have "no idea" what they are really trying to do. I could read 20 different scopes to that project. The document needs a few clear examples. It could be interpreted as a simple "super scraper" or (given that they use the term "knowledge base about the world") as a variant of Cyc (which I hope it is not, for it will not fly). This sounds like R&D, not just an implementation. History2007 (talk) 17:02, 9 January 2012 (UTC)[reply]
As far as I understand it, the idea is a cross between: DBpedia (service provided to end user), Interlanguage links (method to include and interlink data in Wikipedia) and Semantic MediaWiki (Storage and querying of data). And no, it won't be easy, probably part of the reason the idea has been floating around since 2004. :D At what state the German initiative currently is, I have no idea, you can probably ask them, in my experience WMDE is quite responsive. —TheDJ (talkcontribs) 22:23, 9 January 2012 (UTC)[reply]
Ok, thanks, now I have a rough idea. It will be another one of those interdisciplinary, "we will merge multiple concepts together" type "fusion projects". I will pass on interacting with them - I have my doubts that they will get anything working within that type of scope any time soon. I guess they could at least start by cleaning up articles like this which are within their scope and need help. When Cyc was first proposed as the solution to the world's semantic problems, there was a joke in the valley: "What has just been declared as the new international standard for exaggeration?" Answer: a micro-lenat. So those systems have generally been "exaggeration driven". I have seen too many and know the inherent problems - there are fundamental barriers therein. The key question: how much money will they throw at it before they understand the complexity of the barriers? Time will tell. History2007 (talk) 23:26, 9 January 2012 (UTC)[reply]
Well, we already have all the other things (perhaps not always under a WM Foundation umbrella). You see, the problem with Wikipedia has never been that you could not 'bolt stuff on top of it'. That's easy. The problem is making stuff perform on en.wikipedia, communicate with all parts of the software, and integrating it in such a way that it becomes a future-proof implementation that doesn't break when the bot author drops off the face of the planet. Almost by definition, bolting stuff on top will be faster, but you can also argue that bolting a phone on top of an iPod on top of an Internet communicator, although in theory as useful as an iPhone, will never be any sort of success. http://www.youtube.com/watch?v=x7qPAY9JqE4 —TheDJ (talkcontribs) 08:47, 10 January 2012 (UTC)[reply]
Sorry, I do not see it that way. The analogies from hardware to ambitious software do not often apply. The phone technologies, etc. have no semantic content as such. History2007 (talk) 16:32, 10 January 2012 (UTC)[reply]
Well putting AI and semantic databases aside for a minute (or a few years), the idea that a bot could be helpful is practicable. The thing is, a bot is just a bunch of scripts called to run. If we could plug the bot directly into the pages so it was called to run on each page load, we have a winner. It is fair to assume that few if any of the sources cited will use either an API or semantics, so we have to therefore assume that the scripting would need to be very robust, able to tackle whatever was thrown at it and return either something useful, or a message to someone that it has problems (best place would be the article talk page). fredgandt 00:35, 10 January 2012 (UTC)[reply]
A bot to somehow run on each page load is not going to happen. Never mind how closely it would have to be integrated into MediaWiki or how much it would slow down each page load, just consider the "Slashdot effect" that would be produced.
If the problem you're trying to solve is something along the lines of the "Tokyo Narita airport and Paris de Gaulle Airport vs World's busiest airports by passenger traffic" problem mentioned above, a solution is to store the figures in a template and transclude that template where needed.
If the problem you're trying to solve is updating information from an external service, and the external service both provides the information in a manner that is amenable to automated access and the service allows automated access, it would be easy enough for a bot to run periodically (daily, weekly, monthly) to update it; with some work, it might even be possible for the bot to take configuration settings on-wiki so the bot author need not edit the source to add new sources. As mentioned above, though, it would work best if the data were available in a structured format so screen-scraping is not necessary.
HTH. Anomie 00:59, 10 January 2012 (UTC)[reply]
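For the on-wiki configuration part of that suggestion, the bot only needs the standard MediaWiki API to read its settings page. A sketch using just the Python standard library (the page title is an example; the API parameters shown are real ones, and a production bot would of course authenticate and rate-limit):

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def content_query_url(page_title):
    """Build a MediaWiki API query for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": page_title,
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urllib.parse.urlencode(params)

def fetch_wikitext(page_title):
    """Fetch a page's wikitext (network call; run under a bot account in practice)."""
    with urllib.request.urlopen(content_query_url(page_title)) as resp:
        data = json.load(resp)
    return data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]
```

Reading the configuration this way means editors can add or retire sources by editing a wiki page, without the bot author touching the bot's source code.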
I think Anomie's analysis is both sensible and practical. The challenge is some method of "super transcluding" from what may be called a "repository template", for lack of a better term. The long and short of it is that:
  • A repository is needed, be it a template, an XML-based gadget, a new type of item, or heaven forbid a fast-access (most probably non-relational) database - whichever is the best technical solution.
  • The repository gets updated once a week, once a month, etc. E.g. consider the list of the companies that make up the Dow Jones Industrial Average. That can be looked up pretty easily, daily, at a not-so-busy time of night. It does not need to be hand-coded. Same for the number of passengers to Narita, etc.
  • Some piece of software accesses said repository, reformats, transcludes, presents (pick your word) the info and includes it in a page. The key is giving this software good enough formatting capabilities.
  • With suitable logic the repository will also include the List of rivers of Europe, and the list of rivers in Germany can be obtained from that so they are always consistent.
I think I just described a simple version of mini-Wolfram. History2007 (talk) 16:32, 10 January 2012 (UTC)[reply]
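The consistency point in the last bullet can be made concrete: if every list is derived from one authoritative repository record per river, a per-country list can never drift out of sync with the Europe-wide list. A toy sketch with invented data:

```python
# A toy "repository": one authoritative record per river, queried by all lists.
RIVERS = [
    {"name": "Rhine", "countries": ["Switzerland", "Germany", "Netherlands"], "length_km": 1233},
    {"name": "Elbe", "countries": ["Czech Republic", "Germany"], "length_km": 1094},
    {"name": "Thames", "countries": ["United Kingdom"], "length_km": 346},
]

def rivers_in(country):
    """Derive a per-country list from the shared repository, so the
    'list of rivers in Germany' can never disagree with the Europe-wide list."""
    return sorted(r["name"] for r in RIVERS if country in r["countries"])
```

Each article would then transclude the derived list rather than maintaining its own copy, which is the single-point-of-data idea in a nutshell.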
Example: At the moment the article "Google Chrome" is using {{LSR}} in the infobox, but the data is not referenced (so it seems). The best I could find as a potential reference is this linked to from Google's support pages such as it is here. If the source is considered reliable, and could be included in {{LSR}} or some other linked way, the bot running in the background could check the data in that source, and update the article or template whenever it was found to be out of date. Super.
Since we cannot control or govern the way sources present their data, but we rely on that data for our sources, we would have to employ a method less efficient than semantics etc. Thus a bot to do this would probably be the only practical method available at present. Although templates containing the data could indeed be transcluded on article pages, we would still have the issue that that data would need to be constantly and manually checked and updated. If the sources are truly reliable, I see no good reason to not have a bot do the grunt work.
Although a bot seems to be the best immediate solution, the idea of a fully semantic web with websocketed AIs running about fixing everything is of course preferable. It is also not likely to happen on a large, reliable scale any time soon. With that in mind, is there any more to discuss? Or should this simply be shut and a bot request opened in its stead? fredgandt 01:57, 10 January 2012 (UTC)[reply]
We already have a few bots that do menial content updating tasks. This should happen on the Wikimedia Labs or Toolserver as bots; ask at WP:BOTREQ to get started with some task you think is doable now, and as the bot experts implement it, you can follow along and take over so that you can add more capabilities as time goes on. Selery (talk) 16:04, 13 January 2012 (UTC)[reply]
That can be attempted but would be an ad hoc approach much like most Wiki-bots. In terms of long term planning, a good design "usually" wins over ad hoc solutions, and stop-gap measures. But then, as they say, the IBM 370 was a stop-gap measure that lasted three decades, so maybe one of those stop-gaps will lead to an overall design. History2007 (talk) 16:51, 13 January 2012 (UTC)[reply]
Although this was a good chat, I think the (cue sarcasm) torrent of comments and the flood of ideas is just too much to handle, so I think it's fair to say "nobody is that interested". And if anyone is, it looks as if it will be a job for a bot. fredgandt 04:41, 24 January 2012 (UTC)[reply]

Suggestion: MediaWiki software tool to flag revisions for oversight/revdel

(On an off-topic note, this is my first Village Pump edit.) Three-digit Interstates (3dis) have rules on what they connect to. I think articles about 3dis should have sections on whether or not they follow these rules. Here's a guide on Kurumi.com written by Scott "Kurumi" Oglesby (another roadgeek) about 3di numbering rules.

Multi Trixes! (Talk - Me on Wikia) 21:09, 14 January 2012 (UTC)[reply]

This sounds like something that should be discussed on WP:HWY. ♫ Melodia Chaconne ♫ (talk) 23:01, 14 January 2012 (UTC)[reply]
Okay. Moved!
Multi Trixes! (Talk - Me on Wikia) 19:38, 25 January 2012 (UTC)[reply]
Actually, this should be discussed at WT:USRD because it's US-specific. Imzadi 1979  19:48, 25 January 2012 (UTC)[reply]

A System for DRY Technical Definitions

I am new to editing (but old to Wikipedia). I started editing some mathematics articles and have a strong desire for a systematic way to handle definitions. Definitions of technical terms pose several problems:

  1. People use terms inconsistently. This happens in the literature generally as well as on Wikipedia. But it creates internal inconsistency when articles link to each other and seem to contradict each other because they are using slightly or significantly different definitions.
  2. Defining terms well takes a lot of careful work, attention to detail, and unraveling of varied usage.
  3. Terms being defined anew in different articles is wasteful repetition, which creates tension with an article's being self-contained.
  4. Technical terms expose a large knowledge gap between a newcomer and an expert, which is difficult to write for effectively.

It seems that a system similar to that used for free links or templates could help. How a system of definitions might work:

  1. The free links language could be extended or a new template could be added (not familiar with how these work yet) to include the automatic insertion of a definition from a (perhaps automatically) specified database.
  2. The tagged term could be visually or aurally marked as such and have a collapsible definition attached to it. The default behavior (collapsed/expanded) could be set by the editor and perhaps overridden globally by the user. The markup might look something like ... is a {{define|normal subgroup}} generated by... and be rendered as ...is a normal subgroup, a subgroup that is invariant under conjugation, generated by....
  3. The database of definitions could be curated by editors associated with the relevant WikiProject (e.g., WikiProject Mathematics). The definitions for a field could then be managed and updated more easily in one place.
  4. Free links are already used in lieu of definitions to some extent but not in the best way possible. The idea is to improve the handling of technical definitions both within and across articles, making them easier to edit, more complete and globally coherent when needed, and invisible otherwise.
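The {{define|...}} rendering in point 2 could be prototyped as a simple substitution pass over the wikitext. A sketch in Python, using a toy glossary in place of the curated definitions database (the template name and glossary entries are the examples from above, not an existing template):

```python
import re

# A toy glossary standing in for the curated per-WikiProject definitions database.
GLOSSARY = {
    "normal subgroup": "a subgroup that is invariant under conjugation",
    "abelian group": "a group whose operation is commutative",
}

def expand_defines(wikitext, glossary=GLOSSARY):
    """Replace {{define|term}} with 'term, definition,' as in the example above;
    unknown terms are left untouched for an editor to handle."""
    def sub(match):
        term = match.group(1).strip()
        if term in glossary:
            return "{}, {},".format(term, glossary[term])
        return match.group(0)
    return re.sub(r"\{\{define\|([^}]+)\}\}", sub, wikitext)
```

A real implementation would live in the template/parser layer rather than a bot pass, and would render the definition collapsible as described, but the lookup logic would be the same.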

Possible problems:

  1. These wouldn't take the place of the main or more detailed definitions in an article. Finding the right length or level of detail might be tricky.
  2. They might not read well when in place. They might be less coherent than definitions edited as part of a single article. Coordinating nonstandard symbols (ad-hoc variables, constants, etc.) might be especially difficult. (This might be handled by having parameters to definitions, which maybe sounds complicated, but I can see a simple system working as a nice compromise.)
  3. If there are multiple options for definitions, an editor might want to select one over the others. This could get complicated. (Options needn't be necessary since an editor can simply write their own if they find the supplied one unsatisfactory.)
  4. People might be tempted to string articles together with just the definition links, which could hurt the overall quality of articles.
  5. Reusing content of course means that one edit can break lots of things. :^\

I'm not sure that these are insurmountable problems, but I am trying to get the ball rolling.

Cheers, Honestrosewater (talk) 09:32, 17 January 2012 (UTC)[reply]

Are aware of Wikipedia:Make technical articles understandable? I think the corresponding talkpage to that guideline is the best way to discuss this. Yoenit (talk) 10:19, 19 January 2012 (UTC)[reply]
I have read it but didn't think of that. Good idea. Thanks, Honestrosewater (talk) 15:29, 19 January 2012 (UTC)[reply]
A more active talk page to consider would be Wikipedia talk:WikiProject Mathematics. I understand that you're interested in technical articles in general, but you have to start somewhere. Also, the guys over there will probably "get" more what you're trying to do. Good luck! Leonxlin (talk) 03:43, 23 January 2012 (UTC)[reply]

DRYing Out History sections

I've recently been using Wikipedia to gain a grasp of how a computer functions: from ALUs all the way through the Internet. In doing so I realized that there is a lot of information overlap in the History sections in the many articles I was reading.

I was wondering if there has ever been talk of writing a 'background' section for articles, instead of a 'history' section. The background section would cover the topics that need to be understood before the full meaning of the current article can be absorbed - slightly more in-depth than the summary. It would also act as a way of organizing all the articles into some sort of hierarchy.

As a complete noob to Wikipedia posting/editing, I thought this idea lab would be a good place to start, because I'm not even sure what to search for in the proposals to find similar ideas.

Looking for contributions at any level. Thanks.

Matthew.bowles.CO (talk) 08:28, 19 January 2012 (UTC)[reply]

I'm not sure exactly what you're getting at. Could you provide some examples? I'm sure there are articles that have background sections. I can't seem to find any right now, but the Pi article has a "Fundamentals" section. If you feel that an article ought to have a background section, feel free to add one (I'm not aware of any policy against them, but if I'm wrong, do correct me), preferably with the relevant hatnote. The Arithmetic_logic_unit page doesn't even have a background or history section. Perhaps you're concerned that the "Numerical systems" section on that page is repeating too much of what already appears on other articles? If so, what is your idea to fix that? Leonxlin (talk) 03:57, 23 January 2012 (UTC)[reply]

Ensuring that the SOPA/PIPA actions do not lead to a slippery slope

Before anyone gets the wrong end of the stick here, I supported the SOPA/PIPA blackout. A quote from me was even used by the administrators that made the formal decision to take action. But I think it is important that we formalise the common consensus that we only even consider such action in response to direct threats to our existence, which I and evidently others believed SOPA and PIPA did.

Of course, there would be little practical value to it, because we all know that the Wikipedia community will never agree to action for any reason other than a threat to Wikipedia. But I feel we need to do this in part as a result of the characterisation of Jimbo's comments in recent days. Whether he was taken out of context or said words to the effect of "Wikipedia's content is neutral; Wikipedia itself doesn't need to be" is a moot point.

The perception of some contributors (a few of whom may or may not have retired), and among others some in the media, political and business spheres, is that Wikipedia is now non-neutral and politically active. Whether there is truth in that or not, I think we as a community need to make clear that the threshold for action was and remains consensus that the project's existence faces an imminent, credible threat. If that flies in the face of neutrality, so be it, but if so we should be open about the exception, and clear that it is that far and no further. —WFC11:48, 19 January 2012 (UTC)[reply]

WP:SOPA ---— Gadget850 (Ed) talk 14:45, 20 January 2012 (UTC)[reply]
You might also want to look at and perhaps take part in Wikipedia:Village pump (policy)#Future "consensus" access blackouts. :) --Maggie Dennis (WMF) (talk) 16:20, 20 January 2012 (UTC)[reply]

Free full text book content

http://forum.randirhodes.com/viewtopic.php?f=14&t=16878&p=126756#p126756

The thread posted is a conversation I have been having on the Randi Rhodes Message Board. Over the past year I have wanted to get more involved and I believe now the time is ripe for my idea.

When I look at the cost of books, I see a huge cost-prohibitive barrier to the spread of intelligence. This Christmas brought a special opportunity, when millions of people got Amazon's $79 Kindle from Santa. The cost-prohibitive nature of intelligence is rapidly decreasing, and wiki is leading the charge. However, when I look at e-books and print books, they are similarly priced.

Please, oh wise fellow wiki'ers, correct me where I am mistaken. It is my understanding that many old classics, and old books in general, no longer have a maintained copyright. With adequate volunteer time, these works could be published on the internet.

Imagine a wiki where punching in "catcher in the rye" brings up an encyclopedia entry, directly followed by (and linked from the encyclopedia entry) a full-text version of The Catcher in the Rye, readable on e-readers.

I'm bookmarking the page and I'd love to have a brainstorm session with people who may have already been thinking about these things as I have.

Vinny — Preceding unsigned comment added by 74.77.254.220 (talk) 05:23, 20 January 2012 (UTC)[reply]

To obtain texts of books which are in the public domain, you can use Project Gutenberg, which has thousands of well-known, published works which are now out of copyright. Many of our articles on older books already have external links to the full texts; see, for example, Northanger Abbey. As for Catcher in the Rye, I suspect it may still be protected by copyright. Unfortunately, I don't think Wikipedia could link directly from a novel's article to websites where you can buy the ebook, but if you click on the ISBN in any article, it will take you to the Book sources page where you can find a list of places to buy the book and libraries that have it available for borrowing. --Kateshortforbob talk 11:03, 20 January 2012 (UTC)[reply]
Wikisource is doing precisely this, but they don't seem to have any proper tools for downloading stuff directly to ereaders. Yoenit (talk) 14:12, 20 January 2012 (UTC)[reply]
What does "downloading stuff directly to ereaders" mean? When I download a file, nobody knows whether it's being saved on my hard drive or my ereader. What kind of "proper tools" are missing? Ntsimp (talk) 15:29, 20 January 2012 (UTC)[reply]
Take Northanger Abbey as an example. If I want to download this book to my ereader, I need to use the Book creator and tag each chapter individually. A direct link to get the entire book would be a big improvement. Support for common ereader formats (such as EPUB and Mobi) would be nice as well. Yoenit (talk) 19:23, 22 January 2012 (UTC)[reply]

Sort articles by "NUMBER OF WORDS"

I would like to make a suggestion for the website. Could we sort articles by number of words — for example, 100 words, 300 words, 1,000 words, 1,500 words, etc.? Much of the time, some users just want to know the general information or subject of an article; there is no need to read the whole article to get that little bit of information. Sorting by number of words would classify articles into different tiers. At the shortest tier — e.g. 100 words for an article about Taoism — readers would get just the main or key ideas (without lengthy history backgrounds); a longer tier could include more evidence or findings on the subject; a still longer one could include origins, history, and so on. All in all, it would offer different levels of summarization of the same knowledge. What do you think? I also posted this at the village pump (proposals); this is my first proposal, so I was a bit confused about where it should go. In any case, your opinions are welcome! — Preceding unsigned comment added by 42.2.43.151 (talk) 01:34, 22 January 2012 (UTC)[reply]

Since the lead of the article (that's the stuff before the table of contents) is already supposed to be a summary of the article, if you just want a broad overview, the best thing is probably just to look at only the lead. Now, we don't always do a good job with the lead, but we try. Does that seem like it would solve your concern? Qwyrxian (talk) 02:28, 25 January 2012 (UTC)[reply]
Yes, the lead almost solves my problem, thanks. "For example, English law merely mentions murder; homicide in English law would provide (once complete) a few hundred words; Murder in English law a whole article, but whose lead might be about the summary in the previously named article. Grandiose (me, talk, contribs) 13:54, 20 January 2012 (UTC)" That example shows something else that would need to be added to the sorting: besides sorting by number of words, there should also be a keyword-search option alongside the word-count option. When we type "murder english law" into Wikipedia, complete articles of different lengths come up, just as Grandiose described. Does that situation happen only rarely on Wikipedia? — Preceding unsigned comment added by 42.2.43.151 (talk) [reply]
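As an aside for anyone who wants to script the "read only the lead" approach suggested above: the MediaWiki API can return just an article's lead section. The sketch below only builds the query URL; the parameter set (`exintro`, `explaintext`) comes from the TextExtracts extension, and treating it as the right combination is my assumption to be checked against the live API.

```python
# Sketch: build a MediaWiki API query that returns only an article's lead,
# as plain text. Assumes the TextExtracts extension's parameters.
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def lead_section_url(title):
    """Build a query URL for the plain-text lead of one article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # stop the extract at the first section heading
        "explaintext": 1,   # strip HTML from the extract
        "titles": title,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

print(lead_section_url("Taoism"))
```

Fetching that URL (with any HTTP client) would return roughly the word-count tiers the proposal asks for, since leads are already meant to be short summaries.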

Create a category for CORRELATIVE CONCEPTS

In my limited experience and knowledge, I have found that there are quite a lot of correlated concepts across different streams of knowledge. For example, "nothing comes from nothing" is correlated with some ideas from modern physics. Another example: the idea that matter is merely a vacuum fluctuation (which seems not to be covered in Wikipedia yet; assume it would sit under Matter) correlates with the Buddhist concept that form is void and void is form (meaning that anything with a physical state is void). My suggestion is to add a "correlative concepts" hyperlink next to the paragraph or the name of a concept that has a correlated idea (we already have something similar in "see also" sections). Beyond that, we could provide a category that collects all these "correlative concepts" (or "see also" entries) and arranges them alphabetically by the titles of the correlated articles/paragraphs/sentences/keywords. I think there is still room to improve the arrangement and design of the category part, and I would like to hear all sorts of opinions from you! — Preceding unsigned comment added by 42.2.43.151 (talk) 01:37, 22 January 2012 (UTC)[reply]

About Wikipedia/What's Wikipedia? portal creation

I have an idea for a new About portal, because Wikipedia will soon reach 4 million articles.

What's this all about?

It's all about Wikipedia (Wikipedia policy, non-encyclopedic pages about Wikipedia, and normal articles related to Wikipedia's content, like Knowledge, Encyclopedia, freedom, community, etc.).

How will it look?

Like a normal portal, but the Featured article section will be divided into two sections: Featured policy (Wikipedia policies) and Featured content (encyclopedic articles). The "Featured picture of the month" section will be replaced with "Featured Wikimedia logo of the month" (showing a Wikimedia project logo, like the Wikipedia, Wiktionary or Wikisource logo, but at a large size).

Others

This portal will replace the Community portal (it will still feature news, events and other things about the Wikipedia/Wikimedia community), and the Community portal will turn into a redirect that automatically points to the new portal.

Please make it!

Try suggesting this at Wikipedia talk:Community portal. Yoenit (talk) 19:27, 22 January 2012 (UTC)[reply]

Goodbye Beaver Lake

May I suggest you add "Goodbye Beaver Lake," a novel, to the category "Novels, Quebec Separatism."

For information on Goodbye Beaver Lake, please go to:

www.Goodbyebeaverlake.com — Preceding unsigned comment added by 68.56.206.20 (talk) 20:08, 22 January 2012 (UTC)[reply]

Only articles can be added to categories, so you would have to write one about the book first. Be sure to check whether it meets the inclusion guidelines for novels. If it does, use the Wikipedia:Article wizard to create an article. Yoenit (talk) 23:36, 22 January 2012 (UTC)[reply]

Cross checking categories

I often find myself wanting to check for things that fall into multiple categories. For instance, I recently wanted to look for RPGs on the Sega Master System, but the List of Sega Master System games article doesn't show genres. What I would like to be able to do is check for games that are in both Category:Sega Master System games and Category:Role-playing video games at once. Or, say I was looking for painters born in 1897 in Britain: I'd like to be able to bring up a list of people who are simultaneously in the Painters category, the people born in 1897 category, and the people born in Britain category.

What I'd like to know is: is there currently a way to do this (a feature or gadget that I'm unaware of)? Would anybody else be interested in a feature like this? What would be required to implement something like this? AerobicFox (talk) 21:53, 22 January 2012 (UTC)[reply]

Wikipedia:CatScan is what you want. Yoenit (talk) 23:38, 22 January 2012 (UTC)[reply]
Thank you :) AerobicFox (talk) 00:32, 23 January 2012 (UTC)[reply]
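For anyone curious how a CatScan-style intersection works under the hood, here is a minimal sketch: fetch the members of each category (the MediaWiki API's `list=categorymembers` query) and intersect them client-side. The helper names and the sample member lists below are illustrative, not real API output.

```python
# Sketch of a client-side category intersection, as an alternative to CatScan.
# The URL builder targets the real "categorymembers" API list; the member
# lists here are hypothetical stand-ins for two API responses.
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def category_query_url(category, limit=500):
    """Build a categorymembers query URL for one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def intersect_members(members_a, members_b):
    """Return the titles appearing in both category member lists, sorted."""
    return sorted(set(members_a) & set(members_b))

# Hypothetical member lists standing in for two fetched categories:
master_system = ["Phantasy Star", "Golden Axe Warrior", "Alex Kidd in Miracle World"]
rpgs = ["Phantasy Star", "Golden Axe Warrior", "Final Fantasy"]
print(intersect_members(master_system, rpgs))
# → ['Golden Axe Warrior', 'Phantasy Star']
```

In practice the fetch step has to follow the API's continuation parameters for large categories, which is exactly the drudgery tools like CatScan hide.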
Soapbox time: Time for me to get on my virtual soapbox and once again say that the entire category mentality needs to be rethought. As is, the categories are assigned "at will" by editors, just like content. That is done with total disregard for a rich base of research on ontology-based systems. While content is subject to WP:V and WP:RS, there seem to be no requirements for category assignment apart from an "it looks good to me" assessment by users. Some of the early success of Yahoo came from their ontology design, all carefully hand-crafted with much effort. I have long wished that Wikipedia would use some formal, carefully thought-out basis such as Wordnet's ontology, given that it was a serious Princeton project, and not a random design. As is, the Wikipedia category structure is the wild, wild west of scholarship... I do not expect it to change soon, but given enough soapboxing, it may eventually edge that way. History2007 (talk) 17:26, 23 January 2012 (UTC)[reply]

The next level of online education

Some of you may know that professor Sebastian Thrun recently left his job at Stanford to create a new, free virtual "university": Udacity. If you don't know what it is, it's a place where they offer courses (so far only two) available to anyone who has internet access (read their about page here). These are real courses that start at a specific date and have tests and quizzes and completion certificates.

While some people might view this as an interesting idea, I think if done right, it could be world changing, just like Wikipedia. I figured if any place would like this idea, it would be here, a website dedicated to free knowledge.

I want to suggest that Wikipedia get behind this project. Maybe we could even get one of those cool banner ads at the top for it (complete with some Wikipedia founder staring into your soul). This could end up being some professor's side project, or it could end up being the next level of public knowledge, changing the world for the better just like Wikipedia has done. And I think it needs all the help it can to get to the latter. G man yo (talk) 04:17, 24 January 2012 (UTC)[reply]

There is already wikiversity: (not that I have really explored it much), the Open University (in the UK at least), and Second Life plays host to many universities from around the world. In other words, online education (including virtual) is an ongoing and pre-existing phenomenon. Let's keep our excitement in perspective. fredgandt 04:34, 24 January 2012 (UTC)[reply]
I realize that online education has already existed for years, but Udacity is, in my opinion, nothing like any of these (except maybe Second Life, actually). The Open University comparison is somewhat misleading, mainly because it isn't actually "open": it's just an online university. It costs money, so I wouldn't even consider it comparable to Udacity. Wikipedia's university is more about teaching resources, I think; they don't really have classrooms or anything. Second Life is interesting, but the problem is that, since it's mainly a video game, it is not particularly accessible and will always be viewed as a game before a classroom. Also, you can't really see the people giving lectures.
I'd say that out of these Udacity is the most promising. It has real classes taught by real professors with real assignments and it's completely free. And I don't think that the lessons are limited to watching a teacher in a classroom. Here is an interesting article about it.
I think we as the Wikipedia community should, at the very least, watch this closely. I thought the same thing as you did when I saw the site, that somebody has probably done this before. I was actually surprised that nobody really had. I think this is the first free online education program to really get it right.
As a side note, Udacity is a terrible name, in my opinion.G man yo (talk) 05:42, 24 January 2012 (UTC)[reply]
Second Life is not a game! Grrr. fredgandt
How would you suggest Wikipedia gets behind this project and does the project even want us? If any affiliation is being suggested, I reckon the suggestion would be better aimed at the Wikimedia Foundation. First off should probably be the creation of an article about Udacity (agreed, that name sucks), but then if it is too new, it might not be notable enough to warrant one. In that case, it might be better to wait and see if it takes off before getting heavily involved. fredgandt 06:25, 24 January 2012 (UTC)[reply]
Oh yeah, Wikimedia. Forgot about that one.
Yeah, at the moment, it might be too young (I always get excited about things and then act too quickly on it). Anyway, I'll keep an eye on it. I might try to make an article about it. G man yo (talk) 06:40, 24 January 2012 (UTC)[reply]
Are they going to pay for the banner ad as advertising, or do they expect free advertising? History2007 (talk) 21:55, 25 January 2012 (UTC)[reply]

Accounting Bodies pages

I have got a suggestion regarding the following pages:

ICAEW (Institute of Chartered Accountants in England and Wales)
ICAS (Institute_of_Chartered_Accountants_of_Scotland)
SAICA (South_African_Institute_of_Chartered_Accountants)
ICAP (Institute_of_Chartered_Accountants_of_Pakistan)
NZICA (New_Zealand_Institute_of_Chartered_Accountants)
ICAI (Institute_of_Chartered_Accountants_in_Ireland)
ICAI (Institute_of_Chartered_Accountants_of_India)
HKICPA (Hong_Kong_Institute_of_Certified_Public_Accountants)
CICA (Canadian_Institute_of_Chartered_Accountants)
ICAA (Institute_of_Chartered_Accountants_of_Australia)
and all the similar accounting body pages that are under this category.

I am a student of the accountancy profession. I would like to suggest that all these pages be organized under the same headings. I have looked at all of them, and there is no symmetry among them. The facts on these pages will naturally differ from each other, but to make them a bit more presentable, I suggest that all of them use similar headings. If any page has additional information, with supporting references, that the original author or anyone else wants to present, it can be included after the common headings.
Inlandmamba (talk) 07:55, 26 January 2012 (UTC)[reply]

Good Article Nominee Reviewing Process

My idea is to have the good article nominees reviewed in order. I bring this up because some articles may go largely ignored by reviewers if the article is very dense or is a subject that reviewers might consider unappealing or boring. — Preceding unsigned comment added by DoctorK88 (talkcontribs) 03:07, 28 January 2012 (UTC)[reply]

This is a good idea. I am currently working on two good article reviews, and generally review according to my interest areas, but I find it rather disheartening to see articles lying around at the GAN page for months, in some cases. I have also noticed that some lesser-known topics, including obscure history-related articles of low importance but high quality, can remain unnoticed for long periods of time. Perhaps a group of editors could be found who would make their main goal in regard to GA to review nominations more than three or four weeks old. What do you think of this, and would you participate in a project like that? DCItalk 19:26, 28 January 2012 (UTC)[reply]
I oppose this. It's not that simple. I, for example, skip over any GANs that are on biographies, avoid the digital popular culture oriented sections (movies, music, television shows), most of the physical sciences sections, and GANs on subjects that are controversial (including anything touching on religion). I also avoid GANs started by certain users, either because I've worked with them too much recently, or because I don't feel comfortable working with them because of past interactions. Now that does leave me with a lot of things that I do cover, and I've done over a dozen a month, roughly, but I'd wind up doing significantly fewer if I was forced to choose only from the oldest nominations. Sven Manguard Wha? 19:46, 28 January 2012 (UTC)[reply]
Your feelings are probably shared by many editors who review according to interests/personal choices (including me). However, wouldn't a group, however small, of editors who wouldn't be deterred by some of these things be useful? Members of this group could declare on the group page what they will probably review, and, based on this, they could be given nominations to look over. Obviously, this is a highly imperfect idea, but couldn't some parts of it be adapted to the GA process? DCItalk 20:06, 28 January 2012 (UTC)[reply]
That's not what the proposal was. I'd be more keen on the idea of generating a page where users could sign up for specific sections, and when a GAN hits, say, the one-month, two-month, three-month mark, etc., a bot will send x persons on that list (where x is the number of months old the nomination is) a message poking them. I'm not, however, keen on the original proposal. Sven Manguard Wha? 22:14, 28 January 2012 (UTC)[reply]
Just putting it out there: the process as it stands seems inefficient to me since, as mentioned before, some articles just sit and sit, unreviewed for a long time, when all articles should be reviewed in a reasonable amount of time. Months and months (in my opinion) does not qualify as reasonable. I hear your point about possibly doing less if articles come up that you're not interested in, or that were written by authors with whom you have worked, but at the same time those oldest nominations do need to be taken care of. You always have the choice of not reviewing the "next" article because it's controversial or for whatever reason, so it's not like your choices are gone here. I am still relatively unfamiliar with the ins and outs of Wikipedia, but I think my suggestion could be modified into a better system for reviewing good article nominees. Also, keep in mind that you are one reviewer and that other reviewers may have very different policies; then again, I understand that you oppose this because it might impede your ability to review articles. Perhaps reviewers should express a category of article (e.g. natural sciences) that they would typically be interested in reviewing, and work on those regardless of which one comes next. Idk. Play with the idea; like I said, I think the current system for good article nominee review is lacking. DoctorK88 (talk) 22:56, 28 January 2012 (UTC)[reply]
We also have the report, in case you didn't know. Sven Manguard Wha? 04:16, 29 January 2012 (UTC)[reply]
I think topic-based sorting is probably better: I've done GA reviews in the past, but they take time and effort. Probably about an hour for each one to do them properly. I'm not interested in wasting an hour reviewing an article on a pop culture topic: however much I love The X-Files, there is a limit to the amount of time I can spend reviewing articles about it. I think Sven Manguard's suggestion of having notifications by category is very useful. I get notified about RFCs on topics related to religion and philosophy, and I watchlist the relevant WP:DELSORT page on philosophy, it'd be nice to be notified about GA reviews on philosophy, and maybe PR/FACs too. —Tom Morris (talk) 12:46, 29 January 2012 (UTC)[reply]

A new user right/group

I want to develop the idea of a user right that would allow a trusted user to manually mark their own created pages as patrolled. This is less drastic than autopatrolled, which marks all pages automatically, and it is reasonable to presume a trusted editor could manually mark some. For me: I created this page and think I reasonably should have been able to reduce the workload by marking it as patrolled. My76Strat (talk) 19:23, 28 January 2012 (UTC)[reply]

I don't think that the community is going to go for a 1/2 autopatrolled right. That being said, it might go for reducing the number from 50 to 25 or 30. Barring that, if my memory serves the patrolling interface is being redesigned. Perhaps some sort of greenlist could be built into that, where admins could add users to a list and articles started by those users would be highlighted in a different color? That might speed things up. Sven Manguard Wha? 22:49, 28 January 2012 (UTC)[reply]
I agree and want to avoid any hard-sell approach. The spirit is that, at some level, a user should be trusted enough to designate a page as patrolled by some method of self-patrolling, in particular when a claim of unambiguous propriety is at its core. I am not sure, however, that there are any problems with backlogs that this would significantly improve, but this one example did add a page that a good-faith editor could have precluded. My76Strat (talk) 23:00, 28 January 2012 (UTC)[reply]
I really am not sure what the point of this suggestion is. We already have Autopatrolled which does exactly what you seek but without the intervening click on 'mark as patrolled'. —Tom Morris (talk) 11:42, 29 January 2012 (UTC)[reply]
I think autopatrolled is above my general qualifications. Out of respect for editors who are qualified, I wouldn't ask. But a lesser right that would allow an editor to self-patrol talk pages, and perhaps redirects, could reduce an important workload. My76Strat (talk) 12:11, 29 January 2012 (UTC)[reply]
I'm not sure that this will work, to be honest. The designation of user rights is generally based on trust and competence - you need a certain amount of trust and to demonstrate a certain amount of competence to be granted the autopatrolled user right. I believe that a half-autopatrolled user right would need at least as much trust and competence as the current autopatrolled right. We would need to be sure that those given the new right would exercise good judgement regarding the new articles they create: anyone who is able to do this is probably deserving of the autopatrolled right. The only way something like this might work is if we alter the autopatrolled group to allow them to bypass autopatrolling for certain new pages that they think need patrolling. However, I'm not sure that this is really what you're after, nor that it will be of much use. ItsZippy (talkcontributions) 17:25, 29 January 2012 (UTC)[reply]

Redirect procedure for non-notable articles

At the moment, often when New Page Patrolling, if one comes across an article which is not notable, it can be tagged for speedy deletion, PRODed, or nominated at AfD. There are, however, a number of cases where deletion is not the best option. A recurring example is articles on songs which are themselves not notable but should instead be redirected to the artist's article (when one exists). The problem is that these non-notable articles are often created by inexperienced users who are not familiar with our policies on notability (and, when appropriate, on songs). With all other articles, a deletion discussion can be initiated, which will result in a more permanent solution. However, with redirects, such discussions are much harder to have (and attracting attention to them is very difficult), so the article creator can often just revert the redirect. This can lead to a variety of problems, including edit wars, the retention of non-notable articles, and the use of AfD to settle these disputes. I therefore think that some kind of process in which an editor can nominate an article for redirection, in a central place (similar to how AfDs work), would be beneficial. This is because it would allow wider participation in these issues and deliver a more decisive result.

I am proposing this here because I can foresee potential problems. Firstly, the procedure which is used would need to be rigid enough to deliver decisive decisions, but flexible enough to perhaps allow speedy redirects, or something similar. In addition (and perhaps more importantly), this would need to be done in a way which does not bite new users; the majority of the users who create these non-notable articles are inexperienced, but good-faith editors who want to improve the encyclopedia. I'm hoping to get feedback and suggestions here, before I put the proposal to the community. Thank you. ItsZippy (talkcontributions) 17:49, 29 January 2012 (UTC)[reply]

Welcome to Wikipedia: Now read WP:V and WP:RS

The current, standard welcome message says:

  • The five pillars of Wikipedia, How to edit a page, Help pages, Tutorial, How to write a great article, Manual of Style

Now, I can not remember how many times I have had to tell new users to read WP:V (or WP:Truth) as well as WP:RS and not use low quality self-published web sites, etc.

I think it would be a good idea to change that message to have WP:V and WP:RS in flashing neon colors the moment someone registers... Well maybe not flashing, but certainly in a prominent manner. History2007 (talk) 09:00, 30 January 2012 (UTC)[reply]

By the way, it would be good to add to that notice that self-publishers such as Lulu (company), AuthorHouse, Xulon Press and iUniverse are not usable in Wikipedia. There are a few more, and many users are unaware of this, so one needs to explain it to them again and again. That just takes up time, and it should be a message upfront. History2007 (talk) 18:48, 31 January 2012 (UTC)[reply]

Free, globally distributed backup of all Wikipedia on Charity Engine grid

Charity Engine is a volunteer computing grid with a storage feature, based on advanced multi-level coding, in testing now. It uses the BOINC software suite.

The grid will maintain a constantly distributed backup of Wikipedia (as well as other large datasets) on thousands of volunteered home PCs. As the data is constantly replicated and replaced as various PCs join and leave the grid, it is almost impossible to destroy.

Charity Engine usually charges for distributed storage and processing on its grid, but strongly supports the goals of the Wikimedia Foundation and will be providing the backup for free. — Preceding unsigned comment added by 87.112.26.249 (talk) 01:36, 1 February 2012 (UTC)[reply]

Biographical metadata

Originally posted at a template for discussion thread, but was off-topic there so posted here.

I'm making a plea here for help in improving the organisation of the listings of biographical articles. Several years ago now, I said it would be nice to be able to generate a single master database of all biographical articles on Wikipedia. That would help tremendously in updating both human name disambiguation pages (e.g. {{hndis}}) and human surname set index pages such as Fisher (surname) (see {{surname}}). For an example of the former, see the update I made here at Paul Fischer. I had been looking for information on that Paul Henri Fischer (without knowing his middle name) and though I knew his birth and death years and found his article that way, I had to add him to the human name disambiguation page myself. The point here is that I'm not aware of any systematic effort to keep such pages updated. It is not a trivial proposition (those with long memories will remember the massive lists of people by name that got deleted), but could be automated or semi-automated if the following was done:

  • (1) Identify all existing biographical articles (i.e. ones about a single person's life story) and tag them accordingly. This would involve separating out the 'biographical' articles tagged by WikiProject Biography that are in fact group biographies (such as articles about music groups, families, siblings, saint pairs, and so on). Those group biographies will still contain biographical metadata, but need to include a 'group biography' tag. Not sure how to handle cases where a person's name is a redirect (these are not common, but are not rare either).
  • (2) Ensure all such articles are accurately tagged with DEFAULTSORT or some other 'surname' parameter (with the usual caveats about needing to be aware of guidelines in this area and correctly identifying what is the 'surname', which is not always easy and varies around the world, and how to treat people with only one name, and so on).
  • (3) Generate the masterlist/database to list all biographical metadata, including all data present in the infobox, in the categories, in the DEFAULTSORT tag, and in the Persondata template. This is the point where the data can be compared and cleaned up if necessary. But for now, the data of interest is the name.
  • (4) Generate a similar database for set index and human name disambiguation pages such as Fisher (surname) and Paul Fisher (different spelling to the one above, which brings up a slight problem in that some alternative spellings are rightly bundled together on one page, and some are not - this may make machine-identification of the right set index pages harder, but not impossible). Also, some are of the form "name (disambiguation)" or "surname (surname)" or "surname (name)", and that can change over time as people move pages around, but there should be a non-trivial way to address this.
  • (5) From the alphabetical listing of all the biographical articles, identify lists of those with the same name and ensure the corresponding surname set index pages and human disambiguation name pages (if they exist) are updated at regular intervals, possibly by bot talk page notification with a list provided by the bot. The bot could generate suggested lists using a combination of the article title (for linking purpose), and the Persondata name, birth year, death year, and short description fields. I think a project took place at one time to keep set index name pages updated, and that might have used bots to generate lists, but I can't remember where that project was, how successful it was, and if it is still going (update: I was thinking of this from 2008: "22,743 suggested surname disambiguation pages, created [...] from the May 24, 2008 database dump").
  • (6) Ideally, such a biographical listing of all biographical articles (now approaching 1 million) would be done dynamically by a category listing. But there is no single category for this as yet. The closest ones are the category for articles on living people (555,778 articles at present) and the listing of articles tagged by WikiProject Biography (which is a listing of the talk pages only). It is possible to generate partial set index name pages using the 'living people' category (e.g. surname Rabe (currently 14 people) can be compared with Rabe which only lists 12 people, of whom three are dead and one is a redirect), but this only puts those querying the category at the start of any dynamic 'list' of people by name and doesn't take into account biographies of historical (dead) people.
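
A rough sketch of how step (5) might work in code, assuming the metadata from steps (1)-(3) has already been extracted into (title, DEFAULTSORT key) pairs. The function name, input format, and sample data below are illustrative only; a real run would read from a database dump.

```python
# Sketch of step (5): group biography titles by surname sort key and emit
# suggested set-index listings for surnames shared by more than one article.
from collections import defaultdict

def suggest_set_index(pairs):
    """pairs: iterable of (article_title, sort_key), where sort_key is a
    DEFAULTSORT-style "Surname, Forename" string. Returns a dict mapping
    each shared surname to the sorted titles bearing it."""
    by_surname = defaultdict(list)
    for title, sort_key in pairs:
        surname = sort_key.split(",", 1)[0].strip()
        by_surname[surname].append(title)
    return {s: sorted(t) for s, t in by_surname.items() if len(t) > 1}

# Hypothetical data standing in for extracted biographical metadata:
entries = [
    ("Paul Henri Fischer", "Fischer, Paul Henri"),
    ("Paul Fischer (painter)", "Fischer, Paul"),
    ("Ada Lovelace", "Lovelace, Ada"),
]
print(suggest_set_index(entries))
# → {'Fischer': ['Paul Fischer (painter)', 'Paul Henri Fischer']}
```

A bot built on this grouping could then diff each suggested listing against the live set index page and post the missing names to its talk page, as proposed above.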

Would those reading this be able to say how feasible the above is, what work has already been done or is being done, and what would need to be done to get to the stage where we can be confident that our set index pages and human name disambiguation pages are accurate and updated at regular intervals to stay accurate? Or suggest which places I should go to to see who else might be interested in helping with this sort of thing? Carcharoth (talk) 23:52, 4 February 2012 (UTC)[reply]