
Wikipedia talk:Persondata: Difference between revisions

From Wikipedia, the free encyclopedia
Metadata in biography infoboxes


In case anyone missed it, see [[Wikipedia talk:Persondata#Half-automatic tagging with persondata-tool]] for details of a script to help extract and add persondata. [[User:Carcharoth|Carcharoth]] 01:10, 16 June 2007 (UTC)

== Metadata in biography infoboxes ==

What's to be done about projects like [[Wikipedia:WikiProject Composers|WikiProject Composers]] and [[Wikipedia:WikiProject opera|WikiProject Opera]], where a cabal are insisting, against the evidence, that they have a consensus for the removal of biographical infoboxes from all of "their" articles? [[User:Pigsonthewing|Andy Mabbett]]

Revision as of 08:32, 16 June 2007

WikiProject Biography (Project-class)
This page is within the scope of WikiProject Biography, a collaborative effort to create, develop and organize Wikipedia's articles about people. All interested editors are invited to join the project and contribute to the discussion.
This page does not require a rating on Wikipedia's content assessment scale.
Template:Persondata has been protected indefinitely. Use {{editprotected}} on this page to request an edit.

The {{Pharaoh Infobox}} template includes {{Persondata}} at the top of pharaoh articles... Mike Dillon 00:46, 13 January 2007 (UTC)

Yes, we need to resolve this. I asked on the Pharaoh Infobox talk page for the Persondata template to be removed from the infobox. We need to follow up on this. --Rajah 22:05, 1 April 2007 (UTC)

Template:Persondata edit request

Could a sysop please add the line <!-- Metadata: see [[Wikipedia:Persondata]] --> to the usage section of {{Persondata}}, right after the <pre> tag? This would bring it in line with the example given in the Wikipedia:Persondata#Using the template section of this page, and would make it easier for those who don't know about this system to figure it out. Thanks. Picaroon 04:25, 14 January 2007 (UTC)

Done. Luna Santin 19:39, 15 January 2007 (UTC)
Thanks. Picaroon 20:40, 15 January 2007 (UTC)
Adding the comment to the template doesn't actually do anything, as the comment is not visible in either the article view or the editing view. Perhaps it would be more useful to add an actual note to the template that is not an HTML comment. Kaldari 23:18, 24 January 2007 (UTC)
An HTML comment is the only way to handle it. Persondata is not visible; it's just a textual note in the edit window saying what it is. Ral315 (talk) 00:35, 25 January 2007 (UTC)
My point is that the HTML comment is only useful if it is outside the template, rather than inside. If it's inside the template, you'll never see it, since HTML comments in templates are not displayed in editing mode. Thus the recent edit to the template should be reverted. Kaldari 02:52, 25 January 2007 (UTC)
Compare it yourself: before after
It's useful for copy/paste. Editors who are unfamiliar with {{persondata}} know where to look for more information (because HTML comments _are_ visible in edit mode). --32X 05:07, 25 January 2007 (UTC)
My mistake. I thought the usage notice had been added to the template itself. Kaldari 18:14, 25 January 2007 (UTC)

Hi Kaldari. Could you edit the template to link to Template:Persondata/doc, following the template doc page pattern? Mike Dillon 18:46, 25 January 2007 (UTC)

What's the actual benefit of that? The doc page contains less information. --32X 22:22, 25 January 2007 (UTC)
I'm not sure what you're asking, but the benefit over the current situation is that the doc portion will be editable by anyone while the template itself stays protected. Mike Dillon 22:53, 25 January 2007 (UTC)
OK, that's an argument. But wouldn't it be better to set up a redirect to Wikipedia:Persondata, since that page is all about the template? (If one knows about the template, the short form for copy/pasting is enough; otherwise the introduction is a "must read".) --32X 23:41, 25 January 2007 (UTC)

Siblings and parents

Can we add siblings and parents as fields? That way, if the info is removed from the article, it will at least be easy to find for those who need it. The info doesn't have to display, but this is a good place to store it. The biography infobox has this information but displays all of it. This way the info would not be displayed and would still be available for researchers. Answer at my page please. --Richard Arthur Norton (1958- ) 20:45, 24 January 2007 (UTC)

This seems like a bad idea; persondata has been standardized for the most part. Ral315 (talk) 23:37, 25 January 2007 (UTC)

hCard microformat

It should be relatively trivial to have "Persondata" published with hCard microformat mark-up, simply by applying some standard class names to its containing elements. The data could then be extracted by a variety of parsing tools. Please see also Wikipedia:WikiProject Microformats. Andy Mabbett 20:59, 28 January 2007 (UTC)
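As a rough illustration of the class-name idea, here is a minimal Python sketch that renders persondata fields as an HTML table marked up as an hCard. The `vcard`, `fn` and `note` class names are hCard properties; the function name `persondata_to_hcard` and the field-to-class mapping are assumptions for illustration, not part of the template. DATE OF BIRTH is deliberately left unmapped, because hCard's `bday` expects an ISO 8601 date rather than free text such as "22 May 1977".

```python
from html import escape
from typing import Dict

# Hypothetical mapping from persondata field names to hCard class names.
# Fields without an obvious hCard equivalent are rendered without a class.
HCARD_CLASSES = {
    "NAME": "fn",
    "SHORT DESCRIPTION": "note",
}


def persondata_to_hcard(fields: Dict[str, str]) -> str:
    """Render persondata fields as an HTML table marked up as an hCard."""
    rows = []
    for key, value in fields.items():
        cls = HCARD_CLASSES.get(key)
        attr = f' class="{cls}"' if cls else ""
        # Escape both label and value so "&" or "<" in wikitext stays literal.
        rows.append(f"<tr><th>{escape(key)}</th><td{attr}>{escape(value)}</td></tr>")
    return '<table class="vcard">' + "".join(rows) + "</table>"
```

A parser that understands hCard would then pick up `fn` and `note` from the table without any knowledge of the persondata field names.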

Anyone interested in a proposed project to link to WorldCat Identities is invited to leave comments or sign up at the project proposal page. WorldCat Identities provides pages for 20 million 'identities' (authors and persons who are the subjects of published titles in WorldCat). Several thousand of these pages provide links to Wikipedia biographical pages: providing links in the other direction would allow readers of Wikipedia biographical articles to move straight to associated library information held in WorldCat libraries. Dsp13 15:17, 20 February 2007 (UTC)

Template:Birth date and age

For the "Date of Birth" parameter, should we use {{birth date and age}}, or should we steer clear of it? --WillMak050389 01:10, 5 March 2007 (UTC)

I would avoid it. Any application using Persondata is likely to work with the wikitext directly, which means it will see {{birth date and age|1967|07|15}} rather than July 15, 1967 (age 39). The point of Persondata is to make automatic extraction of data easier; the template is yet another format your parser would have to handle. In any case, the age is mainly useful to human readers; given the birthdate, any program can easily calculate the age. Dr pda 01:39, 5 March 2007 (UTC)
Thanks, I wasn't sure, but this makes sense. I'll change the ones I've edited. --WillMak050389 01:43, 5 March 2007 (UTC)
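The point about plain-text values can be illustrated with a small Python sketch that reads the fields of a {{Persondata}} call from raw wikitext and derives the age from the birth date. The function names are hypothetical, and the parser assumes one "| FIELD = value" pair per line and no nested templates inside the call; a value such as {{birth date and age|1967|07|15}} would break the simple search for the closing braces, which is exactly why plain dates are easier to handle.

```python
import re
from datetime import date, datetime

# One "| FIELD = value" pair per line; [ \t] (not \s) keeps each match on one line.
FIELD_RE = re.compile(r"^[ \t]*\|[ \t]*([A-Z][A-Z ]*?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def parse_persondata(wikitext: str) -> dict:
    """Return the fields of the first {{Persondata}} call as a dict."""
    start = wikitext.lower().find("{{persondata")
    if start == -1:
        return {}
    # Assumes no nested templates, so the first "}}" closes the call.
    end = wikitext.find("}}", start)
    block = wikitext[start:end if end != -1 else len(wikitext)]
    return dict(FIELD_RE.findall(block))


def parse_date(value: str) -> date:
    """Accept "15 July 1967" or "July 15, 1967" (English month names)."""
    for fmt in ("%d %B %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def age_on(birth: date, today: date) -> int:
    """Whole years between birth and today."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)
```

For example, `age_on(parse_date("15 July 1967"), date(2007, 3, 5))` gives 39, matching the "(age 39)" the template displayed at the time; no stored age is needed.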

Half-automatic tagging with persondata-tool

I come from the German Wikipedia. As of 24 January 2007, 126,332 of 133,221 persons (94.8%) were tagged with persondata. A very useful utility is the persondata tagging tool by Apper. It automatically extracts the birthdate, birthplace, etc. from the article, and the only thing the user has to do is check that it's correct and then save it. If someone from your project asks him, maybe he will help you with his tools so you can tag your articles much more easily and quickly. Bones 77.180.105.11 22:57, 15 March 2007 (UTC)

I'm actually almost finished writing a script to do a similar thing, although it requires the article to have an infobox from which the data is then extracted, rather than getting the data from the lead of the article. However, there are still around 50,000 articles using one of the top 20 or so people infoboxes (e.g. {{Infobox Football biography}}, {{Infobox musical artist}}), which is about ten times the current number of articles with persondata.
It is more difficult to extract the information from the text of the article (i.e. without an infobox) than it is on the de wiki, since on the en wiki the birth/death places are typically not given in a predictable place such as the opening sentence. Compare the first sentences of de:Alfred Hitchcock and Alfred Hitchcock:
  • Sir Alfred Joseph Hitchcock KBE (* 13. August 1899 in Leytonstone; † 29. April 1980 in Los Angeles) war ein Filmregisseur und Filmproduzent britischer Herkunft. (English: "... was a film director and film producer of British origin.")
  • Sir Alfred Joseph Hitchcock, KBE (August 13, 1899 – April 29, 1980) was a highly influential film director and producer who pioneered many techniques in the suspense and thriller genres.
Hopefully I will have time this weekend to get the script finished. Dr pda 01:27, 16 March 2007 (UTC)[reply]
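To make the contrast concrete: the de wiki convention "(* date in place; † date in place)" can be matched with a single regular expression, while the en wiki sentence has no comparable markers for places. The Python sketch below is illustrative only; the pattern and the name `extract_lead_data` are assumptions, and real leads (uncertain dates such as "um 1500", extra qualifiers) would need more cases.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for the de wiki lead convention; the death part is optional
# so that living people, e.g. "(* 5. Mai 1960 in Berlin)", also match.
LEAD_RE = re.compile(
    r"\(\*\s*(?P<birth_date>\d{1,2}\.\s*\w+\s+\d{3,4})"
    r"(?:\s+in\s+(?P<birth_place>[^;)]+?))?"
    r"(?:\s*;\s*†\s*(?P<death_date>\d{1,2}\.\s*\w+\s+\d{3,4})"
    r"(?:\s+in\s+(?P<death_place>[^)]+?))?)?"
    r"\s*\)"
)


def extract_lead_data(sentence: str) -> Optional[Dict[str, Optional[str]]]:
    """Return birth/death dates and places from a de wiki lead, or None if absent."""
    match = LEAD_RE.search(sentence)
    return match.groupdict() if match else None
```

Run on the two Hitchcock sentences, the German one yields all four fields and the English one yields None, which is the asymmetry described above.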
OK, I've finished the script now. Instructions for use are at User talk:Dr pda/persondata.js. It also includes a tidied-up version of the javascript above for turning persondata on/off without editing your monobook.css. Sample results of using the script are here.
This is a very nice tool - thanks! However, at present it seems to insert the persondata at the top of the article, rather than before the categories. No, sorry, it puts everything in the right place! Dsp13 12:17, 19 March 2007 (UTC)
Or rather, it puts the persondata in almost (but not quite) the right place whenever a defaultsort template introduces the categories - see my query below. Dsp13 21:51, 19 March 2007 (UTC)
By the way, I've also got the extraction from the XML dump more or less working by modifying the scripts linked at WP:PDATA#Extraction from the XML dump (the last step is deciding whether to write code to parse the dates that currently give errors, or just change the data in the articles). I don't have an appropriate place to put the scripts on the web, but if anyone wants a copy, email me. Dr pda 01:50, 19 March 2007 (UTC)
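For the dump route, one way to find candidate pages without loading the whole file into memory is to stream it. This Python sketch uses the standard library's `xml.etree.ElementTree.iterparse`; it is not the WP:PDATA scripts, and the function names are hypothetical. It strips the XML namespace rather than hard-coding a particular dump schema version.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple


def local_name(tag: str) -> str:
    """Drop the "{namespace}" prefix that dump elements carry."""
    return tag.rsplit("}", 1)[-1]


def pages_with_persondata(dump: IO[bytes]) -> Iterator[Tuple[Optional[str], str]]:
    """Stream a MediaWiki XML dump, yielding (title, wikitext) for pages using {{Persondata."""
    title = None
    for _event, elem in ET.iterparse(dump, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
            if "{{persondata" in text.lower():
                yield title, text
        elif name == "page":
            elem.clear()  # release the finished page's children to limit memory use
```

Each yielded wikitext can then be handed to whatever field parser and database loader one prefers.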
User:SEWilco has left this plea, which sounds reasonable, on my talk page: 'Please do not have your script call itself "this script". That makes reading and searching edit summaries much more difficult.' Could a simple alteration to the script be made? Dsp13 09:14, 1 April 2007 (UTC)
I've changed the edit summary; it now reads adding persondata using User:Dr pda/persondata.js. I'm not entirely convinced the previous edit summary was difficult to read (compare 'reverted vandalism using popups', 'renaming category per CFD with AWB', etc.); anyone interested in knowing which script was used would click the link, and anyone not interested would still see that the edit was made with a script. As for difficulty in searching through edit summaries, there should only be one such edit in an article's history. Users of the script will need to refresh their monobook.js to pick up the change. Dr pda 12:24, 1 April 2007 (UTC)
Thanks! I agree with you, but it's nice to keep everyone we can happy! Dsp13 13:06, 2 April 2007 (UTC)

Query re positioning of persondata before categories

Where categories are immediately preceded by Template:DEFAULTSORT, should the persondata go between the defaultsort template and the categories (which seems the strict reading of 'immediately before categories', but confusingly splits the defaultsort template from the categories it is concerned with), or immediately before the defaultsort template (which seems more natural to me, but should be specified if that is what is to be recommended)? Dsp13 12:33, 19 March 2007 (UTC)

In my opinion {{DEFAULTSORT}} is not a real template but a meta-template that belongs directly with the categories. I don't see the problem here, but to avoid any confusion I've added a comment. --32X 21:19, 19 March 2007 (UTC)
Thanks. I've modified the script to place the persondata before the {{DEFAULTSORT}} template if it exists. You may need to refresh your monobook.js to pick up the changes. Dr pda 23:04, 19 March 2007 (UTC)
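The placement rule agreed here (persondata before {{DEFAULTSORT}} if present, otherwise before the first category link) can be sketched in Python as follows. This is a hypothetical re-implementation for illustration, not the code of User:Dr pda/persondata.js; it assumes categories and DEFAULTSORT sit at the end of the article, as is conventional, and appends the block if neither is found.

```python
import re

# Accept both the magic-word form {{DEFAULTSORT:...}} and the template form {{DEFAULTSORT|...}}.
DEFAULTSORT_RE = re.compile(r"\{\{\s*DEFAULTSORT\s*[:|]", re.IGNORECASE)
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:", re.IGNORECASE)


def insert_persondata(wikitext: str, block: str) -> str:
    """Insert a persondata block before DEFAULTSORT, else before the first category."""
    match = DEFAULTSORT_RE.search(wikitext) or CATEGORY_RE.search(wikitext)
    if match is None:
        # No sort key or categories: append at the end of the article.
        return wikitext.rstrip("\n") + "\n\n" + block + "\n"
    pos = match.start()
    return wikitext[:pos] + block + "\n" + wikitext[pos:]
```

Checking DEFAULTSORT first keeps it adjacent to the categories it governs, which is the reading 32X describes.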

If you see someone removing persondata templates...

...you can now tell them not to do it again, by putting {{subst:pdataremove-warn}} on their user talk page. They will also be pointed here for more information on persondata. Resurgent insurgent 03:33, 25 March 2007 (UTC)

Why is persondata separate to infobox?

Further to my comment above about hCard, could someone please explain the purpose and advantage of having persondata in a separate, hidden-by-default table instead of having the same standard fields in the output of the various infobox templates? What tools exist to parse persondata, inside or outside Wikipedia? Andy Mabbett 00:59, 26 March 2007 (UTC)

{{persondata}} isn't a real information box but metadata. It was introduced for the first DVD of the German Wikipedia. The data fields are easily accessible with direct SQL (once you have downloaded a database image) and therefore allow search operations. With a large article base (more than 100,000 in de.WP) you can run SQL queries, e.g. to search for birth places whose articles haven't been written yet. Some time ago I read about several tools, but because I didn't feel the need, I haven't used them. --32X 18:50, 28 March 2007 (UTC)
Thank you for the explanation. The use case makes sense, but it seems to me that this could be achieved just as easily by using hCard, and hCard-like classes, in infoboxes, instead of repeating the information separately; and that doing so would have additional advantages for readers and editors, through greater interoperability with other tools and websites and ease of authoring. It would also facilitate persondata-like metadata for organisations and venues, through their infoboxes. I'm happy to advise further, if anyone's interested in pursuing this possibility. Andy Mabbett 19:16, 28 March 2007 (UTC)
To clarify issues in my own mind, I've drawn up a comparison of persondata and hCard properties, on the microformats wiki. Andy Mabbett 19:49, 28 March 2007 (UTC)
A good reason is that someone using Persondata has usually read this page and knows what they're doing. It is far more common for people to mess up and misuse infoboxes, which would garble the metadata. Circeus 19:03, 28 March 2007 (UTC)
Like any bad edit, surely that can be remedied? Andy Mabbett 19:16, 28 March 2007 (UTC)
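The kind of query 32X describes (birth places that have no article yet) can be sketched with Python's built-in `sqlite3` module. The two-table schema and the sample rows are made up for illustration; a real setup would load the tables from a dump, and the table and column names used by the WP:PDATA scripts may differ.

```python
import sqlite3
from typing import List


def missing_birthplace_articles(conn: sqlite3.Connection) -> List[str]:
    """Birth places named in persondata that have no article of their own."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.place_of_birth
        FROM persondata AS p
        LEFT JOIN articles AS a ON a.title = p.place_of_birth
        WHERE p.place_of_birth <> '' AND a.title IS NULL
        ORDER BY p.place_of_birth
        """
    ).fetchall()
    return [place for (place,) in rows]


def demo_missing_birthplaces() -> List[str]:
    """Build a tiny in-memory database with made-up rows and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE articles (title TEXT PRIMARY KEY);
        CREATE TABLE persondata (name TEXT, place_of_birth TEXT);
        INSERT INTO articles VALUES ('Leytonstone'), ('Los Angeles');
        INSERT INTO persondata VALUES
            ('Hitchcock, Alfred', 'Leytonstone'),
            ('Example, Jane', 'Smalltown'),
            ('Example, John', 'Smalltown'),
            ('Example, Max', '');
        """
    )
    try:
        return missing_birthplace_articles(conn)
    finally:
        conn.close()
```

The LEFT JOIN with `a.title IS NULL` is the usual "anti-join" idiom for finding rows with no match in another table.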

The issue of persondata vs infoboxes has been raised several times on this talk page, see #Use inside implementations of other templates, #Not picked up by Google?, #Hidden Metadata, #Revisiting Infobox Person and #Why is this seperate from Infobox Person?. Some of the main arguments given against combining them are:

  • This would require every biography to have an infobox, which many editors are opposed to.
  • There are a large number of different infoboxes (approx 160), not all of which have all the fields of persondata, and which currently vary greatly in the names for the fields they do have.
  • Persondata takes names in the format surname, firstname in order to be able to create an alphabetical list by surname.

There are examples at WP:PDATA#Extraction of persondata of how to extract persondata from an SQL database, and scripts to extract and parse it from the WP XML dump and insert it into a MySQL database, on which you can then run all kinds of queries (these scripts are written for the de wiki, but I have more or less adapted them to the en wiki following the hints there; see my comments above).

I notice that your comparison of infobox/persondata/hCard on the microformats wiki is expressed in terms of the rendered (X)HTML of the page; both of the extraction methods above work with the raw wiki markup, i.e. the wikitext DATE OF BIRTH = 22 May 1977 rather than its rendered HTML. Using hCard would then seem to imply a lot of HTML scraping to get the data, rather than using the periodic database dumps. (There are over 200,000 biographies, though admittedly only a quarter or so have infoboxes and only about 7,000 currently have persondata.) Looking at the list of hCard implementations here, it seems that most of these implementations deal with recognising hCards on an individual webpage, converting them to vCards, adding them to address books, etc., rather than dealing with large collections of hCards (which would be the end goal of an equivalent to persondata), although I suppose some of the PHP tools could also be used to populate a database. I also notice that hCard does not yet support the date of death and place of birth/death fields, which would seem to argue against its immediate implementation in place of persondata. Perhaps the best way of combining persondata with hCard (if you want to go there at all) would be, as you originally suggested, adding extra class tags in the persondata template itself. Dr pda 15:10, 31 March 2007 (UTC)
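The point that persondata takes names as "surname, firstname" is easy to illustrate: with NAME stored that way, an alphabetical index is a plain sort, and the display form can be recovered mechanically. The reverse conversion is not mechanical (surname particles such as "de la", or Spanish double surnames, cannot be split reliably), which is one argument for storing the sort form. A minimal Python sketch; the function names are hypothetical.

```python
from typing import List


def display_name(sort_name: str) -> str:
    """Turn "Blair, Tony" into "Tony Blair"; names without a comma are returned unchanged."""
    surname, sep, given = sort_name.partition(",")
    if not sep:
        return sort_name.strip()  # e.g. single names such as "Plato"
    return f"{given.strip()} {surname.strip()}"


def alphabetical_index(sort_names: List[str]) -> List[str]:
    """Sort persondata NAME values case-insensitively, i.e. by surname first."""
    return sorted(sort_names, key=str.casefold)
```

For example, `display_name("de la Cruz, Ulises")` gives "Ulises de la Cruz", whereas going from "Ulises de la Cruz" back to the sort form would require knowing which words make up the surname.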


Thank you for your detailed response. I appreciate that this must be old ground for some people, but I trust that you will agree that the consideration of microformats makes it worth revisiting. I'll address your points as bullets, for the sake of convenience and clarity:

  • "This would require every biography to have an infobox, which many editors are opposed to" - I would question why they're opposed, and whether they're perhaps putting personal (aesthetic?) preferences before the convenience of users. That said, perhaps one day it might be possible for user preferences to include a "do not display infoboxes" option, like the current "do not show TOCs" option.
  • "There are a large number of different infoboxes (approx 160), not all of which have all the fields of persondata, and which currently vary greatly in the names for the fields they do have" - I think there's a case for some standardisation here; perhaps a root "persondata" template, to be included in other biographical infobox templates, in the same way that "coor" is included in a number of other location-related infoboxes.
  • "Persondata takes names in the format surname, firstname" - It's possible for software to convert from one format to the other, or for the data to be entered in two (or more) fields (there's experience of doing this for the name field in hCard).
  • It should be possible for XML to be dumped from infoboxes/hCards if required.
  • "it seems that most of these implementations deal with recognising hCards on an individual webpage" - most, but not all, and those are just the early adopters; there's, deliberately, plenty of scope for other use cases.
  • "I also notice that hCard does not yet support the date of death and place of birth/death fields" - yes, but the comparison page you cite suggests a workaround for that.
  • "adding extra class tags in the persondata template itself" - hCards (indeed, all microformats) are intended for data that is visible on the page, not for hidden metadata.

Finally, being naturally lazy, I believe strongly in both not reinventing the wheel, and not doing work (i.e. entering data) twice.

Cheers, Andy Mabbett 19:33, 31 March 2007 (UTC)

P.S. Even while I was typing the above, The Anome was adding, on the Microformats Project talk page:

This is a bootstrapping effort at the moment, and you won't see any extra utility in the very short term: but once there's a substantial amount of semantically-tagged content on Wikipedia, some very interesting things will start to happen...

Andy Mabbett 19:39, 31 March 2007 (UTC)

Persondata box & succession box display

In the case of Victor Hugo, the displayed persondata box gets mixed up with an immediately preceding succession box. Does anyone know why, or how to fix it? Dsp13 12:12, 31 March 2007 (UTC)

There was a missing {{end box}} template after the succession box. It's fixed now. Dr pda 15:10, 31 March 2007 (UTC)

Gregorian/Julian calendar shift

How best to handle Old Style dates? At the moment, for Samuel Johnson, I've left a template for Old Style dates in his birth year, but (per the discussion of dates above) I'd rather leave something more transparent in the wikitext. Dsp13 12:54, 31 March 2007 (UTC)

I think the way you have handled it is best for now. --Rajah 05:30, 2 May 2007 (UTC)
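If the underlying need is to turn an Old Style (Julian calendar) date into its New Style equivalent, the arithmetic is mechanical: convert the Julian-calendar date to a Julian Day Number, then convert that number back as a Gregorian date. The Python sketch below uses well-known integer formulas for this round trip; the function names are hypothetical, it assumes positive (AD) years, and it does not handle the separate Old Style convention of starting the year on 25 March.

```python
from typing import Tuple


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date in the Julian calendar."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # shift so the year starts in March
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> Tuple[int, int, int]:
    """Gregorian (year, month, day) for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097       # 400-year cycles
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461         # 4-year cycles
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year: int, month: int, day: int) -> Tuple[int, int, int]:
    """Convert an Old Style (Julian) date to New Style (Gregorian)."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))
```

For Samuel Johnson, 7 September 1709 (Old Style) comes out as 18 September 1709 (New Style), an 11-day shift.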

Transcluded persondata?

Ramesses II has persondata somehow 'transcluded' onto the page. I'm not quite sure how this works, or if it's desirable. Any thoughts? Dsp13 21:28, 1 April 2007 (UTC)

It is because the Pharaoh Infobox contains the persondata template. See Wikipedia_talk:Persondata#Template:Pharaoh_Infobox --Rajah 23:25, 1 April 2007 (UTC)

Automatically adding Persondata from German Articles

Wouldn't it be possible (and infinitely faster) to have a bot/script that just translates the German persondata into English? The German articles are already mapped to the English ones, the fields are the same (converting dates should be a breeze), and the only hard parts would be locations/descriptions/names. Does this sound like a good idea? --Rajah 06:42, 4 April 2007 (UTC)

Yes. No point in duplicating effort, and only the name and description fields really need translation, although putting it all together (following interwikis, extracting persondata, converting, and inserting) does sound kind of troublesome. --Gwern (contribs) 15:26 4 April 2007 (GMT)
Yeah, that's a great idea. Sounds like a challenging bot to write, though. Kaldari 15:30, 4 April 2007 (UTC)
Sounds like a great idea! It should be mentioned on Wikipedia:Bot requests. MahangaTalk 22:58, 14 April 2007 (UTC)
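Of the conversion steps, dates are indeed the easy part. A minimal Python sketch, assuming de wiki persondata dates of the form "13. August 1899" and producing the "13 August 1899" style used in en wiki persondata; the month table and function name are illustrative, and approximate dates such as "um 1500" are rejected rather than guessed.

```python
import re

GERMAN_MONTHS = {
    "Januar": "January", "Jänner": "January", "Februar": "February",
    "März": "March", "April": "April", "Mai": "May", "Juni": "June",
    "Juli": "July", "August": "August", "September": "September",
    "Oktober": "October", "November": "November", "Dezember": "December",
}

# "13. August 1899": day, period, month name (letters only), year.
DE_DATE_RE = re.compile(r"(\d{1,2})\.\s*([^\W\d_]+)\s+(\d{1,4})")


def german_date_to_english(value: str) -> str:
    """Convert "13. August 1899" to "13 August 1899"; raise ValueError otherwise."""
    match = DE_DATE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported date: {value!r}")
    day, month, year = match.groups()
    if month not in GERMAN_MONTHS:
        raise ValueError(f"unknown month: {month!r}")
    return f"{int(day)} {GERMAN_MONTHS[month]} {year}"
```

Raising on anything unrecognised lets a bot skip or flag the page for a human instead of writing a wrong date.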

Other metadata information

Are there any other metadata templates? Is there any project to make more relevant metadata templates, or does the microformats project pretty much cover this area? Remember 16:13, 7 April 2007 (UTC)

No, sadly, an organized metadata system does not yet exist on Wikipedia as far as I know. The microformats, persondata, geodata, etc. movements are all balkanized at present. (Not that that is necessarily a bad thing.) I'm actually in favor of articles having a separate metadata page a la talk pages, but that's just my two cents. --Rajah 05:26, 2 May 2007 (UTC)
You may want to look at Extension:Semantic_MediaWiki, depending on your level of interest. --Rajah 22:18, 9 May 2007 (UTC)

Use on pages listing multiple people...

I'm looking at Delirious?_musicians and wondering whether PERSONDATA can appear multiple times on the same page without causing problems. Any thoughts? Dan, the CowMan 03:17, 10 April 2007 (UTC)

For now, I would stick to adding persondata to people with articles about themselves. --Rajah 05:24, 2 May 2007 (UTC)

hCard microformats in infoboxes

Further to earlier discussions, a number of biography-related infoboxes now produce an hCard microformat. Please feel free to add the necessary mark-up to more. (Chiefly, that's class="vcard" on the whole infobox and class="fn" on the page name or name field.) Note that the date of birth is only included if {{Birth date}} or {{Birth date and age}} is used. Andy Mabbett 17:14, 19 April 2007 (UTC)
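Once infoboxes carry class="vcard" and class="fn", the name can be read back from rendered HTML. A minimal Python sketch using the standard library's `html.parser`; the class name `HCardNameParser` is hypothetical, it collects only `fn`, and it assumes reasonably well-formed nesting (every non-void element is closed), which real-world pages do not always guarantee.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HCardNameParser(HTMLParser):
    """Collect the text of class="fn" elements nested inside a class="vcard" element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Set[str]] = []        # class names of each open element
        self.names: List[str] = []
        self._fn_depth: Optional[int] = None   # stack depth of the open fn element
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        in_vcard = any("vcard" in c for c in self.stack)
        if "fn" in classes and in_vcard and self._fn_depth is None:
            self._fn_depth = len(self.stack)
            self._buffer = []

    def handle_data(self, data):
        if self._fn_depth is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        if self._fn_depth == len(self.stack):
            # Closing the fn element: join all text seen inside it, including child tags.
            self.names.append("".join(self._buffer).strip())
            self._fn_depth = None
        self.stack.pop()
```

Usage: create a parser, call `feed()` with the page HTML and then `close()`; the collected names are in `parser.names`. An `fn` outside any `vcard` is ignored.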

NAME attributes

A couple of questions:

- Are nicknames acceptable within the ALTERNATIVE NAMES attribute?

- Could it be further clarified as to what should populate NAME and ALTERNATIVE NAMES? For example, for Tony Blair, his full birth name is in ALTERNATIVE NAMES, and his familiar name in NAME, but for Steven Gerrard it is the other way around.

Thanks, --Jameboy 16:16, 20 April 2007 (UTC)

The Tony Blair example is how it is supposed to work. For Steven Gerrard, his full name in the name field is enough. Having his name without his middle name in the Alternative field doesn't add any information. (If anything, you would put Gerrard, Steven in the name field and Gerrard, Steven Middlename in the Alternative field.) --Rajah 20:32, 20 April 2007 (UTC)
Thanks. What about nicknames though? --Jameboy 14:28, 22 April 2007 (UTC)
I would say it depends on the nickname and how uniquely identifying it is. For example, "Honest Abe" shouldn't be in Abraham Lincoln, but Splendid Splinter could be in Ted Williams. For nobility, nicknames are sometimes the first name in the persondata, e.g. Catherine II of Russia. Generally, if a nickname universally and uniquely identifies someone, I think it should be listed in the Alternative Names section; if it fails to meet those criteria, it should be omitted. Do you have a specific example? --Rajah 20:33, 28 April 2007 (UTC)

Colors

Is light gray really a good color to put on a white background? Maybe it could be darker and in bold. ~ EdBoy[c] 03:15, 12 May 2007 (UTC)

I guess most editors who use Persondata are familiar with the fields. Since the data is only a set of metadata without any relevance to the article itself, it's not that bad to see it in a subdued colour scheme. IIRC the colours are defined by CSS, so you should be able to define your own CSS rules (dark, bold, blinking, CAPS, ...). --32X 20:23, 15 May 2007 (UTC)

Hispanic Surnames

Could the instructions please specify that the surname generally used in Spanish-language names is the first surname where two are given? For example, I have just become aware of this template because of one of the pages on my watchlist, Ecuadorian footballer Ulises de la Cruz. His mother's surname is Bernal, so the full, formal version of his name is Ulises de la Cruz Bernal, but this is not in common use. He has, however, been given a persondata box showing him as

|NAME= Bernal, Ulises de la Cruz

There will be many errors of this type if this is not made very clear. I am unsure how scripts that automatically extract information might avoid this error. Kevin McE 16:47, 15 May 2007 (UTC)

Yes, the script is a guide. The person using it should have realized that Bernal was the maternal surname and not Ulises' surname. That's why the title of his article is Ulises de la Cruz, without Bernal. Generally, I think editors should either stick to names in languages with which they are somewhat conversant, or learn the rules for the language/culture they are editing, so that errors of this type don't propagate. --Rajah 01:48, 16 May 2007 (UTC)
Interestingly, the German Wikipedia gives this name as NAME=de la Cruz Bernardo, Ulises, while the Spanish Wikipedia has yet to have his persondata added. --Rajah 01:50, 16 May 2007 (UTC)

German Wikipedia

How is telling people how many articles on the German Wikipedia have persondata useful information on the English Wikipedia? Voretus 17:00, 17 May 2007 (UTC)

One way it is potentially useful is that a skilled programmer could transfer the persondata wholesale in the same fashion as the Interwiki bot does with interwiki links. --Rajah 18:47, 17 May 2007 (UTC)
Motivation. Compare it with my answer for So What Does This Do Now. You can't work with only a few articles; you need a larger base. --> "So they've reached > 150k? Wow. We'll try to be better in a few months." (hopefully) --32X 23:18, 17 May 2007 (UTC)

My two cents on Wikipedia's handling of metadata

Why not put metadata in its own tab for any given article? All articles would then have an "Article" tab, a "Discussion" tab and a "Metadata" tab. This would keep the article area clean of metadata, and the tab could (hypothetically) be limited to more advanced users. I realize this might be a bit of a pain in terms of extending the MediaWiki software, but in the long haul would this not be a major improvement? --Wikijeff 16:26, 12 June 2007 (UTC)

I totally agree, as I mentioned earlier [1]. For now, though, this is the best compromise we can make. The main reason this issue isn't dwelt upon much is that only a very small minority appreciate it. I'm slowly working on some offline Wikipedia data mining/visualization tools that should, hopefully, get people fired up about it. --Rajah 00:37, 13 June 2007 (UTC)
I also agree that this would be a great step, but it needs to be raised with the devs (who for all I know may already be working on something of this nature). Community support is only marginally relevant for MediaWiki software issues such as this. -- Visviva 02:27, 13 June 2007 (UTC)
Until there is a change in the software, it seems to me that the best option is to store metadata on a subpage (I think this has been mentioned before). I discuss this below for the Persondata template, but in fact the current version of my demonstration allows for arbitrary metadata: the data needed for a particular purpose (such as Persondata) are selected using a key. Geometry guy 20:07, 15 June 2007 (UTC)

Persondata on a subpage

There has been some discussion here about why Persondata should be separate from the infobox, how birthdates and names are formatted, problems with entering the same information several times, and so on.

A lot of these issues would be easier to deal with if the Persondata were stored on a subpage of the article talk page. (It would make more sense to store it on a mainspace subpage, but these don't exist.) With a small modification to the Persondata template (see User:Geometry guy/Persondata) it is possible to query the Persondata via straightforward transclusion of the subpage. I have made a "proof of concept" at Alexander Grothendieck and Talk:Alexander Grothendieck/Persondata.

Straightforward transclusion of the subpage produces the Persondata table. This may be a problem for search methods which query the wikisource of the article, but I would question whether the latter is the best way to query this data, especially if it involves downloading the entire article.

On the other hand, transclusion of the subpage with a key allows for easy extraction of the data. For example

{{:Talk:Alexander Grothendieck/Persondata|key=birthdate}}

produces

March 28, 1928

with not an SQL query in sight. This can be used to transclude DEFAULTSORT and infobox information into the article, allowing these data to be combined with the Persondata without requiring editors to use infoboxes if they don't want to.

Furthermore, the data on the subpage could be richer than in the Persondata table. I have illustrated this by allowing both the sortname and the usual name to be transcluded. The latter is often the name of the article, so this may not be so useful, but it is not difficult to imagine other applications of the same idea. Indeed one can imagine the infobox template automatically transcluding almost all of the infobox information from this subpage, removing clutter from the wikisource of the article. Geometry guy 14:43, 15 June 2007 (UTC)

I've now also produced a {{ReadPersondata}} template to make it simpler to include Persondata into an article: in the article itself (or on its talk page) one can use

{{ReadPersondata|key=birthdate}}

instead of the above. Geometry guy 16:05, 15 June 2007 (UTC)

It looks good. Effectively, you are implementing the "new tab" thing discussed above, but putting it on a talkpage subpage instead. One thing though - on the subpage, the metadata is not visible. Is there a way to make it visible so people don't have to click "edit" to see what is there? Also, please see Wikipedia:Bots/Requests for approval/Polbot 3 and User:Polbot/ideas/defaultsort for the rapidly advancing ideas of using a bot to standardise the existing data. That will still encounter the problem of location, as people will still have to update article metadata and sort keys in different locations, and would need to be re-run at intervals. Your proposal would solve this. The problem is which to do first. I'd say do the bot run first (which will also show the scale of the problem), and while that is happening, get this idea of your advertised more widely. Who knows, if the right developers hear about it, they might implement a metadata tab so we don't have to use subpages of talk pages! Carcharoth 16:55, 15 June 2007 (UTC)[reply]

There should also be a way to add references to confirm that the metadata (such as birth date and place of birth) is correct. How to do this? Carcharoth 16:56, 15 June 2007 (UTC)
Actually, this is one reason why I think the data should be in the article. People need to be able to edit things directly. If they press edit and, instead of "15 April 1955", they see {{ReadPersondata|key=birthdate}}, that will be very off-putting. It is off-putting enough with infoboxes at the moment. More templatization of articles would be bad. I can see why consolidating the sortkeys would be a good idea, but I think that it should all centre on DEFAULTSORT. I'm not sure quite where the solution lies. Carcharoth 17:14, 15 June 2007 (UTC)
I don't see any easy way to extract the sortable name from DEFAULTSORT: I view this as an application of the sortable name, rather than its source. For example, sortable names can also be used in tables, not just categories. Geometry guy 18:05, 15 June 2007 (UTC)
Yes, you are right, forget that quibble of mine. Carcharoth 23:16, 15 June 2007 (UTC)

It is no problem to make the metadata visible on the subpage. (Actually it is already visible to those who've customised their CSS to view person data.) References could also easily be added on the subpage. However, in my view, any information in infoboxes requiring verification should also be in the body of the text. It should be possible to make the infobox not at all offputting: it might just be {{Infobox_President}} for example, with all the data automatically transcluded from the subpage. There could be an edit tab on the infobox which links to edit the subpage.

Concerning what to do: the first thing is not to rush to conclusions, but to think through the various ideas before doing anything; it is probably also a good idea to separate (at least mentally) information gathering from data manipulation. I noticed the rapidly advancing plans you mention already; they look very interesting and I was intending to comment further there soon. Some bot work will surely be needed, both for information gathering and for data migration, but there are nearly 400,000 articles to deal with here, and when planning a journey on such a scale, it is invaluable to have a clear idea of the destination. Geometry guy 18:05, 15 June 2007 (UTC)

I've now made the subpage visible. For this I needed to make the use of the subpage for generating the (invisible) Persondata table explicit. Together with the above discussion, this suggests to me that the subpage could be used to store arbitrary metadata, and the key parameter can be used to extract the data which is needed for a particular purpose (such as the Persondata table). Geometry guy 20:12, 15 June 2007 (UTC)
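The idea of a subpage holding arbitrary metadata, with a key parameter pulling out individual fields and derived views built from them, can be illustrated with a minimal Python sketch. This is only an analogy, not how the wiki templates work: it assumes the subpage stores plain "key = value" lines (a hypothetical format), and the function names parse_metadata, read_metadata and persondata_row, as well as the keys sortname and birthplace and the output labels, are invented for this example. Only the birthdate key comes from the discussion above.

```python
"""Sketch: a metadata subpage as a flat key = value store (hypothetical format)."""


def parse_metadata(subpage_text: str) -> dict[str, str]:
    """Parse lines like 'birthdate = 28 March 1928' into a dict.

    Lines without '=' are ignored; only the first '=' separates key from
    value, so a value may itself contain '='.
    """
    data: dict[str, str] = {}
    for line in subpage_text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        data[key.strip()] = value.strip()
    return data


def read_metadata(data: dict[str, str], key: str, default: str = "") -> str:
    """Rough analogue of {{ReadPersondata|key=...}}: return one field or a default."""
    return data.get(key, default)


def persondata_row(data: dict[str, str]) -> dict[str, str]:
    """Build one derived view (a Persondata-style table) from the raw fields.

    The output labels are illustrative only; the real template defines its own.
    """
    return {
        "NAME": read_metadata(data, "sortname"),
        "DATE OF BIRTH": read_metadata(data, "birthdate"),
        "PLACE OF BIRTH": read_metadata(data, "birthplace"),
    }


if __name__ == "__main__":
    subpage = "sortname = Grothendieck, Alexander\nbirthdate = 28 March 1928\nbirthplace = Berlin\n"
    fields = parse_metadata(subpage)
    print(read_metadata(fields, "birthdate"))
    print(persondata_row(fields))
```

The design point is that the stored data stays primitive and editable, while each consumer (a Persondata table, an infobox, a sortable list) asks only for the keys it needs.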

Is there any reason that the sub-page (via the template) couldn't also include an hCard microformat? I'd be happy to supply the necessary mark-up. Andy Mabbett 20:34, 15 June 2007 (UTC)
What would you think about having it at Persondata:Alexander Grothendieck instead? – Quadell (talk) (random) 22:42, 15 June 2007 (UTC)
These are not valid namespaces at the moment, and so I expect they are viewed as articles (and so artificially inflate the number of WP articles). Geometry guy 23:00, 15 June 2007 (UTC)
That's fine. It will certainly be made into a valid namespace if this proposal is widely followed. And it will only be widely followed if it's intuitive and easy to use. Metadata:Alexander Grothendieck is a lot more intuitive than Talk:Alexander Grothendieck/Persondata. (Besides, wouldn't this artificially inflate the talkpage count?) – Quadell (talk) (random) 23:08, 15 June 2007 (UTC)
Sorry if my comment gave the wrong impression, Quadell; I'm definitely with you in spirit. For instance, I think that disabling subpages in the mainspace is the wrong way to enforce the (sensible) policy of non-hierarchical article format. (Talk page subpages are not disabled, and so they don't inflate the talk page count.) I agree entirely that this is about metadata in general, not just Persondata, which is only one application; I hope you notice this in my more recent comments and edits.
The ugliness of Talk:Alexander Grothendieck/Persondata was the main reason I introduced {{ReadPersondata}}! ({{/MetaData}} would work for me as an article subpage if these were acceptable.) But I am a pragmatist, and we have to build our ideas within the current framework. As you say, if they are successful and intuitive, they may attract wider attention and a cleaner formulation. Geometry guy 23:38, 15 June 2007 (UTC)

My answer to Andy would be that the hCard format could be included in the processing of the data, but not on the subpage itself, since this data needs to be updatable by any editor. It would be easy, however, to build another subpage which transcluded the primitive data into the hCard format. Geometry guy 23:00, 15 June 2007 (UTC)

I'm not sure why you think that using a microformat would affect an editor; a microformat is just HTML classes in the rendered output, which do not appear on the page when editing. (I also like the idea of the page being called Metadata:Alexander Grothendieck.) Andy Mabbett 08:22, 16 June 2007 (UTC)
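To illustrate the point that a microformat lives only in the rendered HTML, here is a minimal Python sketch that turns raw metadata into hCard-annotated markup. The class names vcard, fn and bday are taken from the hCard microformat; the input keys name and birthdate_iso and the function name render_hcard are hypothetical, and a real template would produce its markup differently.

```python
from html import escape


def render_hcard(data: dict[str, str]) -> str:
    """Render raw metadata as hCard-annotated HTML.

    The microformat exists only in this output (the 'vcard', 'fn' and 'bday'
    class names); the editable data passed in is never modified.
    """
    # 'fn' (formatted name) is the one property hCard always requires.
    parts = [f'<span class="fn">{escape(data["name"])}</span>']
    # 'bday' is optional; hCard expects an ISO 8601 date here.
    if "birthdate_iso" in data:
        parts.append(f'born <span class="bday">{escape(data["birthdate_iso"])}</span>')
    return '<div class="vcard">' + ", ".join(parts) + "</div>"


if __name__ == "__main__":
    print(render_hcard({"name": "Alexander Grothendieck", "birthdate_iso": "1928-03-28"}))
```

An editor changing the birth date would touch only the plain value; the class attributes are added at render time, which is why they would never appear in the edit box.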
  • I think it's important to state some things about the philosophy of metadata out front. Like:
    • Visible information in (non-template) article-space should never come from metadata. The text "Born {{Getmetadata|birthdate}}" should never appear, for instance. It is only used for categories, or in templates and such.
    • The Wikipedia article is the source of the metadata, by definition. There should never be metadata information which is not mentioned (and, ideally, sourced) in the article itself. That way we don't have to worry about cites in metadata -- the source for the info is the Wikipedia page. Conversely, if there is a discrepancy (not caused by vandalism), it's safe to assume that the article is right.
    • Templates such as infoboxes could simply be transcluded into articles without parameters. These templates would use magic words to point to the metadata. The [edit] link on infoboxes should go to the article's metadata, not the template (since these templates are, let's be honest, too complex for most users to edit anyway).
    • It should be as simple as possible for users to find and edit metadata.

I'd actually suggest moving this discussion somewhere more centralized. Maybe Wikipedia:Separate metadata? Perhaps categorized under Category:Wikipedia proposals, with links from Wikipedia:Requests for comment/Style issues and Wikipedia:Centralized discussion? I don't want the conversation to get bogged down with too many opinions and ideas, but on the other hand, this would affect a huge portion of Wikipedia if implemented. – Quadell (talk) (random) 23:08, 15 June 2007 (UTC)

I agree entirely with your bullet points about metadata. The metadata should be taken from the article and stored in one place (so that if the article is wrong and needs to be updated, it is easy to fix the data). Then it should be used to generate persondata, infoboxes, etc. And of course edit links should point to an easy-to-comprehend metadata page, not a complicated template! Geometry guy 00:01, 16 June 2007 (UTC)
  • Responding to Quadell about namespace: do you mean a new namespace with its own talk page, or a new tab associated with articles (like the current talk pages)? If a new namespace, you have the problem of what happens when the article is moved to a new name. If a tab, then that would move with the page, and issues would be discussed on the talk page as normal. If you are going to have a new tab or new namespace, why not go the whole hog and make it available for all metadata (including the hCard format Andy Mabbett mentioned above)? BTW, is that really a new namespace, or have you just created an article page with the title "Persondata:Alexander Grothendieck"? :-) I think a new tab is the most viable method, but unfortunately it would also be the most resource-intensive for developers. Carcharoth 23:16, 15 June 2007 (UTC)
  • Responding to Geometry guy's comments, thanks for making the subpage visible. One query, though: why isn't the sortkey parameter visible? Is that because the original persondata template doesn't include that parameter yet? About the references, you are quite right: for this sort of data, which should (in fact must) be in the main text of the article as well as in an infobox, the references stay with the article. I was thinking more of the kind of data included in infoboxes for things like articles on chemical elements or planets. See hydrogen and Earth for examples, though they have their own ways of dealing with their data. We should probably stick to biographical data for now! An edit tab linking to the subpage is a great idea. You've actually answered all my worries so far, and I agree entirely about taking it slowly and getting an idea of what is needed first. So, what next? Carcharoth 23:16, 15 June 2007 (UTC)
The sortkey parameter isn't visible because I was lazy and just copied the format from the persondata table. However, now that the persondata behaviour is separated from the subpage data, anything is possible. The template can display the data on the subpage however you want it to, and still provide a query mechanism to access the individual fields, and also constructions built from these fields, such as the persondata.
Regarding infoboxes, I think we have to rely on the specific infobox to decide how to handle the data. They are all very diverse, but they might all benefit from transcluding subpage data. The wikilinking and formatting of this raw data should, however, be left to the infobox template.
The subpage idea is still rather dominated by the initial motivation from the Persondata template. It is becoming more flexible now, and I will continue to work in that direction. Geometry guy 23:53, 15 June 2007 (UTC)
  • Responding to Quadell again, after edit conflict, I agree a separate page to discuss the wider issues of metadata is needed, but surely this has been discussed elsewhere before? This may even be a perennial proposal, though maybe no-one's ever taken it this far. Carcharoth 23:16, 15 June 2007 (UTC)

Tidying up a few pages

Should the pages at Template talk:Persondata/Removing data have persondata or not? If not, please help tidy them up. Thanks. Carcharoth 01:03, 16 June 2007 (UTC)

Just remove them. Persondata belongs only in biographies (only one instance per page), or on redirects if there's only one article covering several people (e.g. twins who are only notable as twins and not as individuals). --32X 01:30, 16 June 2007 (UTC)

Persondata tagging script

In case anyone missed it, see Wikipedia talk:Persondata#Half-automatic tagging with persondata-tool for details of a script to help extract and add persondata. Carcharoth 01:10, 16 June 2007 (UTC)

Metadata in biography infoboxes

What's to be done about projects like WikiProject Composers and WikiProject Opera, where a cabal are insisting, against the evidence, that they have a consensus for the removal of biographical infoboxes from all of "their" articles? Andy Mabbett