Wikipedia talk:Metadata standardization


Hey guys, we have a problem. Because we still have not added any type of metadata support to MediaWiki, we have competing (and redundant) solutions rushing in to fill the void (without any kind of coordination or attempts to harmonize solutions). For locations, for example, we have dozens of different templates to enter geographical coordinates into, and for people we have dozens of different templates to enter birth and death dates into. This metadata redundancy is getting out of hand. We don't need {{Persondata}} AND {{birth date}}. We don't need {{Geolinks-US-cityscale}} AND {{Infobox city}}. Ideally, we should have a method to add metadata to articles that is standardized and built into MediaWiki. Since that's probably not going to happen any time soon, let's at least have some discussion on how we can clean up the existing mess. We should never have to include information in a Wikipedia article more than twice at the most (once in the article body and once as metadata). Kaldari 23:19, 27 August 2007 (UTC)

Twice? Once. The metadata needs to be useful for the body as well and that's what's wrong with {{Persondata}} and why I'm going to continue to use {{Birth_date}} - which in turn really should generate "YEAR births" categories. Michael Bednarek 00:19, 28 August 2007 (UTC)
Good point. Is there a way to standardize the use of {{birth date}} so that the birthday doesn't need to be entered for the article body, the infobox, and the birth category? Can we dictate one as mandatory and have the other sources pull from that? Is there a way to do this for other data as well? Kaldari 01:47, 28 August 2007 (UTC)
{{birth date}} just returns the date and current age into the article, while {{persondata}} collects a small set of useful data for different kinds of use (even) outside the article. Preferring the first over the second is like preferring carrots over cars – at first glance both start with the letters "car", but on a second look they are completely different. (Automatic categorization has already been discussed in the German Wikipedia, but it turned out there are way too many special cases. A solution would be to turn the simple structure of the persondata into a complex structure, which is pretty much a no-go.) --32X 19:25, 29 August 2007 (UTC)
I would happily use {{Persondata}} if I could use its elements in the body of the article, IOW if I didn't have to enter them again. I'm not going to use it to provide data -which is invisible to almost all users- for some obscure extraction and/or searching tools. 32X said: "...(even) outside the article" - what about inside the article?
And how many special cases can there be that would prevent {{birth_date}} from being used for Category:Births by year? Michael Bednarek 03:07, 30 August 2007 (UTC)
Let's see. How many celebrities have attempted to change their age to make themselves seem younger? Every one of these has therefore got two (or more!) birth dates which need some descriptive text to resolve. Filceolaire 12:53, 30 August 2007 (UTC)
Have a look at Category:12th century births (and the ones before and after) and you'll find enough special cases. There might be a lot more articles without even having birth/death categories. --32X 14:50, 11 September 2007 (UTC)
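To make the "enter it once" idea in this thread concrete, here is a minimal sketch in Python rather than template syntax. It is not how {{birth date}} or {{Persondata}} work; the `BirthRecord` class and its `disputed` flag are hypothetical names invented for illustration. It shows one record feeding both the body text and the "YEAR births" category, with the special cases mentioned above (disputed or partial dates) falling back to manual handling.

```python
from dataclasses import dataclass
from typing import Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


@dataclass(frozen=True)
class BirthRecord:
    """Hypothetical single source for a birth date, entered once."""
    year: int
    month: Optional[int] = None   # unknown for many older biographies
    day: Optional[int] = None
    disputed: bool = False        # sources disagree, needs descriptive text

    def body_text(self) -> str:
        """Date as it would appear in the article body or infobox."""
        if self.month is not None and self.day is not None:
            return f"{self.day} {MONTHS[self.month - 1]} {self.year}"
        return str(self.year)

    def category(self) -> Optional[str]:
        """Birth category, or None when an editor has to decide by hand."""
        if self.disputed:
            return None
        return f"Category:{self.year} births"


record = BirthRecord(1879, 3, 14)
print(record.body_text())   # 14 March 1879
print(record.category())    # Category:1879 births
```

Returning `None` for disputed dates is one possible design choice: the automatic path stays simple, and the awkward cases are left to editors instead of being encoded in the data structure.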

Thoughts on standardization

It seems to me that each issue of redundant metadata solutions has to be looked at individually. Often, the solution is as simple as adding some parameters to one of the templates, and then deprecating the other. For example, I recently noticed a message on Template talk:Reqphotoin proposing deprecation for that template, because geographical information can now be added to Template:Reqphoto. Ideally, metadata templates should have as many parameters as possible, lessening the need for alternates. Conflicts will still arise over certain points of style that different groups of editors might prefer, but those can only be solved through discussion, or incorporation of both styles into the template in some way.--Danaman5 05:53, 28 August 2007 (UTC)

What's wrong with Semantic MediaWiki

I've followed the development of Semantic MediaWiki (see the prototype). This is an extension to MediaWiki for describing the properties of a wiki page in a machine-readable form, using a system for classifying wikilinks by giving them a type (e.g. Berlin "is capital of::Germany", creating a relationship). Other data about the subject of a page can also be classified (e.g. Berlin has "population:=3,391,407|3.4 million", creating an attribute).
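As a rough illustration of the two annotation forms quoted above, the following Python sketch splits an annotation string into its parts. It is not the Semantic MediaWiki parser; `Annotation` and `parse_annotation` are hypothetical names, and the only syntax assumed is what the comment itself describes: `::` for a relationship, `:=` for an attribute, and an optional `|` followed by display text.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    kind: str      # "relation" for "::", "attribute" for ":="
    name: str      # e.g. "is capital of" or "population"
    value: str     # machine-readable value
    display: str   # text shown to readers; defaults to the value


# Name, then one of the two separators, then everything else.
_PATTERN = re.compile(r"^(?P<name>[^:]+?)(?P<sep>::|:=)(?P<rest>.*)$")


def parse_annotation(text: str) -> Annotation:
    """Split one annotation string into kind, name, value and display text."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an annotation: {text!r}")
    value, _, display = match["rest"].partition("|")
    kind = "relation" if match["sep"] == "::" else "attribute"
    return Annotation(kind, match["name"].strip(), value.strip(),
                      (display or value).strip())


print(parse_annotation("is capital of::Germany"))
print(parse_annotation("population:=3,391,407|3.4 million"))
```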

This is all very well for simple relationships, but in the real world facts are rarely that simple. Berlin has only been the capital of Germany since the 1990s. Before then it was Bonn, except that East Germany had East Berlin as its capital.

Similarly, the population given for Berlin is only true for one particular time.

Even for something like the area of France, Wikipedia has 4 different values, and that's not considering how the area has changed in the past.

There is (so they tell me at ontoworld) no practical way to add attributes to relationships or to other attributes, so we can't put a start and end year on "is capital of" or "population". Given this limitation, I don't believe there is much room in Wikipedia itself for this level of structure.
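The missing piece described here, a start and end year attached to a relationship, can be sketched as a small data structure. This is only an illustration in Python of what such qualified facts might look like, not a feature of Semantic MediaWiki; `QualifiedFact`, `FACTS` and `capitals_in` are hypothetical names, and the years are approximate values chosen to match the Berlin/Bonn example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QualifiedFact:
    """A relationship plus the period during which it holds."""
    subject: str
    prop: str
    value: str
    start_year: Optional[int] = None   # None means "no known start"
    end_year: Optional[int] = None     # None means "still true"

    def holds_in(self, year: int) -> bool:
        after_start = self.start_year is None or year >= self.start_year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


# Illustrative data only; years are approximate.
FACTS = [
    QualifiedFact("Bonn", "is capital of", "West Germany", 1949, 1990),
    QualifiedFact("East Berlin", "is capital of", "East Germany", 1949, 1990),
    QualifiedFact("Berlin", "is capital of", "Germany", 1990, None),
]


def capitals_in(year: int) -> List[str]:
    """Subjects that were a capital of something in the given year."""
    return sorted(f.subject for f in FACTS
                  if f.prop == "is capital of" and f.holds_in(year))


print(capitals_in(1975))   # ['Bonn', 'East Berlin']
print(capitals_in(2007))   # ['Berlin']
```

The same two optional fields would cover "population" as well, which is why the limitation matters for attributes and relationships alike.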

There is, however, I believe, scope for a wiki to collect together these types of raw factoids. I think these should be collected in a new commons wiki and then imported from there to populate templates in multiple languages. We can dip our toe in the water by using Wikispecies data to populate the species data templates in Wikipedia species pages. If that works then we can try something bigger.

Of course, the level of structure needed for that sort of application may mean that the 'Wikidata' extensions are more appropriate than the freeform Semantic MediaWiki structures. The Wikidata software is, for example, being used to create an every-language dictionary.

So my proposal:

  • Create a mechanism to import data from Wikispecies to use in animal and plant wikipages.

If that works then:

  • Create a new Wiki for basic facts. Call it Wikistats or something.
  • Every fact in Wikistats will be widely reused, so every fact needs a reference, no exceptions.
  • The creative input in Wikistats is mostly in the data structures rather than the individual facts, so let's dual licence the Wikistats data (GFDL and CC-BY-SA) to make it more widely reusable.
  • Replace data templates with new templates referring to Wikistats data.
  • anything else? —Preceding unsigned comment added by (talk) 22:36, August 29, 2007 (UTC) err that was me Filceolaire 22:41, 29 August 2007 (UTC)
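A minimal Python sketch of how the proposal above might hang together, assuming a central store of referenced facts that language-specific templates read from. Nothing here is an existing MediaWiki or Wikidata API; `Fact`, `FactStore`, `LABELS` and `render_row` are hypothetical names, and the reference string is a placeholder rather than a real citation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Fact:
    subject: str
    prop: str
    value: str
    reference: str

    def __post_init__(self) -> None:
        # "Every fact needs a reference, no exceptions."
        if not self.reference.strip():
            raise ValueError("every fact needs a reference")


class FactStore:
    """Hypothetical central store that every language edition reads from."""

    def __init__(self) -> None:
        self._facts: Dict[Tuple[str, str], Fact] = {}

    def add(self, fact: Fact) -> None:
        self._facts[(fact.subject, fact.prop)] = fact

    def get(self, subject: str, prop: str) -> Fact:
        return self._facts[(subject, prop)]


# Only the labels differ per language; the value is stored once.
LABELS = {
    "en": {"population": "Population"},
    "de": {"population": "Einwohner"},
}


def render_row(store: FactStore, subject: str, prop: str, lang: str) -> str:
    """One infobox-style row, with the reference carried along."""
    fact = store.get(subject, prop)
    return f"| {LABELS[lang][prop]} = {fact.value}<ref>{fact.reference}</ref>"


store = FactStore()
store.add(Fact("Berlin", "population", "3,391,407", "placeholder citation"))
print(render_row(store, "Berlin", "population", "en"))
print(render_row(store, "Berlin", "population", "de"))
```

Rejecting unreferenced facts at creation time is one way to enforce the "no exceptions" rule; a real wiki would more likely flag such entries for review than refuse them outright.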
I like the idea, although the specific suggestion of using Wikispecies may be ill-advised, as the quality of the data there is very poor, IMO. (Lots of inconsistencies, outdated information, missing information, and just plain wrong information.) The taxonomic information in the English Wikipedia is actually more thorough and of higher quality generally, probably due to attracting far more eyeballs. The reverse might actually be a good idea though. Kaldari 03:46, 30 August 2007 (UTC)