User talk:Jimbo Wales

From Wikipedia, the free encyclopedia

External databases

A rather rambling recent discussion at Wikidata:Wikidata:Project chat#Bot generated data brought up two fairly basic questions about the relationship of Wikidata/Wikipedia with specialized external databases that hold information on galaxies, rainfall, species, and so on.

First, if two highly respectable sources, such as the Smithsonian Institution and the International Commission on Zoological Nomenclature, disagree, as they do on the scientific names of domestic animals, should Wikidata select the version the editors prefer, or should it present both views? That is, should Wikidata aim to present only the truth as we see it, or to present an overview of what reputable sources say, including disputed data?

Second, the Swedish lsjbot caused a stir by generating millions of articles on species and locations, with some inaccuracies. The process was one-shot, with no mechanism to update articles to show corrections to the information. But it would be technically feasible, using carefully selected data sources, to periodically refresh Wikidata from reputable external databases, then to use this data to periodically refresh tables or text fragments to be embedded in Wikipedia articles. A Wikipedia article would include a mix of editor-written text and automatically-updated text or tables generated from a recent extract of the external database(s), perhaps inserted by a template like {{Wikidata|lastcensus}}. The benefits for information like municipal census figures or election results seem obvious, but taken to the extreme it would make Wikipedia an aggregation of data from external sources that may or may not be annotated by our editors.

Comments? Aymatth2 (talk) 15:14, 20 October 2016 (UTC)

I think that editorial judgment is possible, so I'm not so sure the choice is as stark as "present only the truth as we see it" versus presenting both equally. There might be good editorial reasons to favor one system over the other, but I can't think of any good reason to omit the information from the other outright. I am not an expert here; I'm just saying that I don't see any a priori reason to prejudge the decision that is made.
I do like the idea of automated updates to data. One thing I would love to see in our articles about publicly traded businesses is "live market cap" - updating that by editing wikitext by hand is silly. But technologically it would not be very hard for a template-like syntax to automagically return the live correct number.--Jimbo Wales (talk) 15:37, 20 October 2016 (UTC)
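The "live market cap" idea can be sketched as a template that is expanded at display time rather than stored in wikitext. This is a minimal Python illustration only: the `{{market cap|TICKER}}` syntax, the `QUOTES` feed, the ticker and the figures are all hypothetical, and market capitalization is taken as share price times shares outstanding.

```python
import re
from typing import Dict, Tuple

# Hypothetical snapshot of a quote feed: ticker -> (share price in USD, shares outstanding).
QUOTES: Dict[str, Tuple[float, int]] = {
    "EXMPL": (25.50, 40_000_000),
}

# Hypothetical template syntax, e.g. {{market cap|EXMPL}}.
TEMPLATE = re.compile(r"\{\{market cap\|([A-Z]+)\}\}")

def market_cap(ticker: str) -> str:
    price, shares = QUOTES[ticker]
    # Market capitalization = share price x shares outstanding.
    return f"US${price * shares:,.0f}"

def render(wikitext: str) -> str:
    """Expand every {{market cap|TICKER}} at display time; leave unknown tickers untouched."""
    def expand(m: "re.Match[str]") -> str:
        ticker = m.group(1)
        return market_cap(ticker) if ticker in QUOTES else m.group(0)
    return TEMPLATE.sub(expand, wikitext)

print(render("Example Corp has a market cap of {{market cap|EXMPL}}."))
# Example Corp has a market cap of US$1,020,000,000.
```

Because the number is computed when the page is rendered, nobody has to hand-edit the figure; leaving unknown tickers untouched makes a broken feed visible instead of silently blanking the value.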
When I am writing an article and find two conflicting views, I usually put the more plausible one in the text, with a footnote giving the alternative view:
He was born in 1756.[2][a]
a. Another source says he was born in 1754.[3]
It is harder to do that with Wikidata, which would have two different attributes, like Smithsonian-name and ICZN-name. There may be a way to get the editorial effect though. Automated updates seem controversial. I like the idea, but maybe others watching this page will comment. Aymatth2 (talk) 17:20, 20 October 2016 (UTC)
If Wikipedia had an article "List of major cities by walkability score" (see "Walkability"), and if various sources provided different scores based on different algorithms, then the article could have a sortable wikitable with different columns for different sources and for different years. (I have not yet attempted to produce that article, but other editors are welcome to do so.)
Wavelength (talk) 20:17, 20 October 2016 (UTC)
My search for an article with data from different sources led me to the article "Life-cycle greenhouse-gas emissions of energy sources".
Wavelength (talk) 02:32, 21 October 2016 (UTC)
See also "List of historical sources for pink and blue as gender signifiers".
Wavelength (talk) 02:49, 21 October 2016 (UTC)
I often find sources that disagree. The Smithsonian vs. ICZN disagreement is whether the wild goat is Capra hircus aegagrus, a sub-species of goat (Capra hircus), or whether it should be named like a separate species, Capra aegagrus. Bow ties have been pulled off and spectacle frames bent during debates over issues like this. On greenhouse gas emissions, I imagine the academics are fighting over measurements, calculations and interpretations. I am not sure what the copyright status would be on walkability scores, which are a bit more than mere facts, but assuming there is no issue, a sortable table to compare the scores would be much more useful than excluding all sources of scores that we dislike. On gender, the color of the first child's clothes often has a dominant effect on the next child's clothes, a point the article misses. Aymatth2 (talk) 11:55, 21 October 2016 (UTC)
This proposal contains some embedded "WP:Recentism" in that it supposes we want to get rid of the 2010 census data the moment 2020 comes in. Now to be sure, recentism is rampant on Wikipedia already and the editors might do that, but it's really a shame to get rid of old data instead of accumulating it. The catch being that if we give figures for the census all the way back to 550 A.D., it might be hard to fit in the lede. Hmmm. Maybe what we really want here is a Golden Deluxe Reference, by which I mean, we have a special page (here or on another project) that not only cites a single source or a list of related sources but also provides an extract from them. However, the automaticness of that extraction is also an issue - with a human, we expect a page history to show who changed what when, and similarly, if the Census Bureau puts their data in a different format we can't accept the extract going blank with no backup file when the machine next churns its wheels. This one is worth thinking about further. Wnt (talk) 11:20, 21 October 2016 (UTC)
"WP:Recentism" is something we can address. For the census, we could simply have a property in "Smallsville" for "last-census-year", and then have a separate entry for "Smallsville 2010 census", which we would retain after creating "Smallsville 2020 census". If the census format changed and we had trouble pulling in the 2020 census data, the 2010 census would keep displaying until we fixed the problem. Then we decide how much historical data to display.
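A minimal Python sketch of this census model, assuming hypothetical names (`CensusEntry`, `Town`, `display_population`) that are not real Wikidata properties and made-up population figures: each census is retained as its own record, the town carries a "last census year" pointer, and display falls back to the most recent census that was imported successfully.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class CensusEntry:
    # One retained record per census, e.g. "Smallsville 2010 census" (hypothetical model).
    year: int
    population: Optional[int]  # None if the import failed, e.g. after a format change

@dataclass
class Town:
    name: str
    last_census_year: int  # plays the role of the "last-census-year" property
    censuses: Dict[int, CensusEntry] = field(default_factory=dict)

def display_population(town: Town) -> str:
    """Show the newest usable census, falling back to older ones if the latest import failed."""
    for year in sorted(town.censuses, reverse=True):
        if year > town.last_census_year:
            continue  # ignore anything newer than the declared last census
        entry = town.censuses[year]
        if entry.population is not None:
            return f"{town.name}: {entry.population:,} ({year} census)"
    return f"{town.name}: population unknown"

smallsville = Town("Smallsville", last_census_year=2020)
smallsville.censuses[2010] = CensusEntry(2010, 4120)
smallsville.censuses[2020] = CensusEntry(2020, None)  # 2020 import broke; 2010 stays visible
print(display_population(smallsville))  # Smallsville: 4,120 (2010 census)
```

Old census records are never overwritten, so the same structure also supports showing historical figures later, which addresses the recentism concern above.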
Using a combination of periodically refreshing Wikidata from external sources, then pulling that data into an article display, or pulling in data from the external source at display time, we could aggregate a huge amount of information with no editor involvement. We could generate an article on a municipality, for example, with a full infobox with map to the right, descriptive text saying when it was founded, what the nearest larger towns are, what the terrain and vegetation coverage is like, average elevation, monthly average temperature and rainfall, population as of 2010 by age and sex, income, literacy etc., results of the last election and present elected officials, today's weather etc. Nothing very exciting, no historical incidents or descriptions of scenic beauty, but probably most of what a reader wants to know about the place.
Taken to the extreme, the effect could be that Wikipedia would show 90% or more data derived from other sources, 10% or less data supplied by editors. That seems a major shift. Aymatth2 (talk) 12:03, 21 October 2016 (UTC)

We already have templates to auto-insert population figures for smaller towns (copied from German WP, based on similar templates developed by Dutch Wikipedia perhaps 10 years ago). As I recall, the populations for larger cities were hand-updated with yearly estimates based on various sources, rather than auto-inserted from the regional lists of all town populations. -Wikid77 (talk) 16:06, 22 October 2016 (UTC)

User:Wnt is absolutely correct that our references can valuably contain an extract of the source. Determining whether a source supports a statement currently takes several steps:

  1. find the source
  2. try to ascertain which part is supposed to support the statement
  3. determine if it actually does

The first two steps are purely mechanical and a waste of time; when a quote is provided, only the potentially difficult (if you wish, human) part remains.
This is why it is such a shame to see productive editors like Richard Arthur Norton attacked for short quotes under a mistaken belief that they constitute copyright violations. All the best: Rich Farmbrough, 20:17, 22 October 2016 (UTC).

  • I think this is straying into a different issue. Short quotes can violate copyright if they contain the essence of a creative work, which would not usually be the case but could be. Data cannot be copyright protected. "Smallsville 2010 census: Male = 52%" cannot be copyright protected. The idea here is to import data into Wikidata, e.g. "Smallsville 2010 census – Male-percent=52". This can then be exported, wrapped in text and displayed in all the language-specific Wikipedias. By using Wikidata as a staging ground, or by pulling data directly from source databases at run time, we can make the same data available to all Wikipedias.
@Wikid77: How do the templates to auto-insert population figures work? I have not come across them, although I think I may have seen something like that for sports league scores. What countries do they cover? Do they stage the data via Wikidata? Is there any sort of bureaucracy to ensure that they are rerun to refresh? That could be one of the main problems: the bot/template developer drifts off and there is nobody to maintain it. Aymatth2 (talk) 00:16, 23 October 2016 (UTC)

Nation population templates

@Aymatth2: On English WP, there are 47 auto-insert population templates begun in 2008-2011, which are accessed by town-infobox templates to extract a town's population count from #switch lists of groups of 200 to 600 town codes or names, as stored in each population template. Some of those templates are copycat adaptations of the same templates on German WP (dewiki), so all enwiki needs to do is copy the template #switch lists, which typically takes only 2 hours for each major nation's town populations, based on the tedious hand-updates of lists done by editors on German WP. Hence, the updates here can be perhaps 10,000x faster than hand-edits (and proofreading) of those 30,000(?) related town pages.

The nations on enwiki include: Austria, Belgium, Germany, South Africa, Switzerland, New Zealand, Turkey, and Cape Verde. Most population templates are named with the prefix "Template:Metadata_" (see: Special:PrefixIndex/Template:Metadata). Many use town code-numbers, but New Zealand's "Template:NZ population data" uses region or town names (not numbers). The population templates are quite extensive, so South Africa town pages use perhaps only half of the population-template data listed for the related towns. Beyond counts of inhabitants, some templates also list current town surface areas or other data. So far, enwiki lags roughly 10x behind dewiki, which has over 550 population templates (of 900 town area/data templates), compared to only 47 templates on enwiki. I copied the initial Austria population templates from dewiki and integrated the Austrian-towns infobox in Sept 2011. Oddly, Wikidata is not involved with these valuable population counts (copied from national census sites and updated each year by various users), so we could perhaps transform most of WP within a month, copying this auto-inserted population data into town pages for another 50 nations (France, Spain, Brazil, Portugal...), with no delays from Wikidata. We could start consensus discussions in parallel, while creating the related nation population-templates and test sandboxes, so they would be ready by the time consensus was reached on the exact use of the town data. -Wikid77 (talk) 21:33, 23 October 2016 (UTC)
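The lookup these population templates perform is essentially a ParserFunctions `{{#switch:...}}` keyed by town code, with a `#default` fallback. A rough Python analogue follows; the dictionary name, the town codes and the population figures are made up purely for illustration and are not taken from any real Template:Metadata page.

```python
from typing import Dict, Optional

# Hypothetical stand-in for one population template: town code -> inhabitants.
# The real templates hold groups of 200 to 600 entries inside a wikitext #switch.
METADATA_POPULATION: Dict[str, int] = {
    "10101": 5230,
    "10102": 870,
    "10103": 12045,
}

def population_for(code: str, default: Optional[int] = None) -> Optional[int]:
    """Mimic {{#switch: code | ... | #default = ...}}: exact-match lookup with a fallback."""
    return METADATA_POPULATION.get(code, default)

def infobox_population_line(code: str) -> str:
    # A town infobox passes its own code to the template and renders the result.
    pop = population_for(code)
    return f"Population: {pop:,}" if pop is not None else "Population: (no data)"

print(infobox_population_line("10103"))  # Population: 12,045
print(infobox_population_line("99999"))  # Population: (no data)
```

The design point is that a yearly update touches one central table rather than thousands of town pages; every infobox picks up the new figure on its next render.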

Again meta:2016 Community Wishlist Survey, 7-20 Nov

With the 2016 survey (meta:2016 Community Wishlist Survey) in mid-November, I had forgotten that the contentious 2016 U.S. Presidential election will occur on 8 Nov 2016, during the same time period as the Wishlist Survey. Now we're within 3 weeks of planning Wikipedia features (or major fixes) for 2017. As noted last year, the month of November tends to be very busy, so it would be good to start collecting ideas now for the 2-week proposal period, November 7-20, 2016. Just another reminder. -Wikid77 (talk) 16:18, 21 October 2016 (UTC)

For example, some possible feature changes:
  • Accept next-line edit-conflicts, to allow changes to adjacent lines (no longer insist an unchanged line separate 2 edits).
  • Auto-merge some edit-conflicts, such as 2 replies posted after the same line, perhaps in LIFO order, where the last user's comment follows the original-posted comment. Perhaps show top note: "Edit auto-merged after edit-conflict" as a warning.
  • Allow paragraph-edit, as a mode where each paragraph has an "[e]" edit-link, rather than just limit access by section-header titles.
  • Increase the wp:wikitext parser "wp:expansion depth limit" from 40 to 60 or 80 levels.
Those are a few features which would be nice to have. -Wikid77 (talk) 16:06, 22 October 2016 (UTC)
I suggest we ask the Americans to delay that election, to avoid the clash. All the best: Rich Farmbrough, 20:18, 22 October 2016 (UTC).
I guess Wikipedians will just have to focus keenly on the survey, but the U.S. election is predicted to be a clear victory, not the dreadful 2000 Florida recount, where the U.S. Supreme Court (Supreme Crooks) stopped the recount with Bush ahead, while claiming "not enough time" to recount the state of Florida (by well-known procedures) because the Court had already used 5 days[!] to recount their 9 votes, apparently unaware the disputed ballots had been hand-encoded into computer databases to allow statewide Florida recounts within hours. Anyway, the 2-week window (7-20 Nov), to submit Wikipedia wishlist suggestions, will pass quickly unless we start planning ideas soon. -Wikid77 (talk) 22:00, 23 October 2016 (UTC)

Remarkable story

There's some "news" about Alexandra Land, e.g.

reporting a secret Nazi weather station discovered by the Russians. But then

  • The Inquisitr (No idea if this is an RS) reports that this "news" has been on AL's Wikipedia page for some time.


I'd love to hear the full story.

Smallbones(smalltalk) 20:57, 22 October 2016 (UTC)

This happens more often than you think. For example, editors began adding a series of citations about Planet Nine inducing solar obliquity in mid-July [1], but it is news to the Mail [2] and other outlets this week. In that case the reason is that Wikipedians know who the expert scientists are and read the preprints on arXiv, while media outlets wait for them to get their paper accepted and presented to the Astronomical Society. There may be some who would put down such informal means of sourcing, but I think it is exciting and productive for Wikipedia to have people tracking the field directly. Wnt (talk) 09:13, 23 October 2016 (UTC)

WP:BLP violations at Damien Walter

Damien Walter

Wikipedia:Articles for deletion/Damien Walter

User:Neptune's Trident inserted WP:Category Gay writers (which states: "This category lists notable writers who publicly identify, or who have been reliably identified, as homosexual men.") when creating the article, without any WP:RS support. It was then reinserted by User:Dcirovic, another experienced editor, twice by User:2601:1c0:4401:f360:e136:8f9e:9acf:27bb, and yet again by another experienced editor, User:Amaury. To stop it all, the article had to be protected, with the stated reason being violations of WP:BLP policy; it was then quickly nominated for deletion and looks likely to be snow-closed as delete. How can it be that, while the subject was clearly protesting libelous content and begging for deletion, these experienced editors all repeatedly violated a core Wikipedia policy, and how can this be improved? Would it not be fairer to living people to protect all such pages with pending protection, Jimmy, especially when very experienced editors here are apparently failing to understand, or simply ignoring, WP:BLP? Govindaharihari (talk) 15:18, 23 October 2016 (UTC)

Jimbo might be interested in the legal threat report to ANI. What's also not mentioned are the reports of the subject to both AIV and UAA. I've seen this so many times for so long. I've got to say it frustrates the hell out of me. So predictable. Thankfully, most admins when they hear about this kind of thing aren't DOLTs and aren't hasty to block, but I do see too great a reliance on passing it to OTRS or the legal team. Pending protection, by the way, would not solve any problems here. -- zzuuzz (talk) 19:45, 23 October 2016 (UTC)
Appreciate the response, User:zzuuzz. Although I agree with your frustration, I think it would add an increased level of awareness for users re-adding disputed content in such a situation if pending protection was enabled. My attempts to discuss this with the other users named in the report have so far been less than fruitful. Govindaharihari (talk) 19:55, 23 October 2016 (UTC)
I view protection to be completely unhelpful in situations like this, where experienced editors are making the mistakes. On the other hand, IPs blanking pages is very often really helpful. -- zzuuzz (talk) 20:00, 23 October 2016 (UTC)
Well said zzuuzz. Govindaharihari (talk) 20:10, 23 October 2016 (UTC)
The primary violator of Wikipedia:Policies and guidelines, the creator of the article, User:Neptune's Trident, has failed to comment at the WP:AFD discussion at all since being notified of it; they have also failed to comment here and simply deleted my notification to them of this discussion. It was my understanding that editors should respond to good-faith requests to discuss their contributions. Govindaharihari (talk) 20:57, 23 October 2016 (UTC)
Interested parties (Mr Walter in particular) may wish to view Wikipedia:Articles for deletion/J.C. Maçek III (writer). Further, the more curious editors might question why this non-notable critic currently turns up over 200 times in article references. World's Lamest Critic (talk) 22:35, 23 October 2016 (UTC)
It is possible some, even all of these editors acted in good faith. Wikipedia suffers a lot of vandalism where people simply replace whole pages with some nonsense or an ideological statement. When a volunteer editor sees User:DamienWalter show up at the article with the same name and puts up a delete template with the comment "Page created maliciously" ,[3] there are at least three possibilities: he could be the subject, angry at how the story fairly portrays him; he could be the subject, angry at an unfair portrayal; or he could be a vandal, probably someone hostile to Walter, who has snapped up the name on Wikipedia and is using it to delete the article about Walter because he doesn't like him. Obviously it would have been best for editors to check over the article carefully, but if people had to do that every time there is an act of page-blanking vandalism, the vandals would win the war and delete all the pages they want! So sometimes they make snap judgments. And once Walter has been reverted two or three times in the page history, it seems extremely likely to the next editor that he is a vandal, and so he parades in right after the others. It's volunteer work; it's not perfect. It would have been better to make a more specific complaint in the delete box so that people would know what to look for. Even so, of course, Wikipedians owe him an apology for the mistake. Wnt (talk) 23:03, 23 October 2016 (UTC)

Arbitration dissolution and resuming control... for me.

Jimmy, I'm accused of accusations of my account. Can you please remove it, and also, dissolve the arbitration group?

--Tacoloco1 (talk) 00:59, 24 October 2016 (UTC)