User talk:Jimbo Wales: Difference between revisions

From Wikipedia, the free encyclopedia
removing my post


:I feel like the whole [[WP:NOT]] is a wastebasket (or "coatrack") policy that encourages unhelpful actions. The unspoken rule is that one uses one of the dozens of all-capital-letters shortcuts and ''never'' reads the policy to which it is attached. This means at once that people use it to get rid of good content that they have some kind of prejudice (or is it vested interest?) against, even as the policy itself as written is ignored. So for example "NOTNEWS" is taken to mean that Wikipedia can't be up to date, but apart from such deletions it isn't particularly reliable at preventing special language used for recent developments in a topic. And "NOTMEMORIAL" is used by almost half the RFC voters to mean that if an article is about a recent set of murders, it shouldn't say who was murdered if there were more than a dozen or two. (Just try finding ''that'' in the policy.) But it never applies to the killer - he gets to have his own article and thorough detailed coverage of everything he thought about, because he is important and cool and everybody wants to know what he thinks, whereas including a well-sourced sentence or two about what the victims did would be pure schmaltzy sentimentalism. Well, "NOTHOWTO" is one of those - it isn't actually written to say our articles are supposed to be uninformative, only that they're supposed to be encyclopedia articles rather than ''"1) prepare your work area, 2) read through this guide carefully before beginning..."'' The same with dictionary terms (which doesn't mean we can't have an article about a word) etc. And the reason for all this is that we have a "core policy" which is written like a grab bag of stuff not to include when what was really intended or wanted when people thought about any of those items is that we had a sensible style guideline on the topic. We ought to split the whole thing up, farm it off to various guidelines, and reconsider whether many of the items in it really should be excluded at all. 
[[User:Wnt|Wnt]] ([[User talk:Wnt|talk]]) 19:17, 11 August 2016 (UTC)

== Deleted articles by category ==

Which categories have had the largest number(s) of deleted articles?{{nowrap|—[[User:Wavelength|Wavelength]] ([[User talk:Wavelength|talk]]) 21:14, 11 August 2016 (UTC)}}

Revision as of 21:45, 11 August 2016

    Strongest/weakest subjects

    I know this page is watched by quite a few statistics-oriented people. I was wondering if it would be possible to somehow do some sort of study into what our strongest and weakest subjects are on here by subject, as well as distribution globally. An idea of article count by subject and quality and then an overall analysis of what needs the most work on here. If we officially know what fields need the most work, something could be planned to do something about it.♦ Dr. Blofeld 16:11, 9 August 2016 (UTC)

    I think this is a good idea but very difficult in practice. As a first step we might want to think about various possible definitions of "strong" and "weak". For example, do we weight by popularity with readers or not? I can see good reasons for both, but also complications with both. For example, popularity statistics may not help us understand our weaknesses in those areas where we don't get much traffic because we are weak. At the same time any serious systematic approach to identifying areas where we should plan to do something about it surely must include an analysis of reader desires.--Jimbo Wales (talk) 16:24, 9 August 2016 (UTC)
    My categorization of 1000 randomly selected Wikipedia articles
    I know Dr Blofeld knows about my fairly informal stats work (see e.g. User:Smallbones/1000 random results) and the graph shown here, but I have to say that there are some very big obstacles in the way of doing this right. I'd love to see what he proposes done well, but all I can really add is a few related items that might be done, and a description of what would be needed for a good analysis that we don't have in place.
    The first problem that I have is that we really don't have a good categorization of articles. Sure we have multiple categories at the bottom of almost every article, but these are not mutually exclusive (I'd really like to know whether an article is about history or geography before I say whether history or geography is our stronger subject, but we just don't have that). Perhaps we could attempt to get some sort of mutually exclusive categories put into our cats. Maybe a keywords system would help. But my feeling is that anything like that would be viewed as the end of the world by the folks who spend so much time on cats, so a second (outside) system is needed. Probably this could be done using some artificial intelligence combined with a dozen folks doing the "training". But first, a general division should be set up, say 20 or 30 mutually exclusive categories, probably organized into a hierarchy, where each of the categories could be expected to have at least 1% of all articles, and no more than 10%. That way everybody will have a good idea what our "subjects" are.
    I'd think Halfak (WMF) might be in a position to help out here. More later. Smallbones(smalltalk) 17:00, 9 August 2016 (UTC)
    "Strongest and weakest subjects" needs some definition: is it quantity or quality we're looking for? Ultimately stats only deals with quantities, but there are some quantitative measures of quality that we could use. The ORES measure of article quality is one of these, and I'd think it would be much better than "good enough" to take a first look at this type of study. But if we just look at the number of articles in a certain area as a sign of strength, we can get some idea, but not as good as Dr B would like. e.g. our weakest subject of the ones in the graph above is biographies of deceased sportswomen. Not 1 such article showed up in the sample of 1000 articles! One of our strongest subjects would be bios of living sportsmen. Also popular culture since 1991 (aka Culture & Art 1991+) would have to be one of our strongest areas.
    If you get into "traditional" subjects for encyclopedia articles, anything in the sciences, or anything like philosophy is very weak just in terms of number of articles.
    So we need categories, and internal measures of quality (like ORES). Another way of assessing quality would be to have outsiders do it, e.g. readers as suggested by Jimmy, or outside experts, e.g. academics or journalists. Most outside experts aren't going to want to assess across categories, however; e.g. scientists aren't going to want to assess geography articles. I'm sorry, but this is such a big topic, I should get better organized before continuing. Smallbones(smalltalk) 17:19, 9 August 2016 (UTC)

    Library based classification system

    (This is very, very important. I've taken the liberty of making it a sub-section.) Smallbones(smalltalk) 16:02, 10 August 2016 (UTC)

    To respond to "I'd really like to know whether an article is about history or geography before I say whether history or geography is our stronger subject, but we just don't have that" I'd say call on ye Librarians! (There's plenty of us around.) Whether Library of Congress or Dewey classification system, all items end up with a call number. So geography is split from history, European history from English, English from 'Home Counties' or 'Midlands', 'Midlands' from 'Norfolk' and 'Lincolnshire', 'Lincolnshire' into regions/towns/villages depending on the degree of focus of said item. If Wikipedia wants that level of classification it can be done. The questions are 'do they?', 'which system?' (or a new one) and how to implement. AnonNep (talk) 18:36, 9 August 2016 (UTC)
    AnonNep, Thanks for this - it's good to know that people have thought through these problems before and that there is a resource out there. My first question is "what do people want to use the categorization system for?" I just need broad categories of subjects that each have between 1% and 10% of Wikipedia articles in them (so I can compare coverage and quality), but I'm not against breaking it down some more in a hierarchical system. BTW, what do people use the current categorization system for? I've been on Wikipedia for over 10 years and really can't remember more than a few times I've used it to find anything. The search box at the top serves most of my searching needs, and Google seems almost as good for finding WP articles.
    What I'd really like to see is a broad classification system that can be applied purely mechanically. Just have a program read the text and spit out a category. I'm sure an artificial intelligence program could do something like that. Would it work for the LOC or Dewey Decimal systems? Smallbones(smalltalk) 19:34, 9 August 2016 (UTC)
    This is where there would need to be WMF involvement. As to "what do people want to use the categorization system for?" I'd suggest that a Dewey system (arguably the most used) would allow linked searches, i.e. look up something in the library catalogue and you can drill down the topic hierarchy to a Wikipedia article even if your library doesn't have a book. A recognised classification scheme integrated with the present system could also allow for much more transparent statistics on usage. Given that this hasn't been done, despite it fitting so well with those variously rumoured (and denied) secret search project plans, I'm equally curious why it's never been put forward publicly. I'm sure WMF must have considered it but found problems. Once again, if so, transparency would be useful. (NB. I have little use for categories as they stand, or remember to add to them. I usually only notice when they're obvious & deleted.) AnonNep (talk) 21:16, 9 August 2016 (UTC)
    @AnonNep: This is a great idea, but there is an unjust obstacle: apparently the Dewey Decimal Classification is proprietary, and hence very much not an option for us, both in terms of legal restrictions and overall attitude. I looked up the Universal Decimal Classification; it's nonprofit but a similar story. [1] I think we could use Library of Congress Classification, but it is not very systematic and the national tie might be viewed unfavorably. But I'm not a real librarian; maybe someone can suggest something. I found a site here with a notion of trying to do to Dewey what Wikipedia did to Britannica - if WMF took them on board and gave them legal cover while inviting them to put numbers on all the articles, no doubt they would prosper, whereas otherwise, I don't know what to expect. But I don't know whether they're the most worthy effort that's been made so far! In any case, that is the Wikipedia spirit. Wnt (talk) 14:41, 10 August 2016 (UTC)
    Agreed. Library of Congress/Dewey are generally the 'big two' but, as I added, "'which system?' (or a new one)". I'd also be for something sympatico with Wikipedia's spirit. Not-for-profit and non-proprietary would be the way to go, if possible. Interested to hear from the tech-side boffins in regards to any ideas on how any Library style cataloguing system might add functionality/user benefits. And any stumbling blocks it might face. AnonNep (talk) 22:32, 10 August 2016 (UTC)

    @AnonNep, JohnMarkOckerbloom, and Mary Mark Ockerbloom: and calling all librarians and OCLC folks. I'll jump in here to say that AnonNep has hit on something that could be incredibly important to Wikipedia (and beyond). I've pinged John Mark Ockerbloom because I know he is an editor here and he is the guy who did this. OCLC is hugely important in all of this; among other things, they own the remaining copyrights on the Dewey Decimal System and seem to administer the Library of Congress Classification (LCC). They have their own template on Wikipedia "OCLC|xxxxx". And they have or have had Wikipedians-in-Residence @Ocaasi, Merrilee, and Maximilianklein: OCLC is clearly not the enemy here.

    Just to briefly state why I've pinged so many folks: AnonNep has suggested that using a library categorization system like Dewey or LCC would be a good idea for Wikipedia. There are lots of potential uses for it. I'm very much over my head here. Can somebody outline the basic issues? Smallbones(smalltalk) 16:02, 10 August 2016 (UTC)

    I'll outline what I see as the possible uses and issues.
    • Classifying Wikipedia articles according to a hierarchical system, so that the categories are mutually exclusive and analysis of content quality, quantity of articles, page views, etc. can easily be done with meaningful categories.
    • Readers and editors would be able to drill down and find actual books in actual libraries (and even online) to cover any topic in much greater detail (i.e. book vs. article) than we can do.
      • Something like OCLC's WorldCat is needed here, to search books in libraries, which WorldCat does very well.
    • Using DDS or LCC to classify every article would take some time, but likely could be done. Smallbones(smalltalk) 18:09, 10 August 2016 (UTC)
    copied from User talk:Jimbo Wales/Unprotected Smallbones(smalltalk) 20:10, 10 August 2016 (UTC)
    I tried posting the following but am having local firewall problems. Can someone add it to the discussion please.
    Smallbones pinged me about this discussion. I suggested something related a while back. My thinking was that it would be useful to direct users to appropriate classification numbers. I mocked up a page here (with the actual template here). I haven't done anything about it for a while though. I'm currently on holiday and have limited access, so will review this topic and add more at the weekend. Martin of Sheffield (talk) 19:38, 10 August 2016 (UTC)

    Literature review

    See "The sum of all human knowledge": A systematic review of scholarly research on the content of Wikipedia (December 2, 2014), Journal of the Association for Information Science and Technology, Wiley Online Library.
    Wavelength (talk) 19:45, 9 August 2016 (UTC) and 23:24, 9 August 2016 (UTC)
    See Wikipedia:WikiProject Missing encyclopedic articles. Wavelength (talk) 20:40, 9 August 2016 (UTC)
    See Wikipedia:Short popular vital articles. Wavelength (talk) 20:45, 9 August 2016 (UTC)
    See User:Emijrp/All human knowledge. Wavelength (talk) 21:30, 9 August 2016 (UTC) and 23:24, 9 August 2016 (UTC)
    See Wikipedia:Wikipedia Signpost/2009-04-20/Wikipedia by numbers. Wavelength (talk) 00:09, 10 August 2016 (UTC)
    WP:NOTHOWTO. EllenCT (talk) 15:46, 10 August 2016 (UTC)
    How is your post of 15:46, 10 August 2016 (UTC) related to my post of 00:09, 10 August 2016 (UTC)? Wavelength (talk) 15:56, 10 August 2016 (UTC)

    See User talk:ExpertIdeasBot and [2] for one onwiki approach to this problem. Smallbones(smalltalk) 15:07, 11 August 2016 (UTC)

    One more on the fallback problem

    Hi. Please see what Nemo_bis has answered to me: mw:User_talk:Nemo_bis#Russian_fallback_language It feels like they're playing God. This is not normal, this is completely wrong. Can this be considered a status abuse? And what can be done if so?--Piramidion 21:12, 9 August 2016 (UTC)

    Piramidion, please see this latest update on the ticket. TL;DR -- WMF Language Engineering's Runa Bhatacharjee has confirmed it's going to be removed per community wishes; just a little more patience, as they figure out the (not frequently attempted, I understand) technical process. Asaf (WMF) (talk) 17:49, 10 August 2016 (UTC)
    Yes, seen this, thanks! Hope it won't take too long. But in the meantime I can finally calm down and keep translating stuff. It's such a relief to know the process has already started --Piramidion 18:45, 10 August 2016 (UTC)
    Thanks, Asaf, for the update!--Jimbo Wales (talk) 19:23, 10 August 2016 (UTC)

    Offline Content Generator

    Following archiving of the recent Book creator discussion here, I have created a stub page for the Wikipedia:Offline Content Generator and copied the discussion to its talk page. — Cheers, Steelpillow (Talk) 10:41, 10 August 2016 (UTC)

    Need more practical, general knowledge topics

    Tangent from: "#Strongest/weakest subjects".

    Wikipedia needs more coverage of practical topics, or general knowledge, which has been limited by the avoidance of wp:HOW-TO text. For example, many air-conditioning (A/C) units depend on so-called "motor capacitors" to start or run the heavy-duty motors of the fan unit or compressor inside an A/C unit. However, when I created the page "Motor capacitor" on 24 September 2008, within hours it was met with a wp:PROD speedy followed by a wp:AfD deletion debate, which I interpreted as part of the frustrating, uphill struggle to create pages about practical topics in engineering or home appliances.

    Meanwhile, Google needs WP to better explain such practical, general knowledge. Recently, Google has been drowning in information overload from zillions of adverts or gossip hunches. In a reader-focused subject such as "home appliances", the vast ocean of website pages with adverts or hunches about the topic has tended to overwhelm Google searches with too many rambling pages of partial, limited information. Although there are millions of related subtopics, perhaps 5,000 pages about home appliances could cover the basic technology and operation of recent devices. Similarly, in the field of mechanical construction, the article "set screw" needs revision to better explain using an adjustable screw to hold a handle or rail in position on mechanical devices, such as water faucets/spigots or rake handles. -Wikid77 (talk) 16:07/16:13, 11 August 2016 (UTC)

    See 10 Skills You Need to Succeed at Almost Anything - Stepcase Lifehack (archived): public speaking, writing, self-management, networking, critical thinking, decision-making, mathematics, research, relaxation, basic accounting.
    Wavelength (talk) 16:12, 11 August 2016 (UTC)
    Wikibooks has b:Subject:Books by subject, and Wikiversity has v:Wikiversity:Browse.
    Wavelength (talk) 16:44, 11 August 2016 (UTC)
    • Similar 10 basic concepts of a topic: In line with a view such as "10 Skills You Need to Succeed", perhaps WP could have similar long-term lists as recommended by reliable sources. For example, the "12-step program" covers a well-known process for addiction recovery. In criminal law, the 3-aspect rule "Means, motive, and opportunity" helps to determine guilt or innocence of the accused, and could be used to help editors write about crimes in a focused, concise manner, rather than dwell on rambling opinions (or gossip) about an alleged crime. WP has a good start on coverage of general knowledge, but more is needed (and note how a general topic, now, might have only 9 footnote sources, while a footballer or battle in 1917 might list 49 or 250 or 560 sources, as broader coverage). -Wikid77 (talk) 16:53, 11 August 2016 (UTC)
    I feel like the whole WP:NOT is a wastebasket (or "coatrack") policy that encourages unhelpful actions. The unspoken rule is that one uses one of the dozens of all-capital-letters shortcuts and never reads the policy to which it is attached. This means at once that people use it to get rid of good content that they have some kind of prejudice (or is it vested interest?) against, even as the policy itself as written is ignored. So for example "NOTNEWS" is taken to mean that Wikipedia can't be up to date, but apart from such deletions it isn't particularly reliable at preventing special language used for recent developments in a topic. And "NOTMEMORIAL" is used by almost half the RFC voters to mean that if an article is about a recent set of murders, it shouldn't say who was murdered if there were more than a dozen or two. (Just try finding that in the policy.) But it never applies to the killer - he gets to have his own article and thorough detailed coverage of everything he thought about, because he is important and cool and everybody wants to know what he thinks, whereas including a well-sourced sentence or two about what the victims did would be pure schmaltzy sentimentalism. Well, "NOTHOWTO" is one of those - it isn't actually written to say our articles are supposed to be uninformative, only that they're supposed to be encyclopedia articles rather than "1) prepare your work area, 2) read through this guide carefully before beginning..." The same with dictionary terms (which doesn't mean we can't have an article about a word) etc. And the reason for all this is that we have a "core policy" which is written like a grab bag of stuff not to include when what was really intended or wanted when people thought about any of those items is that we had a sensible style guideline on the topic. We ought to split the whole thing up, farm it off to various guidelines, and reconsider whether many of the items in it really should be excluded at all. 
Wnt (talk) 19:17, 11 August 2016 (UTC)