User talk:Jimbo Wales: Difference between revisions
The bewilderment
Hello. Why act against a man whose great-grandfather was shot during the Collectivization in the Soviet Union only because he had a small number of cows? (Issue: article "D. Medvedev".) Nobody violated the biographies-of-living-persons policy, because only the truth and proofs were used (and the latest versions had a good text structure). 95.29.154.198 (talk) 14:02, 23 March 2014 (UTC) — Preceding unsigned comment added by 95.29.89.186 (talk) — Preceding unsigned comment added by 176.15.53.98 (talk)
Revision as of 17:11, 23 March 2014
Welcome to my talk page. Please sign and date your entries by inserting ~~~~ at the end. Start a new talk topic.
He holds the founder's seat on the Wikimedia Foundation's Board of Trustees. The three trustees elected as community representatives until July 2015 are SJ, Phoebe, and Raystorm. The Wikimedia Foundation Senior Community Advocate is Maggie Dennis. |
This user talk page might be watched by friendly talk page stalkers, which means that someone other than me might reply to your query. Their input is welcome and their help with messages that I cannot reply to quickly is appreciated.
Autofixing cites
As discussed last month, I am working to "autofix" (or auto-correct) about 10,000 pages with invalid cite parameters in the wp:CS1 Lua-based cite templates. I have created a working Lua prototype to begin comparing the results when a citation has been autofixed for simpler display. Compare the sample results:
- {{cite web |title=Test1 |last=Doe |pages=3--4|Guardian|http://z |office=London}}
- autofix: Template:Cite web/auto
Note that in the autofixed example above, the missing "url=" parameter is set from the text "http://z" in the 5th parameter and linked to the title "Test1", while the double hyphen in pages "3--4" is filtered to a single dash, 3–4. Next, 'Guardian' is shown, followed by "office: London" as extra text. By comparison, the current cite is awash in a sea of alarming red-error messages, which overpower the text and demand attention for the simple details that were quietly autofixed in the first case. I, personally, have been distracted by so much red-error text, focusing on fixing red messages while other, more important errors remained in the nearby text of an article.
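A minimal Python sketch of the two repairs described above (not the actual Lua prototype), assuming the template arguments arrive as a dict in which unnamed values use integer keys 1, 2, ..., the way MediaWiki numbers positional parameters. `autofix_cite` and `KNOWN_PARAMS` are hypothetical names, and the whitelist is a small illustrative subset:

```python
import re

# Hypothetical subset of valid CS1 parameter names, for illustration only.
KNOWN_PARAMS = {"title", "last", "first", "url", "pages", "date", "publisher"}
URL_START = re.compile(r"(https?:)?//")  # "http://", "https://" or protocol-relative "//"


def autofix_cite(params):
    """Return (fixed, extras): repaired named parameters plus leftover display text.

    `params` maps names to values; unnamed (positional) values use int keys 1, 2, ...
    """
    fixed, extras = {}, []
    # Positional values: adopt the first URL-looking one as a missing url=,
    # and keep everything else (e.g. "Guardian") as extra text.
    for key in sorted(k for k in params if isinstance(k, int)):
        value = params[key].strip()  # positional values keep stray spaces, so trim them
        if "url" not in params and "url" not in fixed and URL_START.match(value):
            fixed["url"] = value
        else:
            extras.append(value)
    # Named values: keep known names, show unknown ones as "name: value" text.
    for key, value in params.items():
        if isinstance(key, int):
            continue
        if key in KNOWN_PARAMS:
            fixed[key] = value
        else:
            extras.append(f"{key}: {value}")
    # A run of two or more hyphens in a page range becomes an en dash.
    if "pages" in fixed:
        fixed["pages"] = re.sub(r"-{2,}", "\u2013", fixed["pages"])
    return fixed, extras
```

Called on the parameters of the example above, this sketch sets `url` to "http://z" and `pages` to "3–4", leaving "Guardian" and "office: London" as extra text. Because nothing is written back to the page, a better rule can simply replace this function later.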
Why Bot fixes have not worked: Many people had claimed that Bot-driven updates would correct most invalid cites, but after a whole year, that has not happened. I think a major problem is the risk of misjudging the invalid cite parameters and having a Bot "permanently" update the page in a manner contrary to the original cite intention. That risk has had a chilling effect and discourages Bot programmers from attempting every automated correction. By contrast, an autofixed cite is a relatively temporary change, altering the displayed page without storing the results; hence, a "bug" in autofixing can be corrected by revising the Lua-based templates, to re-display a better autofixed result when the page is viewed later. The risk in autofixing is much lower, because the original invalid cite data is still available to re-autofix and redisplay, unlike Bot-updated pages, which hide the original cite in the prior revision and hinder re-fixing a mistaken automated update. -Wikid77 (talk) 01:17, 9 March 2014 (UTC)
Handles a hundred respell rules: Although autofixing invalid cite parameters appears to be a workable way to auto-correct 10,000 pages with common typos, it seems to require over 100 possible misspellings (or respelled aliases) for Lua to catch the vast majority of problems, such as spelling "url=" as capitalized "Url=" or using "web=" to set the URL link. Many respelled keywords can be detected by checking the prefix/suffix letters of each parameter name, where "author=" can be detected by checking prefix/suffix combinations such as "au__or" or "a__hor" to match invalid names: "autor=" or "arthor=" or "auther=" or "auhtor=" etc. For some parameters, there is a common respelled form, such as "published=" often used for "publisher=", along with rare misspellings like "publlisher=" or "pulbisher=" or "pubsher=" etc. See the example below:
- Example: {{cite web/auto |last=Doe|titolo=Title|Url=//:x|dtae=May 2011|pubsher=BBC |vol=IV|pg=9 |otters=Fred Smith|translator=Mary Dohh |locaiton=London |First=Tom |ediottor1-last=Smith, Dee|BBC News}}
- autofixed: Template:Cite web/auto
- currently: Doe. Translated by Mary Dohh.
{{cite web}}: Missing or empty |title= (help); Missing or empty |url= (help); Text "BBC News" ignored (help); Unknown parameter |First= ignored (|first= suggested) (help); Unknown parameter |Url= ignored (|url= suggested) (help); Unknown parameter |pubsher= ignored (|publisher= suggested) (help)
Although few pages have contained so many invalid parameters (some have), the above example shows many of the common typos, such as "vol=" for "volume=" (and even the rare "dtae=" for "date="); such typos occur in more than 1,000 pages. Overall, the autofix algorithms will correct hundreds of potential problems in over 10,000 live pages, including hundreds of draft pages in user space. In the example, autofixing detected the invalid "pubsher=" as the "publisher=" parameter while clearing over 13 red-error messages, allowing live typesetting of the page as if nothing much were wrong for readers. -Wikid77 06:04, 10 March, 15:52, 11 March, 08:30, 12 March 2014 (UTC)
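The prefix/suffix respelling described above can be sketched in Python as follows. The rule tables are hypothetical placeholders, not the real CS1 alias data. Note that "auther=" fits neither "au__or" nor "a__hor", so the sketch adds a third rule for it, and a length check limits how far a guess may stray from the real name:

```python
# Hypothetical rule tables for illustration; not the actual CS1 module data.
VALID = {"author", "publisher", "url", "volume", "date", "title", "first", "last"}
ALIASES = {"published": "publisher", "web": "url", "vol": "volume", "dtae": "date"}
PATTERNS = [  # (prefix, suffix, canonical name); "au__or" means prefix "au", suffix "or"
    ("au", "or", "author"),
    ("a", "hor", "author"),
    ("auth", "er", "author"),   # extra rule: "auther" fits neither pattern above
    ("pu", "sher", "publisher"),
]


def respell(name, max_len_diff=2):
    """Map a possibly misspelled parameter name to a valid one, or return None."""
    lower = name.lower()              # catches "Url=" and "First="
    if lower in VALID:
        return lower
    if lower in ALIASES:              # common respelled forms such as "published="
        return ALIASES[lower]
    for prefix, suffix, canonical in PATTERNS:
        if (lower.startswith(prefix) and lower.endswith(suffix)
                # prefix and suffix must not overlap inside a very short name
                and len(lower) >= len(prefix) + len(suffix)
                # reject names much longer or shorter than the real one
                and abs(len(lower) - len(canonical)) <= max_len_diff):
            return canonical
    return None
```

One trade-off shows up even in this sketch: the length check alone does not stop a real but unrelated word such as "auditor=" from being read as "author=", so each rule needs some care.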
Reasons for autofixing: I guess I forgot to explain why autofixing cites is preferable to those red-error messages, which do not inspire people to fix cite errors. Most users are readers (not writers) of pages, and they are not motivated to edit a page just because a problem is tagged with a red-error message. Also, beyond the backlog of over 10,000 pages containing invalid cite parameters from months (or years) ago, we seem to get dozens more invalid pages every week. Unless autofixing is used to correct the minor typos (and suppress trivial error messages), it would take 3–5 years (or longer) to hand-edit the pages to fix invalid cite parameters and remove red-error messages, including from many major articles where cite typos have been added recently. -Wikid77 09:34, 13 March 2014 (UTC)
Why autofixing parameters works so well: There are some key reasons why template parameters can be auto-corrected so quickly, as shown by the success of autofixing the cite parameters in the wp:CS1 cite templates, where the actual results have shown much higher accuracy than many people had imagined. Some major reasons are:
- parameter names are typically chosen to be whole English words, which are easy to recognize when respelled;
- English words have diverse origins, which makes them quite different in spelling;
- template writers often choose distinctive parameter names to avoid confusion between parameters;
- many parameter names differ even in their first or last 2 letters;
- the frequency of parameters follows an 80/20 rule, with a few names used most often;
- Lua-based templates can rapidly match misspelling patterns with Lua's regex-like pattern matching;
- together, these factors make parameters easy to distinguish, even when badly misspelled.
By first screening out all correctly spelled parameter names (as with a whitelist), only the rare misspellings or alias names need to be autofixed. Then, by checking first for the commonly misspelled, frequently used parameters, the total time spent autofixing is reduced to a tiny fraction of the overall processing time for parameters. With the CS1 cite templates, autofixing has been clocked at over 400 auto-corrected cites in ½ second, or potentially 800 citations autofixed within one extra second of processing time, if a page could hold that many cites. As a spinoff, we should consider autofixing many other common templates for their often misused or misspelled parameters. Then almost any half-way recognizable parameter would be accepted from new users, with a minor note to fix the spelling later (or just a hidden link to an autofix-warning category, since the template would still work when autofixed), while the user gets instant results from close-enough templates on the first try. -Wikid77 19:16, 14 March 2014 (UTC)
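The screening order described above can be sketched as follows. Correctly spelled names are accepted by a single set lookup, and only unknown names reach the pattern rules, which are listed with the (assumed) most common misspellings first. The whitelist and rules are hypothetical examples, written with Python regular expressions rather than Lua patterns:

```python
import re

# Hypothetical whitelist and rules; real templates would have many more.
VALID = frozenset({"title", "url", "date", "volume", "publisher", "author", "last", "first"})
RULES = [  # (compiled pattern, canonical name), assumed most frequent first
    (re.compile(r"url|web|link"), "url"),
    (re.compile(r"dat+e|dte|dtae"), "date"),
    (re.compile(r"vol\.?|volum"), "volume"),
    (re.compile(r"pu\w{0,4}sher|published"), "publisher"),
]


def autofix_params(params):
    """Return (fixed, unknown) for a dict of template parameters."""
    fixed, unknown = {}, []
    for name, value in params.items():
        if name in VALID:                 # fast path: whitelist hit, no pattern work
            fixed[name] = value           # a correct spelling always overrides a guess
            continue
        for pattern, canonical in RULES:  # slow path, tried in frequency order
            if pattern.fullmatch(name.lower()):
                fixed.setdefault(canonical, value)  # never overwrite an existing value
                break
        else:                             # no rule matched
            unknown.append(name)
    return fixed, unknown
```

Lua patterns have no `|` alternation operator, so a Lua version of this sketch would list each alternative as a separate pattern.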
Cite messages in major articles: Those scarring red-error messages are so common that even the page "Vladimir Putin" had 2 cite error messages, despite 17,000–50,000 recent pageviews per day, which should have inspired other people to fix the cites, but they did not (I did). See pages in category:
• Category:Pages with citations using unsupported parameters.
There is no reason to imagine that casual readers would even know how to locate the cite templates to fix the red-error messages, and I suspect many readers would think, "Hey, the Wikipedia people still have not fixed the typos in this Russia page; are they protesting, like the solitary Ukrainian athlete at the 2014 Winter Paralympics ceremonies, by leaving these errors in the page, as if refusing to update it any longer?" My first impressions of Wikipedia, in mid-2001, led me to imagine the "editorial staff" was backlogged, because a misspelled word remained in a page for days (I checked all week), and it took me a while to IP-edit the page to fix that typo. Years later, it took me quite a while to understand that only a relative handful of people actively correct grammar and spelling in pages, while others are sidetracked by bickering or by logging tons of trivial data into various articles. That is why the recent 9,000 wp:CS1 cite-parameter errors have lasted most of the past year, still unfixed after all these months. Fortunately, most affected pages cover rare topics, with only about 90 major pages showing those red-error messages, most viewed from about 3,000 times per day down to somewhat below 100 times per day, as with "Sydney Biddle Barrows" (Mayflower Madam). -Wikid77 05:45, 16 March, 13:26, 21 March 2014 (UTC)
Recent cite corrections to 3,000 articles: In the past 2 weeks, over 3,000 more pages (of 8,500) have been fixed to remove the red-error messages in cite references, while debates have continued about whether cite-parameter errors should be autofixed and shown as minor warnings, including a proposed deletion of Template:Cite_web/auto to prevent users from autofixing their mistakes. When checking the red-error messages to determine which common errors could be autofixed, I have confirmed that numerous pages have contained those error messages for over 3 months, many dating back over a whole year. During the past year, effort was diverted into changing thousands of wp:CS1 cites to re-specify dates (instead of fixing errors), where parameters "month=May |year=2008" were changed to "date=May 2008" as a sub-optimization of date formatting. Such side-track activities are also common in software development, where instead of solving major problems or providing new features for users (with "value-added" functionality), a group of developers is quite likely to focus on sub-optimizing internal details of the software, for little glitches that bother them directly (hence the term "navel gazing"). In fact, it seems that any mixed group of people is unlikely to "think big" and will instead focus on whatever minor issues catch their attention. The result is a "negative synergy": while people debate whether big issues should be fixed easily, more time is burned that could have gone into handling even bigger issues. A common management solution to such problems is to have competing groups, where a group that dwells on trivial details can be bypassed by a rival group that shifts to larger improvements. The overall concept is to "work smarter rather than harder" and avoid creating busy work that wastes time.
Anyway, at this point, many of the glaring cite errors are being removed by various people using whatever existing tools they have to clear the one-year backlog of red-error messages in cites. -Wikid77 04:26, 18 March 2014 (UTC)
Autofixing URLs that contain a bar/pipe: An unexpected "spinoff" of handling misspelled parameter names is the ability to autofix a URL containing an internal vertical bar/pipe "|", as in Google Translate links containing "&langpair=it|en&u=" for Italian-to-English, where the bar before "en" makes the parser treat "en&u=__" as a separate parameter. Even highly experienced users might copy such a URL into a template without checking for an internal bar "|", but that is no problem: autofixing can detect the split URL (the second part often contains an ampersand "&") and rejoin the parts as rapidly as it autofixes a misspelled "datte=" parameter. More later. Wikid77 16:25, 19 March
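A sketch of the rejoining step, assuming the parser has already split a URL such as `...&langpair=it|en&u=...` into `url=...&langpair=it` plus an extra parameter named `en&u`. The function name is hypothetical; the ampersand heuristic follows the description above:

```python
def rejoin_split_url(params):
    """Glue back URL fragments that an internal "|" split off into separate parameters.

    Heuristic from the text: a fragment usually contains "&", which a genuine
    parameter name is unlikely to contain.
    """
    if "url" not in params:
        return dict(params)
    fixed = {"url": params["url"]}
    for name, value in params.items():
        if name == "url":
            continue
        if isinstance(name, str) and "&" in name:
            # "|en&u=http://..." was parsed as name "en&u" with value "http://..."
            fixed["url"] += "|" + name + "=" + value
        elif isinstance(name, int) and "&" in value:
            # a fragment without "=" arrives as an unnamed (numbered) parameter
            fixed["url"] += "|" + value
        else:
            fixed[name] = value
    return fixed
```

With several bars in one URL, this sketch appends the fragments in whatever order the parameters are iterated, which is exactly the scrambled-order issue raised in the update that follows.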
Update about reversed URL parts: Several tests have confirmed that URL portions can rejoin in scrambled order. The MediaWiki markup parser splits a URL containing multiple bars "|" into separate parameters, so the autofix gets the main URL first, then finds the split portions and rejoins them in whatever sequence they arrive. Lua gets all numeric parameters in sorted order: {1}, {2}, {3}, {4} (etc.), while it currently returns named parameters in reversed pairs: a, c-b, e-d, g-f, i-h, k-j (etc.). There is "method to the Lua madness", though, and the reversed parts could be re-reversed to rejoin the URL parts in order, using a tedious algorithm. Fortunately, the common case of a single split bar "|" always rejoins a URL properly. -Wikid77 21:40, 22 March 2014 (UTC)
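The "re-reverse" step can be sketched as a small permutation, assuming fragments arrive exactly in the observed order a, c-b, e-d, ... and that an unpaired final fragment stays in place. Both are assumptions: the Lua reference manual leaves the traversal order of `next`/`pairs()` unspecified, so this order could change with a different Lua build. The function name is hypothetical:

```python
def unscramble_pairs(observed):
    """Undo the observed order a, c, b, e, d, ... to recover a, b, c, d, e, ..."""
    if not observed:
        return []
    ordered = [observed[0]]          # the first fragment arrives in place
    i = 1
    while i + 1 < len(observed):
        # each later pair arrives reversed, so swap it back
        ordered += [observed[i + 1], observed[i]]
        i += 2
    if i < len(observed):
        ordered.append(observed[i])  # assumed: a trailing unpaired fragment is not swapped
    return ordered
```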
If smart templates had memory
- Tangent from: #Autofixing cites
Future FYI. Currently, Wikipedia cannot save data values from templates, but there is an optional MediaWiki extension which can save variables between templates when formatting a page. There might be performance problems, but in theory, if templates could pass data to each other, duplicate or similar footnotes could be auto-trimmed as "ibid", omitting the repeated titles/publisher when a page cache is being reformatted. I think the bot User:Citation_bot has been combining some duplicate footnotes and removing extra cites from articles. However, a smart template (with memory) could review perhaps the prior 4 footnotes (each saved separately) and autotrim repeated titles/publishers from each new footnote. As a proposed example:
- 1. ^John Doe (2013). "A Long Chapter Title". Some Book, vol. 3. pp. 56-58. ACME Printing.
- 2. ^Mary Smith (2005). Another Book. p. 234.
- 3. ^Doe, 2013. p. 134.
In the proposed example above, the cite templates would remember the 4 prior footnotes (in saved variables), detect that John Doe's book was cited 2 footnotes earlier, and so autotrim the next footnote to author/year and page number (as a form of auto-ibidem notation). I have worked with similar 4-entry memory groups before: when a prior case is matched, the other 3 entries retain their memories through all repetitions, until a 5th (unique) cite is detected, which overwrites the "least recently used" (LRU) entry in the memory variables. Anyway, the basic concept is to implement template memory, because remembering data is part of "smart template" operation. -Wikid77 (talk) 14:06, 21 March 2014 (UTC)
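A sketch of the proposed 4-entry memory, using Python's `OrderedDict` as the least-recently-used store. Templates cannot currently keep such state, so this only models the idea; the class name, the source key (author, year, title) and the short form "Surname, year. p. N." are assumptions based on the example footnotes:

```python
from collections import OrderedDict


class FootnoteMemory:
    """Remember the last `capacity` distinct sources; shorten repeats (auto-ibid)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._recent = OrderedDict()   # source key -> None, least recently used first

    def render(self, author, year, title, page):
        key = (author, year, title)
        if key in self._recent:
            self._recent.move_to_end(key)          # matched: now most recently used
            surname = author.split()[-1]
            return f"{surname}, {year}. p. {page}."
        self._recent[key] = None                   # new source: remember it
        if len(self._recent) > self.capacity:
            self._recent.popitem(last=False)       # forget the least recently used
        return f"{author} ({year}). {title}. p. {page}."
```

Rendering John Doe's book, then Mary Smith's, then John Doe's book again gives the full form twice and then "Doe, 2013. p. 134.", matching footnote 3 of the example; a 5th distinct source would evict whichever remembered source was used least recently.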
- I like the idea in principle, but if everyone used shortened footnotes then we wouldn't need to consider this. They do an excellent job of condensing references. — Scott • talk 13:28, 22 March 2014 (UTC)
Wikipedia - Suggestions
Dear Sir,
I love Wikipedia. I have learnt a lot and it is a brilliant tool. However, for a person who is not an expert in any field, some of the content has become so technical that I have difficulty understanding it, e.g. pages on quantum physics. Suggestion: could we have, for example, WikiSimple: simplified Wikipedia pages that are easier for non-techies to understand, in everyday language, so that perhaps even a child could understand them? Perhaps even a WikiYoung (as opposed to WikiJunior, which appears to relate to books only, pity!).
Also, I find that certain pages that one would consider complete at a particular date are constantly being updated. Is it possible to see the history of the changes for that particular page alone, rather than a block of changes spanning more than one Wikipage?
Just some thoughts.
Regards
Monika — Preceding unsigned comment added by 92.5.229.225 (talk) 09:26, 23 March 2014 (UTC)
- I think Jimbo has been traveling this week, but several of us have noted the increasing complexity of many articles, which add ever more abstraction to broaden an article's coverage of rare cases. Several attempts to simplify wording have been met with hostility over the risk of omitting unusual corner cases of a subject (in n-tuple space!), or perhaps over limits to wp:data hoarding, and now many pages read as total "geekspeak" overrun with technical jargon. Hence, the page "Polygon" must mention the word "polytope" long before "triangle", "hexagon" or "octagon". Even many sports articles fail to explain their scoreboard systems, such as RHE (runs/hits/errors) numbers. I still recommend writing the clarified versions as pages on Simple English Wikipedia, where the word "simple" refers to the vocabulary used and does not limit topics to only simple treatment. We also tried to branch into a "Micropaedia" of short, explanatory blurbs about major topics, but that idea was met with numerous objections. Perhaps even harder than writing simple explanations of complex topics is conveying to some people why simplicity matters at all. The Micropaedia format would have encouraged thousands of editors to write simple summaries of perhaps 300,000 common topics. -Wikid77 (talk) 13:56, 23 March 2014 (UTC)