Wikipedia talk:Bots/Requests for approval

Alternatively, you can talk at #wikipedia-BAG on IRC.

Request for re-examination Wikipedia:Bots/Requests for approval/Theo's Little Bot, Task 1

How does one judge the "low resolution" requirement of our Wikipedia:Non-free content policy? Here are some things I consider: the size of the source image, the detail in the source image, and the quality of the scan and encoding. What does Theo's Little Bot consider? Only whether the image is greater than 0.1 megapixels, an arbitrary number suggested by Wikipedia:Non-free_content#Image_resolution. If it is greater, and it is in Category:Wikipedia non-free file size reduction requests, the answer is to reduce it to 0.1 megapixels regardless of its content. This degradation in quality is a one-way process: previous versions are deleted, users cannot revert, and information is lost.
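For illustration, the content-blind check described above amounts to roughly the following (a sketch using Pillow; the function name, threshold constant, and resampling choice are my assumptions, not the bot's actual source):

```python
import math
from PIL import Image  # Pillow; assumed here for illustration

TARGET_PIXELS = 100_000  # the fixed 0.1-megapixel figure from WP:IMAGERES

def reduce_if_too_large(path: str) -> None:
    """Downscale an image to roughly 0.1 MP, regardless of its content."""
    img = Image.open(path)
    pixels = img.width * img.height
    if pixels <= TARGET_PIXELS:
        return  # at or below the threshold; nothing happens
    # Scale both dimensions by the same factor so the total area hits the target.
    scale = math.sqrt(TARGET_PIXELS / pixels)
    new_size = (max(1, round(img.width * scale)),
                max(1, round(img.height * scale)))
    img.resize(new_size, Image.LANCZOS).save(path)
```

Nothing in that logic looks at source size, detail, or scan quality; only the pixel count.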

A look through the bot's uploads shows examples such as the files discussed below.

What does Template:Non-free reduce mean? It used to mean "the current file is too large". Now it means "the current file needs to be 0.1 megapixels". For many editors it still means the former: they will consider only whether the file is too large, not whether it should be exactly 0.1 megapixels. That is the intuitive expectation, regardless of the bot's implementation. Right now, there is no way to tag an image for reduction and have the resulting image be larger than 0.1 megapixels. Even when an image is unambiguously high definition (I'm told File:Dear Esther Screenshot Large.jpg was at 1080p), tagging it would be a mistake, because you're left with a resulting file so small as to fail its rationale.

I propose we stop TLB's Task 1 from running, and restart it only when we have a better process and implementation in place to stop its pointless overuse and the resulting errors shown above. - hahnchen 21:48, 15 February 2015 (UTC)

The bot is working fine, but editors need to recognize that the reduce tag should only be used for images far outside the acceptable range. That's a behavior problem, not a bot problem. --MASEM (t) 22:45, 15 February 2015 (UTC)
Read the second-to-last paragraph: images can be correctly tagged and still have negative results. You could place an onus on the revision-deleting administrator to review the image, but that is not done; of the files I checked, not once was a downscale reverted. It is clear that the process is either ignored or has never been put in place. - hahnchen 23:02, 15 February 2015 (UTC)
If the rationale gives no explanation of why a higher-than-normal resolution image is provided (in the range of ~0.2 MP or higher), it is completely appropriate to tag it with a non-free reduce template and let the bot handle it. It is up to the image uploader, or those using the image, to justify in the rationale why a larger image is needed. If there is no rationale for the larger size, then it needs to be resized. --MASEM (t) 23:36, 15 February 2015 (UTC)
The tag is correct; it was a file that needed reduction. Needing reduction and "needing reduction to 0.1 megapixels" are different things. - hahnchen 23:41, 15 February 2015 (UTC)
The rationale does not give any reason why the image needs to be larger than 0.1MP, so someone reviewing images and seeing a full 1080p image has no reason to question the need to reduce. --MASEM (t) 00:24, 16 February 2015 (UTC)
Needing reduction and "needing reduction to 0.1 megapixels" are different things, and admins are not reviewing the resized images. Hence the example above of the reduction of File:Slacker-logo-black-official-2015.png, which is in fact a free file. The resized File:Dear Esther Screenshot Large.jpg is unsuited to showing the game's graphics, as specified by the rationale and the article's image caption; those are precisely the reasons why the file has to be larger than 0.1MP. Rationales have never explicitly stated why images are larger than the 0.1MP suggestion. File:Warlugulong - Zoom.png is used in the featured article Warlugulong; it is nearer 0.2MP, and its rationale is suitable and justifies the actual size of the upload, not an imaginary number. I considered all the questions I posed at the top of this section when I uploaded the image, but I have no confidence that the tagger, bot, or admin would do the same. - hahnchen 11:49, 16 February 2015 (UTC)
The Esther image has a poor rationale to justify "the graphics to the game are important", particularly when the image started at 1080p resolution. ("To show off the beautiful graphics" is not a proper rationale, in contrast to the Warlugulong image, where the rationale explains that the detail is discussed by sources in the article.) You're blaming the bot for something it is only told to do by editors well aware of NFC policy's minimal-use requirement. --MASEM (t) 15:55, 16 February 2015 (UTC)
Are you confident that if File:Warlugulong - Zoom.png is tagged, the reduction will get reverted? Do you believe that, had I not stepped in, File:Starcommand2013battle.jpg would still be saved as a useful image? That too was reduced to 0.1MP. If not, we do not have a suitable process in place for the operation of this bot, and we should stop this task until we do. - hahnchen 16:43, 16 February 2015 (UTC)

@MBisanz: -- Magioladitis (talk) 22:50, 15 February 2015 (UTC)

It might be worth giving the template a size parameter or two: {{size=.1M}}, {{size=100K}}, {{width=256}}, etc.

Also, maybe it would make sense for the bot to skip items which would be reduced by less than 50%. All the best: Rich Farmbrough 18:26, 17 February 2015 (UTC).
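A rough sketch of how those two suggestions could combine (the |size= syntax, helper names, and units are hypothetical; no such template parameter exists yet):

```python
def target_pixels(param: str | None, default: int = 100_000) -> int:
    """Parse a hypothetical |size= value such as '.1M', '100K' or '65536'."""
    if not param:
        return default
    p = param.strip().upper()
    if p.endswith("M"):
        return int(float(p[:-1]) * 1_000_000)
    if p.endswith("K"):
        return int(float(p[:-1]) * 1_000)
    return int(p)

def should_reduce(current_pixels: int, target: int) -> bool:
    """Rich's skip rule: act only when the reduction would exceed 50%."""
    return current_pixels > 2 * target
```

The skip rule keeps the bot away from borderline files, where a forced 0.1MP target would barely change anything but still destroy the original.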

Note that File:All Men Are Mortal, 1946 French edition.jpg, which was erroneously tagged as non-free, has been reduced and the previous versions have been deleted. Admins do not review the downsized files; and how can they, given the number of pointless reductions being forced through the system? - hahnchen 20:31, 20 February 2015 (UTC)
Again, the bot cannot understand images beyond their digital nature. It can't figure out free vs non-free, and this looks like a case where the uploader opted to select non-free because they were unsure of the best option. Thus the behavior to discourage is that of the person who placed the tag without questioning whether the file should be free, not the bot's. --MASEM (t) 20:36, 20 February 2015 (UTC)
Masem, everyone reading this page already knows how the bot works. We can use a guillotine to trim hair too, but you can't defend accidental decapitations with "all a guillotine does is cut stuff". In this case, the bot and the process surrounding it compounded an error. There is no oversight at any point: the tagger, bot, and admin do not review the image or its use. Until we have that process in place, we should not be operating the bot. - hahnchen 11:04, 21 February 2015 (UTC)

  • Proposal - The bot should resize images to 0.15 megapixels and ignore smaller images.
    • Rich suggests ignoring images smaller than 0.2MP[1]; Masem suggests that the tag should only be used for images "far outside the range"[2]. Instead of reducing images to 0.1MP and ignoring close cases, the bot should just reduce to 0.15MP. Bots cannot make the value judgements that editors can, so the bot should be conservative: if an image needs resizing below 0.15MP, that should be a manual process, where the right questions can be asked. The bot should place images below 0.15MP into Category:Wikipedia non-free file size reduction requests for manual processing and notify the tagger. I'd be fine with 0.2MP too, or with reducing to 0.15MP only those files which are larger than 0.2MP (see the sketch after this list). - hahnchen 11:04, 21 February 2015 (UTC)
  • Proposal - Uploaders must be notified when Template:Non-free reduce is tagged to their image.
    • While savvy users may watchlist their images, like User:Darkwarriorblake at File:Dredd2012Poster.jpg, many reductions take place with editors oblivious until it's too late. This could be automated, or the onus could be placed on the taggers; I would prefer the former. - hahnchen 11:04, 21 February 2015 (UTC)
  • Proposal - Once tagged, the bot does not reduce images for a week.
    • This gives time for manual reduction or tag removal by the uploader. - hahnchen 11:04, 21 February 2015 (UTC)
  • Proposal - Bots (and taggers) should ignore previously reduced files.
    • File:Thor-272.jpg was a completely ridiculous reduction, where Theo's bot fights Dashbot. The Dashbot-reduced version was 325×502, which is 0.16MP. (I think this is because Dashbot looked at image width as well, conservatively allowing for larger thumbnail preferences.) The original was clearly low resolution; any further reduction is needless. Can the bot parse file-history comments? If so, and the previous version was already marked as reduced, it should place the file into Category:Wikipedia non-free file size reduction requests for manual processing and notify the tagger. - hahnchen 11:04, 21 February 2015 (UTC)
  • Proposal - Admins to actually review the resized files.
    • This should be happening already. The proposals above, if implemented, should reduce the number of pointless resizes and make reviewing files more manageable. - hahnchen 11:04, 21 February 2015 (UTC)
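Taken together, the proposals would turn the bot's single hard-coded rule into a decision procedure roughly like this (a sketch of the proposed behaviour only; the constants are the figures argued for above, and the action names are placeholders):

```python
TARGET_PIXELS = 150_000     # proposed 0.15MP reduction target
AUTO_THRESHOLD = 200_000    # only auto-reduce files larger than 0.2MP
GRACE_PERIOD_DAYS = 7       # wait a week after tagging

def decide(pixels: int, days_since_tag: int, previously_reduced: bool) -> str:
    """Return the action the bot should take under the proposals above."""
    if days_since_tag < GRACE_PERIOD_DAYS:
        return "wait"           # give the uploader time to reduce or untag
    if previously_reduced or pixels <= AUTO_THRESHOLD:
        # Close calls and repeat reductions go to humans, not the bot.
        return "manual-review"  # categorise for manual processing, notify tagger
    return "auto-reduce"        # downscale to TARGET_PIXELS, notify the uploader
```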

Request for re-examination Wikipedia:Bots/Requests_for_approval/StanfordLinkBot

I see a serious lapse on my side: I failed to respond to comments by Josh Parris and Maralia, and would like to respond to them now. I was pleasantly surprised by the level of detail in which Josh analysed the edits. Although I am not an experienced Wikipedia editor with thousands of edits under my belt, I would still like to improve Wikipedia using the best of my talents. The major concerns were with the exact process of link addition rather than with the links themselves. I believe this is something which can be fixed.

The way we generate links is this: links in a source page are suggested on the basis of readers using that page as a waypoint towards the target pages, across many navigation instances. A link at that place would have helped those readers reach the target page faster.
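In simplified form, the counting step behind this looks something like the following sketch (the session format, cut-off value, and function name are illustrative placeholders, not our actual pipeline):

```python
from collections import Counter

def suggest_targets(sessions: list[list[str]], source: str,
                    min_count: int = 10) -> list[str]:
    """Suggest link targets for `source` from reader navigation sessions.

    A session is an ordered list of page titles one reader visited.
    Pages frequently reached *after* `source` become link candidates,
    on the theory that a direct link would have got readers there faster.
    """
    counts: Counter[str] = Counter()
    for session in sessions:
        if source in session:
            later_pages = session[session.index(source) + 1:]
            counts.update(set(later_pages))  # count each target once per session
    return [page for page, n in counts.most_common() if n >= min_count]
```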

One problem with this approach is that readers do not automatically follow the wiki linking guidelines while navigating. This can be mitigated by filtering out links which refer to countries and those originating from Category:Featured articles. We can work further on blacklisting certain types of links based on input from experienced editors.

Another difficulty with this approach is that finding the correct anchor text for the link can be tricky. Our approach is to look at all possible anchors (words where a link can be inserted) for the target page across Wikipedia, filter them so that only unambiguous anchors remain (that is, the probability of the anchor linking to other pages is small), and then look for the first occurrence of any unambiguous anchor, since the link should be introduced as early as possible in the page. We exclude section headings, per the linking guidelines. The trade-off here is between the least ambiguous anchor and its location in the page. We are working on a better weighting formula for combining these two factors, instead of simply filtering by ambiguity and taking the first occurrence from the remainder; however, such a formula is harder to justify.
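In outline, that selection step is roughly the following (a simplified sketch; the ambiguity scores would come from link statistics across Wikipedia, and the threshold is a placeholder):

```python
import re

def pick_anchor(article_text: str, anchors: dict[str, float],
                max_ambiguity: float = 0.1) -> str | None:
    """Pick the earliest unambiguous anchor for a target page.

    `anchors` maps candidate anchor text to its ambiguity: the probability,
    measured across Wikipedia, that the text links to some *other* page.
    """
    # Drop section headings, per the linking guidelines.
    body = re.sub(r"^=+[^=]+=+\s*$", "", article_text, flags=re.MULTILINE)
    best_pos, best_anchor = None, None
    for anchor, ambiguity in anchors.items():
        if ambiguity > max_ambiguity:
            continue  # filter step: keep only near-unambiguous anchors
        m = re.search(re.escape(anchor), body, flags=re.IGNORECASE)
        if m and (best_pos is None or m.start() < best_pos):
            best_pos, best_anchor = m.start(), anchor  # earliest wins
    return best_anchor
```

The weighting formula mentioned above would replace the hard max_ambiguity cut-off with a single score combining ambiguity and position.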

This explains why "European" was included in the link text to classical music in Cello: European classical music redirects to classical music, which makes it an unambiguous anchor, and it is the first unambiguous anchor to appear in the article.

A second issue is that, although the bot can make multiple edits to the same page, it does not link pages which were not on the paths taken by readers. So the edit to Aromatic hydrocarbon linked carbon dioxide but not carbon monoxide, because we had data about the former and not the latter. The common thread here is similarly positioned terms in the same sentence; although we do not have supporting data for them, we are working on ways to identify such links and introduce them as well.

Another concern was the skipping of possible anchors. For example, the bot chose to bypass Harvard Graduate School of Education and instead pipe-link Harvard to Harvard University. This is because the Manual of Style for linking states that links should not be placed in the boldface reiteration of the title in the opening sentence of a lead, a rule which was specifically hard-coded into the bot.

Yet another concern is the context in which the anchor appears. For example, we wanted to link Raphael to Renaissance, but the first occurrence of "renaissance" was preceded by the word "high" (forming "High Renaissance", a different topic). We shall take care to exclude anchors which, in conjunction with preceding or succeeding words, could link to other pages.

However, it is hard to understand the context of the whole sentence: for example, the soybean edit linked to Dairy where the text actually refers to a dairy-products factory, instead of to Dairy product; or sugar, where Protein was linked instead of Protein (nutrient). This is harder than all the other issues, and we have not been able to find a fitting solution for it.

Given that I am inexperienced in editing, I would like to work together with expert editors to augment the bot with AI rules, such that most of the edits are not shoddy and the utility is maximized. Ashwinpp (talk) 17:49, 15 March 2015 (UTC)

I think this is an area you need strong AI for, and we just don't have that yet. All the mechanical questions are easy enough to sort out, but, as you say, the problems with Dairy and Protein are difficult. In fact, they're so difficult that I'm willing to bet normal humans wouldn't notice or appreciate the problem; you need experienced editors. I think that was demonstrated by your attempt to validate your data using Mechanical Turk.
As a guide for next steps, you could go down the semi-automated editing route that User:Dispenser went down in solving disambiguation: http://dispenser.homenet.org/~dispenser/view/Dab_solver Josh Parris 21:59, 20 March 2015 (UTC)
So, to rephrase and ensure that I'm getting the correct idea: first, we need to distinguish between easy and difficult links. The easy edits correspond to the mechanical questions, and the difficult links correspond to the ambiguous ones. It is possible that the majority of links have some ambiguity and fall into the difficult category. This distinction can be made on the basis of the output of dab_solver.py and a threshold on the relevancy scores. The difficult links would then be inserted with the help of experienced editors. Ashwinpp (talk) 04:13, 24 March 2015 (UTC)
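If that is right, the split would look something like the following (the threshold value and names are placeholders to be tuned against editor judgements, not anything dab_solver currently outputs):

```python
EASY_THRESHOLD = 0.9  # placeholder; to be tuned against editor judgements

def route(suggestions: list[tuple[str, float]]) -> tuple[list[str], list[str]]:
    """Split (anchor, relevancy) suggestions into a bot-insertable queue
    and an editor-review queue, per the easy/difficult distinction above."""
    easy = [a for a, score in suggestions if score >= EASY_THRESHOLD]
    hard = [a for a, score in suggestions if score < EASY_THRESHOLD]
    return easy, hard
```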