
User talk:The Earwig



TfD closures

Hi, Earwig. I just wanted to stop by and thank you for helping out with closures of pending TfD discussions. In an ideal world, we would have a pool of 5 to 10 administrators who closed TfDs on a regular basis so that we could spread the burden around. If you have any admin buddies who might be interested in closing 5 to 10 TfDs per week, please let me know, and I will be glad to help recruit them to the cause. Dirtlawyer1 (talk) 04:23, 12 August 2015 (UTC)

Thanks for that, glad to help. Sadly, most of my admin friends are no longer active. I think a more useful version of User:Doug/closetfd.js would be very nice to have, since I find myself doing the same sort of work repeatedly in simpler cases (checking/removing navbox transclusions, deleting talkpages/redirects/subpages, etc) – and I was surprised when I started that Twinkle has no template-delinking function like it has for page links. Perhaps such a tool would encourage more admins to help out? I would work on it myself, but lack of time is an issue. — Earwig talk 04:36, 12 August 2015 (UTC)
Hey, Earwig, I suggest you mention your issues to administrator User:Opabinia regalis (another recent recruit to TfD closings), and to template editor User:Alakzi. Alakzi is one of our star coders, and no doubt will have some insights into how the "Doug" TfD closing script might be improved. Cheers. Dirtlawyer1 (talk) 04:46, 12 August 2015 (UTC)
Last I heard, Alakzi would "look into" it, which was about a week ago. Well, we'll see. I'm willing to give it a shot in a month or so if nothing happens in the meantime. — Earwig talk 04:59, 12 August 2015 (UTC)
Yes, his services are popular now. I liked it better when I had him mostly to myself, but I'm told that template editor envy is not flattering on me. Dirtlawyer1 (talk) 05:10, 12 August 2015 (UTC)
Ha, I'll look into cloning ;)
Agreed, all Doug's script does is stop you from making a typo adding the templates. Automating the tedium would be useful. I've been vaguely thinking of adapting the closeafd.js script, but haven't had the right combination of time and motivation (and it'd have to be a rare combination to make me think 'yes, javascript sounds like fun today'), so I'd be quite pleased if someone else did it! Opabinia regalis (talk) 07:51, 12 August 2015 (UTC)
I, too, happen to be quite averse to JavaScript, so if somebody else were to take the initiative... Alakzi (talk) 08:05, 12 August 2015 (UTC)
Oy, I guess I'll start working on that, then. @Alakzi and Opabinia regalis: can we come up with a concrete list of improvements? — Earwig talk 08:16, 12 August 2015 (UTC)
Offer a choice between "keep", "delete", "merge", and "do not merge"; automatically mark the closure as a NAC for non-admins; remove {{Tfd}} or {{Tfm}} from the template when it is kept or not merged, or else replace it with {{Being deleted}}, enclosed in <noinclude>...</noinclude>; and tag the talk page with {{Tfd end}}. Alakzi (talk) 08:25, 12 August 2015 (UTC)
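The notice-swapping step above could be sketched roughly as follows. This is a minimal illustration, not code from any actual script: the regex and function names are made up, and a real gadget would have to handle many more notice variants.

```python
import re

# Matches a {{Tfd ...}} or {{Tfm ...}} notice, optionally already wrapped
# in <noinclude> tags. Illustrative pattern; deliberately simple.
NOTICE = re.compile(r"(?:<noinclude>\s*)?\{\{[Tt]f[dm][^{}]*\}\}(?:\s*</noinclude>)?\n?")

def close_as_kept(wikitext):
    """Kept or not merged: strip the deletion notice entirely."""
    return NOTICE.sub("", wikitext, count=1)

def close_as_deleted(wikitext):
    """Delete outcome: swap the notice for {{Being deleted}} in <noinclude> tags."""
    return NOTICE.sub("<noinclude>{{Being deleted}}</noinclude>\n", wikitext, count=1)

page = "<noinclude>{{Tfd|Example}}</noinclude>\n{{Navbox|name=Example}}"
```

The <noinclude> wrapping matters because {{Being deleted}} must appear on the template page itself without being transcluded onto every article that uses the template.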
Right. Would a mass-orphan or substitution function (invoked separately) be a good idea or too risky? It would only handle clear cases. Also, the option to move to the holding cell would be useful. Hmm, I have an idea how this could work. — Earwig talk 08:40, 12 August 2015 (UTC)
OK, but what would make a clear case? The HC option is a good idea. Alakzi (talk) 09:03, 12 August 2015 (UTC)
I'm not sure how to strictly define it yet – will see when I get to that point, since the other stuff will come first. Also, a relisting option would be simple and useful. That should be enough to start working on it. — Earwig talk 17:37, 12 August 2015 (UTC)
Thanks for taking this on, Earwig :) In addition to Alakzi's list, checking for redirects and subpages would be useful. Oh, when the page reloaded after my edit I saw you already said that above. Never mind me, I need more coffee. Opabinia regalis (talk) 19:27, 12 August 2015 (UTC)

Just an update – progress has been slow but steady due to various real-life things. The main interface and much of the functionality are complete. I hope to finish the first version before next week. — Earwig talk 07:43, 18 August 2015 (UTC)

Another update: alas, I'm not able to get back to working on the script yet, given the start of classes and a few other RL things. Tentatively suggesting a workable version by the middle of next week. — Earwig talk 00:11, 25 August 2015 (UTC)
Another bump. Some good progress, but a general lack of time to work on it. Will continue to keep people posted. — Earwig talk 02:29, 16 September 2015 (UTC)


The Signpost: 28 May 2016

Yandex.XML

Hi Ben, we are getting an error message with the copyvio detection tool: "An error occurred while using the search engine (Yandex Error: ). Try reloading the page". I wonder if you could look into this when you have a minute? Thanks, — Diannaa (talk) 17:31, 3 June 2016 (UTC)

Hi, I can confirm this error. I tried several pages in cswiki, every check failed with "An error occurred while using the search engine (Yandex XML parse error: Opening and ending tag mismatch: meta line 39 and head, line 70, column 8). Try reloading the page. If the error persists, repeat the check without using the search engine.". Can you fix it soon please? Thanks. --Martin Urbanec (talk) 19:33, 3 June 2016 (UTC)

I don't have any ideas without looking into it. But we're about to switch to Google, so let's just wait for that... — Earwig talk 21:19, 3 June 2016 (UTC)
Actually, I figured it out. Fixed now. — Earwig talk 21:53, 3 June 2016 (UTC)

The Signpost: 05 June 2016

Bot does not update the comparison (I tried to reload and to repeat the action – failed)

Hello, this is about https://tools.wmflabs.org/copyvios/?lang=en&project=wikipedia&title=&oldid=724320691&use_engine=0&use_engine=1&use_links=0&use_links=1&turnitin=0&action=compare&url=http%3A%2F%2Fwww.oysters.ru%2Fen%2Fabout_us%2F which initially worked fine and properly detected the copied text, but did not update (cache?) after the text was removed, so it keeps showing the previous state rather than the current one. Very nice and useful tool. I'm notifying you as requested in the header of that site. Related to the article Oysters LCC. Thanks in advance; your reply is not necessary, this is just info to draw your attention to a possible problem in the bot's behaviour, or to an unclear description of how to ask the bot to re-examine an article on its list (it does not update on a logged-in user's request). Ocexyz (talk) 08:17, 9 June 2016 (UTC)

Hi Ocexyz; since you are comparing the URL against a fixed revision (724320691), it won't update when you make edits to the page. You can see the live comparison by using a page title instead of a revision ID, as in this example. — Earwig talk 08:25, 9 June 2016 (UTC)
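For reference, the difference is just which query parameters appear in the tool's URL: a live comparison uses `title` instead of a fixed `oldid`. A small sketch, with parameter names taken from the URL quoted above:

```python
from urllib.parse import urlencode

# Using `title` instead of a fixed `oldid` makes the tool fetch the
# latest revision of the article on every check.
params = {
    "lang": "en",
    "project": "wikipedia",
    "title": "Oysters LCC",  # page title rather than a revision ID
    "action": "compare",
    "url": "http://www.oysters.ru/en/about_us/",
}
link = "https://tools.wmflabs.org/copyvios/?" + urlencode(params)
```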
Hi again, I've finally figured that out, and it works fine. Still, it is not quite clear on first use.
Proposal: (1) add to the description something like: "Note: to check against the latest version of a Wikipedia article listed in its edit history, just put the article name, without [[ and ]], into the ... field, leave the "revision ID" field empty, and press submit." (2) Add a button to the UI, [check with latest site version], that does the same.
The bot engine is OK; a more foolproof UI for users in a rush, like me, could help. Thanks in advance, and thanks for your good work – much appreciated :) Ocexyz (talk) 08:51, 9 June 2016 (UTC)

A barnstar for you!

The Technical Barnstar
Your tool is brilliant – respect for its great excellence and usefulness. Thanks, from and on behalf of other Wikipedia contributors :-) Ocexyz (talk) 08:55, 9 June 2016 (UTC)

The Signpost: 15 June 2016

Copyvio detector

So I found Rapport congruency on http://www.gutenberg.us/articles/rapport_congruency (I need to show the link to explain), but it's apparently excluded as a copyvio source as a result of gutenberg.org being excluded. gutenberg.us is the Gutenberg self-pub, and is under a CC-BY-SA 3.0, which requires attribution. Now, the page does not say it mirrors WP, so I'd guess it doesn't, and it was copied to here, and thus I think the copyvio tester shouldn't be excluding that domain. Am I correct? MSJapan (talk) 06:15, 17 June 2016 (UTC)

Hi, MSJapan. Looking at the source, assuming I understand you right, it says "Help to improve this article, make contributions at the Citational Source" with a link back to the Wikipedia article's history. It definitely looks like a mirror to me. — Earwig talk 06:21, 17 June 2016 (UTC)
Ah, I see. I thought that was just the color of the link, but it's because I had followed it already. Is it supposed to point at the history page, though, to be correct? Anyhow, thanks! MSJapan (talk) 06:33, 17 June 2016 (UTC)

Rich Farmbrough arbitration amendment request

The Arbitration Committee respectfully requests your attention, as an active member of the Bot Approvals Group, at this arbitration amendment request, which seeks to remove bot-related restrictions from Rich Farmbrough. Any comments would be appreciated. For the Arbitration Committee, Kevin (aka L235 · t · c) 18:53, 22 June 2016 (UTC)

Please also see WT:BAG. Thank you, — xaosflux Talk 21:57, 22 June 2016 (UTC)

Deleted template

Hi there. I was reviewing some old templates I created and noticed you deleted {{Board games}} based on this discussion. I don't think you properly reviewed the situation, as an IP editor changed the original version of the template to this before nominating it for deletion. Moreover, I was never notified of this nomination. The consensus was to delete, but that was based on the template after it was significantly reduced. Can you quickly inspect this and let me know what you think? Mindmatrix 18:06, 24 June 2016 (UTC)

@Mindmatrix: Hmm. I don't have any strong opinions here; it was a while ago and I don't remember my exact reasoning, but a navbox of categories seems tricky (especially given recent discussions over bidirectionality). Is there much precedent for that? Either way, you're welcome to DRV it. — Earwig talk 19:28, 24 June 2016 (UTC)

The Signpost: 04 July 2016

EarwigBot and Facebook

Hi EW, I don't know if this is a bot-specific thing, or an artifact of the new search engine, but it is no longer picking up Facebook copyvios. When comparing the article with FB content, the FB side of the window gives the below.

Extended content

Facebook logo Email or Phone Password Forgot account? Sign Up Security Check

Please enter the text below

Can't read the text above?

Try another text

Text in the box:

What's this? Security Check

This is a standard security test that we use to prevent spammers from creating fake accounts and spamming users.

Submit English (US) Español Français (France) 中文(简体) العربية Português (Brasil) Italiano 한국어 Deutsch हिन्दी 日本語 Sign Up Log In Messenger Facebook Lite Mobile Find Friends Badges People Pages Places Games Locations Celebrities Groups About Create Ad Create Page Developers Careers Privacy Cookies Ad Choices Terms Help Settings Activity Log Facebook © 2016

No login was needed to get to the page in question, so the signup page should have been unnecessary, unless FB is just detecting excessive Google traffic. The exact comparison throwing this is: [1] (the source article is tagged for copyvio, as it matches the target page, so you may need to use adminny sorcery to see that content).

Thanks, CrowCaw 18:23, 9 July 2016 (UTC)

Hi Crow. This isn't related to the search engine change, since it's Labs's own servers that talk to Facebook in this case, not Google's. Unfortunately, I don't think there's much I can do about it. It seems Facebook finds the tool's IP suspicious, so we get that message even though Facebook gives a valid response when I try the comparison locally. A possible solution would be proxying the requests through another server, but I'd need to set that up, and I suspect Facebook might just blacklist that as well. The good news is that these sorts of blacklistings aren't usually permanent, so it might work in the future. — Earwig talk 22:46, 10 July 2016 (UTC)

Meaning of copyvio detector result

Apologies if this is already explained somewhere, but if it is I can't find it... What is the meaning of the % result of your copyvio detector? EEng 20:26, 17 July 2016 (UTC)

Hi EEng. In simple terms, it's just a measure of how much content is shared between the article and suspected source. The math is a bit complicated, but higher percentages mean that a larger amount of the text is the same (either absolutely or relative to the size of the article). To be clear, it's not a direct proportion, so 50% doesn't mean half the article is copied. Usually anything below 50% is likely to be a false positive, or just a couple sentences in common. — Earwig talk 21:12, 17 July 2016 (UTC)
I've got a degree in applied math, computer science, and statistics, so I'd appreciate a not-simple explanation. What, exactly, is the % a proportion of? How can this possibly be interpreted as a "confidence" of a copyvio? EEng 21:22, 17 July 2016 (UTC)
EEng: Sure. Before going into it, I want to be clear that the goal isn't to assess the likelihood of a copyvio at a high level (e.g. with concern for the copyright license of a source, presence of quotes, or whether the content in question surpasses some creative threshold). Calling it a measurement of plagiarism likelihood may be more accurate. Effectively, we're interested in how similar two sequences of text are, where similarity is defined as the shared occurrence of substrings of text.
In the tool's current form, it looks at the number of word-level 5-grams that are common in both data sets. First, we strip the article and suspected source of markup and punctuation, and then build a set of the 5-grams found in the article (a) and the source (s). Then we examine the intersection of these sets, which I'll call Δ. We define the size of an n-gram set to be the sum of the original-text occurrences of each n-gram within the set, and in this case the size of Δ depends on the minimum of the number of occurrences of each common n-gram in the two original texts. So, we now have |a|, |s|, and |Δ|.
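The set arithmetic above can be written directly with multisets. A minimal sketch (the real tool also strips markup and punctuation first, which is omitted here):

```python
from collections import Counter

def five_grams(text):
    """Word-level 5-grams of a text, counted with multiplicity."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + 5]) for i in range(len(words) - 4))

def delta_size(a, s):
    """|Δ|: each common 5-gram counts the minimum of its occurrences."""
    return sum((a & s).values())  # Counter & keeps per-key minimums

a = five_grams("the quick brown fox jumps over the lazy dog")
s = five_grams("a quick brown fox jumps over the lazy dog today")
# The two texts share four 5-grams.
```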
If the ratio of |Δ| to |a| is close to 1, then that's a strong indication that copying is likely, because it means a large percentage of the phrases in the article are also found in the suspected source. The simplest version of the copyvio detector would return this ratio as its confidence value, but things are more complicated because we want to catch very large articles that only copy a single paragraph, which leads to a small ratio but an (absolutely) large |Δ|. So, we have a function that looks something like |Δ|/(|Δ| + 100); i.e. we have 0.5 confidence of a copyvio when 100 5-grams are in common, and confidence asymptotically approaches 1 as the number of 5-grams in common increases. The final confidence percentage is then the larger of these two values.
In practice, things are fudged a bit. We don't use the raw ratio as the first value, but something more sigmoid, which lowers the confidence for articles that have only a few phrases in common (false positives) and raises it for articles that have many phrases in common—if half the article is copied, the confidence is closer to 75%. The actual numbers were experimentally derived and seem to work well in my experience. Ideally, I would train this on a large corpus of known copyvios and their source texts, but such a set is hard to procure. So this admittedly rather wonky method is what we use. — Earwig talk 22:59, 17 July 2016 (UTC)
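Putting the two signals together, a hypothetical reconstruction might look like the following. Only the |Δ|/(|Δ| + 100) term and the "half copied lands near 75%" behaviour come from the description above; the sigmoid constants are illustrative guesses, not the tool's actual numbers.

```python
import math

def confidence(delta, article_size):
    """Combine the relative and absolute signals into one score in [0, 1]."""
    if article_size == 0:
        return 0.0
    ratio = delta / article_size
    # Sigmoid reshaping of the raw ratio: depresses small ratios (likely
    # false positives) and boosts larger ones; ratio=0.5 lands near 0.74.
    relative = 1 / (1 + math.exp(-8 * (ratio - 0.37)))
    # Absolute signal: 100 shared 5-grams alone give 0.5 confidence,
    # approaching 1 as the overlap grows.
    absolute = delta / (delta + 100)
    return max(relative, absolute)
```

With these constants, a huge article sharing 100 5-grams still scores 0.5 via the absolute term, while a short article that is half copied scores about 0.74 via the relative term.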
OK, good, thanks. Now while I digest that, can you tell me what the "source" is in what they seem to be doing at DYK i.e. as seen here Template:Did_you_know_nominations/Jane_Hamilton_Hall? EEng 23:48, 17 July 2016 (UTC)
Ah, I see the problem here. They're calling this the "probability" of a violation, but that's misleading; as I explained it's a fairly arbitrary measurement of how much content is in common. I'll let the bot operator know. Anyway, to answer your question, in this case the tool breaks the article up into sentences and sticks several of them into Google, and runs the comparison I described above against each result Google returns. Whichever URL yields the highest confidence value is considered the source. — Earwig talk 23:55, 17 July 2016 (UTC)
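That flow could be sketched like this, with `search` and `compare` standing in for the real Google query and the n-gram comparison described earlier; both are stubs supplied by the caller here, not the tool's actual functions.

```python
def best_source(article_text, search, compare, chunk_size=5):
    """Return the candidate URL with the highest confidence, plus its score."""
    sentences = [s.strip() for s in article_text.split(".") if s.strip()]
    candidates = set()
    # Feed the search engine several sentences at a time.
    for i in range(0, len(sentences), chunk_size):
        candidates.update(search(". ".join(sentences[i:i + chunk_size])))
    if not candidates:
        return None, 0.0
    # Whichever URL yields the highest confidence is reported as the source.
    scores = {url: compare(article_text, url) for url in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]
```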
Wikipedia:Bots/Requests for approval/DYKReviewBot#Copyvio language misleading. — Earwig talk 00:28, 18 July 2016 (UTC)
  • Sounds like you see why I came here. I'll chew over what you wrote above over the next few days, between luxury dinners, massages, and laps in the pool. (I'm on vacation, you see.) EEng 00:50, 18 July 2016 (UTC)

The Signpost: 21 July 2016