
Wikipedia talk:STiki: Difference between revisions

Revision as of 17:49, 22 March 2012




STiki new version on 12/29 (CHANGELOG)

A minor release tonight, mostly affecting the "last revert panel". Per the CHANGELOG:

  • Though not code-specific, a LEADERBOARD now exists with STiki statistics
  • Some changes were made to the "last revert panel":
    • The panel now contains a link to the "article", so one can now easily visit the page which was just affected by an undo/RB.
    • Previously, the "0 edits undone... beaten to revert? ... check page hist" message was displayed anytime an edit attempt did not succeed. Now, there are separate messages for the "beaten to revert" (which is known to be the outcome) and "error" cases. They display in bold-red for prominence.
    • If an "error" edit outcome occurs, and the user has started STiki from the command-prompt/terminal, there should be output about the error. Please post this to the talk-page, so we can discover which error conditions are occurring in practice.
    • A small GUI error was fixed where the panel would report that an edit was successfully made, when in fact, the STiki user had been beaten to the revert. This was a result of the MW-API returning a "success" code, but unknowingly also noting "nochange". This only affected a minority of users without native rollback.
  • Backend: Even when CBNG scores its own edits poorly, these edits will never be enqueued; i.e., a CBNG revert will never be popped from the CBNG queue.
  • Backend: A script has been included in the [/utilities] directory such that one can de-queue any edit that has 'x+' pass actions, in order to increase the efficiency of other editors and aim for broader coverage. Debating whether to use on the main en-wiki project itself.

Thanks for everyone's continued support. West.andrew.g (talk) 09:04, 29 December 2011 (UTC)[reply]
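For readers following the "beaten to revert" fix, here is a minimal Python sketch of how the three outcomes might be told apart. It assumes the edit response has already been parsed into a dict, and that a no-op edit carries a "nochange" key alongside the "Success" result, as described in the changelog. The function name and the outcome labels are hypothetical; this is not STiki's actual code.

```python
def classify_edit_outcome(response: dict) -> str:
    """Classify a parsed edit response as 'reverted', 'beaten', or 'error'.

    Assumed response shapes (illustrative only):
      {"edit": {"result": "Success", "newrevid": 123}}   -> a real change
      {"edit": {"result": "Success", "nochange": ""}}    -> nothing changed
      {"error": {"code": "..."}}                         -> the request failed
    """
    if "error" in response:
        return "error"
    edit = response.get("edit", {})
    if edit.get("result") != "Success":
        return "error"
    # "Success" together with "nochange" means the page already had the
    # target text, i.e. someone else got to the revert first.
    if "nochange" in edit:
        return "beaten"
    return "reverted"
```

Checking for "nochange" before reporting success is the step that keeps the GUI from claiming a revert that never happened.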

Reverting multiple edits

A suggestion: In the case where clicking the Vandalism button would revert more than one edit (i.e., the same editor has made several consecutive edits), can the browser window show the cumulative difference for all of them? That is, what I see on the left side of the browser window should be what the article will look like after reversion. Currently, it shows only the most recent edit. Thanks, Peter Chastain (talk) 11:30, 30 December 2011 (UTC)

I agree that this would be really helpful. Yaris678 (talk) 19:03, 5 January 2012 (UTC)[reply]
Acknowledged. No promises on the release time-line, though. Thanks, West.andrew.g (talk) 03:00, 6 January 2012 (UTC)[reply]
Yes, this would be a great addition. Thanks for your work, too. Ocaasi t | c 03:45, 16 January 2012 (UTC)[reply]
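One way to picture the request: the cumulative diff needs the newest revision and the last revision by a different editor. The Python sketch below finds both from a newest-first history. The `(rev_id, user)` tuple format and the function name are assumptions made for illustration, not anything taken from the STiki source.

```python
from typing import List, Optional, Tuple


def rollback_span(history: List[Tuple[int, str]]) -> Tuple[List[int], Optional[int]]:
    """Return (revisions a rollback would undo, base revision to diff against).

    `history` is newest-first. The base is the most recent revision by a
    different editor, or None if one editor wrote the whole history.
    """
    if not history:
        return [], None
    latest_user = history[0][1]
    run: List[int] = []
    for rev_id, user in history:
        if user != latest_user:
            # First revision by someone else: this is what the article
            # would look like after the rollback.
            return run, rev_id
        run.append(rev_id)
    return run, None


# Example: three consecutive edits by "A" on top of an edit by "B".
undone, base = rollback_span([(105, "A"), (104, "A"), (103, "A"), (102, "B"), (101, "A")])
```

Diffing `base` against the newest revision in `undone`, rather than the last two revisions, would show everything the Vandalism button is about to remove.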

Whitelist(s)

Hello - I am curious if there are any 'whitelist' settings for STiki; i.e., If a user has more than 95,000 edits (or is a bot-flagged account)? An edit by such a user is most likely not vandalism. Avicennasis @ 03:10, 7 Tevet 5772 / 03:10, 2 January 2012 (UTC)[reply]

Hello. First, we should distinguish that the "STiki" GUI tool is the unification of multiple anti-vandalism algorithms (what we would call "queues"). The "Cluebot-NG" and "WikiTrust" algorithms are largely outside my control. I have independently (with my co-authors) written the "metadata" and "link spam" queues. Each operates in its own unique way.
Generally, however, one feature most of these algorithms integrate is "is the editor registered?" and "how many edits have they made?". On the basis that registered and prolific editors rarely commit vandalism, the algorithms are likely to score their subsequent edits "well" (i.e., unlikely to be vandalism). I'd be surprised to see too many edits by these types of editors appear at the top of any queue (creating a kind of de facto whitelist).
Speaking about the "metadata (STiki)" queue in particular. I can say that bot edits should never appear in the queue, per a hard-coded rule -- and that any editor with that many edits should be quite exempt on the basis of probability. Have you had any experiences to the contrary? As noted above (or perhaps on my talk page) CBNG has tended to score itself quite poorly (but again, outside my sphere of influence). Thanks, West.andrew.g (talk) 07:29, 2 January 2012 (UTC)[reply]
Thanks for the quick response. It seems bot edits are still coming in through STiki - e.g., like this. It's also interesting to note that the message says the change was reverted, however it seems that is also not the case. Avicennasis @ 12:30, 7 Tevet 5772 / 12:30, 2 January 2012 (UTC)[reply]
Using what queue? How often does the latter happen? Thanks, West.andrew.g (talk) 04:32, 3 January 2012 (UTC)[reply]
I have gotten a few today, using the Cluebot-NG queue. I'm pretty sure they were from bots other than CBNG. Peter Chastain (talk) 06:35, 3 January 2012 (UTC)[reply]
I guess the changes discussed at User talk:West.andrew.g#STiki queues edits from approved bots? only related to the STiki queue. That would explain the recent change to filter out CBNGs edits from its own stream. It might be worth expanding that filter to exclude all approved bots. (I guess that false positives of approved bots have not been an issue for CBNG coders because that is filtered out by the whitelist that it uses. Since STiki takes the scores irrespective of the whitelist, it might be worth applying some filtering.) Yaris678 (talk) 20:27, 4 January 2012 (UTC)[reply]
I have argued here and here that the goal of recent changes patrolling (RCP) should be quality control, rather than just the detection and reversion of vandalism. Editors who would never vandalize can still make mistakes, and I always appreciate an extra pair of eyes looking at what I have done. STiki, with its ability to route changes to individual reviewers, could be a great tool for this kind of peer review. If, as I hope will eventually happen, there were enough RC patrollers, it might be possible for most changes (not just the high-risk ones) to be reviewed. STiki's use of multiple queues would allow those who are interested in just vandalism to see edits with a high probability of vandalism, while those of us who want to do quality control could see a broader range of edits, with no explicit whitelisting. If I understand the STiki (metadata) queue correctly, it scores editors by the persistence of their edits and prioritizes new changes accordingly. The only new thing that STiki would need to do would be to look at categories and reviewers' interests (via some sort of profile) so that it could match each article to an appropriate reviewer. Peter Chastain (talk) 06:49, 5 January 2012 (UTC)[reply]
Some interesting thoughts by Peter there. Here are my reactions to them:
  1. Expanding STiki to cover more general RCPing sounds like a great idea. Perhaps STiki could have a "problematic but not vandalism" button (or maybe two such buttons, one that reverts edits and one that doesn't). This could be used to teach the machine-learning algorithm what such edits look like and so inform a new "problematic but not vandalism" queue.
  2. We could go further and have multiple new buttons and queues: "POV pushing", "Unclear writing" etc. However, such complexity may be more confusing than it is helpful.
  3. Persistence of edits by an editor is actually how the WikiTrust algorithm and queue work. Having a high score on WikiTrust is a good indicator that you are a trustworthy editor. Having a low score probably indicates that you are a newbie, but it doesn't mean you are a vandal, which is why the vandalism density in that queue is usually lower.
  4. Using the WikiTrust score of an editor as an input to an algorithm means the algorithm doesn't need a separate white list. (Arguably a new approved bot is an exception... but then again... maybe a few extra eyes on its edits would be a good thing.)
  5. The STiki algorithm is based on metadata, which includes the categories that the page is in and the geographic location of IP editors. More info is at Wikipedia:STiki#Metadata scoring and origins
  6. Directing edits towards editors according to areas they know about sounds like it would be really powerful if we had a large user base. It might also take a lot of programming effort (although Andrew would be a better judge of that). The leaderboard says we have 32 editors who have edited in the last 30 days so it might not be at the top of Andrew's priority list... but it would be really cool to see. You could make the profiles self-selecting... or you could make them another thing that is learnt by the algorithm. i.e. it could look at the sort of edits you are most and least likely to pass on... probably mostly based on the category that the page is in.
Thanks, Yaris678 (talk) 18:59, 5 January 2012 (UTC)[reply]
Just had a thought: If you want to use STiki for general RCPing now, the best way is to use the WikiTrust queue. It will give you lots of edits by newbie users that have not been looked at already by any STiki user. It’s not quite as good as full problematic-but-not-vandalism feature, but it’s a start.
On a similar note, if Andrew did want to create a special problematic-but-not-vandalism queue, then starting the machine-learning algorithm off by looking mainly at the WikiTrust score of the user is probably a good bet.
Yaris678 (talk) 10:17, 6 January 2012 (UTC)[reply]
Yaris, thanks for your thoughts. From the talk pages, I think you might be the most experienced STiki user here. Looking at your numbered points:
     1. STiki is already useful for general RCPing. Despite the fact that I usually work in the Cluebot queue, I spend most of my time looking at pages, fact-checking, fixing things that look like vandalism but are really poor writing, discussion etc. The question is how to make STiki more usable for that.
     2. STiki is already a bit crowded. I would not like to see the browser window get much smaller. I like the approach of one of the other tools (Huggle???) which pops up a menu of different kinds of problems, each generating its own warning message. I would want one choice to be "none of the above; I will write my own." It is much too easy, and tempting, to "bite the newbies" with STiki's single "Vandalism" button.
     4. I agree, WikiTrust probably makes explicit whitelists unnecessary. As new bots establish their own reputation, their edits will get much less priority. (An exception could be bots like the one that inserts dates when editors forget to put them into a tag: they are never mistaken, but their edits would not persist after the tag has been removed.)
     6. Intelligent routing (routing by subject) is the most important part of what I need. A lot of vandalism is subtle. I like to fact-check, but I am not the best person to look at, e.g., television series, football clubs, etc. Even with 25 regulars, intelligent routing would probably be helpful, but that number should increase. I spend much more of my time RCPing with STiki than I did before I started using it, because my increased productivity with STiki makes me feel that my time is well spent. So, STiki could be an important part of changing the Wikipedia culture toward increasing the number of RCPers in particular and quality-control people in general. (I often notice poor writing in the unchanged parts of an edit, and go off to fix that, so STiki is useful there, too.) Getting more people to RCP is essential for Wikipedia to be useful as a source of information.
I agree that the WikiTrust queue is probably the best source of bad edits that are not necessarily vandalism. On the other hand, I find myself using the Cluebot queue, simply because I like the reinforcement of finding and fixing a lot of problems in a short time.
Thanks, Peter Chastain (talk) 15:52, 6 January 2012 (UTC)[reply]
An additional thought on user messages: It would be nice if there were a way to send them, even when the "innocent" button is pushed. "Please provide an edit summary" is one common example, but others could be "welcome, newbie" or even "kudos!" Sure, this can all be done outside STiki, but the goal here is productivity, and it is nice to have the article title filled in (especially since STiki currently does not allow us to copy into the clipboard from the browser window). Peter Chastain (talk) 16:39, 6 January 2012 (UTC)[reply]

User warnings for unconstructive editing

I'm not clear on why STiki warns some users and not others. 117.203.70.73 received no warning for their edit, but 92.237.169.16 did. Is this intentional? — Preceding unsigned comment added by Wrathkind (talk • contribs) 22:19, 6 January 2012 (UTC)

Whether or not a warning is issued is entirely up to the editor who is running STiki. For each suspect edit, there is an option to warn or not to warn, and the editor running STiki can choose to not issue warnings for edits which are merely dubious (such as valid but inappropriate commentary as in the example for 117.203.70.73). Johnuniq (talk) 00:27, 7 January 2012 (UTC)[reply]
Well I did have STiki set to warn 117.203.70.73, which is why I was confused as to why no warning showed up. The checkbox for "Warn Offending Editor" is selected by default, and I've not deselected it during my use (and it appropriately warned 92.237.169.16 as expected). So, sometimes it warns; sometimes it doesn't? — Wrathkind (talk) 00:56, 7 January 2012 (UTC)[reply]
Oops, sorry: I missed that you are the person who was running STiki. I have not used STiki much, but when I did I quite often asked it to not warn, and it always (I think) did what I wanted. That is, I never noticed it failing to warn when asked. By the way, I would not warn an editor for this edit. If I had unlimited time, I might offer a friendly suggestion, but not a warning. Johnuniq (talk) 02:07, 7 January 2012 (UTC)[reply]
Yes, I understand. I suppose I should take the time to toggle the warning option per user. I tend to just let STiki do its thing once I notice an inappropriate edit, so I've never bothered to select/deselect the warning option.
As for the intermittent blip in user warnings, it happens occasionally and I couldn't find a pattern to it. Perhaps it's an unintended result of network issues. I was just curious anyway. — Wrathkind (talk) 03:24, 7 January 2012 (UTC)[reply]
After taking your advice to purposefully select the warning toggle with each edit, I discovered the answer to my question. It seems that if the vandalism is a certain number of days old, STiki will abstain from sending out a warning (presumably because it's pointless to do so after a certain amount of time). The message it gives me is along the lines of "Undid 1 edit - no warning given - (edit(s) too old)". That solves my confusion; thanks! — Wrathkind (talk) 04:56, 7 January 2012 (UTC)[reply]
Correct. If an edit is a day or two old (I don't remember the exact threshold) and the editor is an IP address, then even if the "warning" checkbox is checked, a warning will not be issued. This is due to concerns about DHCP (dynamic IP addresses). We don't want collateral damage (i.e., someone receiving a message intended for an earlier user of the computer/IP). Thanks, West.andrew.g (talk) 17:59, 7 January 2012 (UTC)[reply]
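The rule described above can be sketched in a few lines of Python. The one-day threshold is a placeholder, since the exact value is not given in this thread, and the function and constant names are hypothetical rather than taken from STiki.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

# Placeholder threshold: the thread only says "a day or two".
MAX_IP_WARNING_AGE = timedelta(days=1)


def is_ip_editor(username: str) -> bool:
    """True if the 'username' is really an IPv4/IPv6 address (unregistered editor)."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def should_warn(username: str, edit_time: datetime, now: datetime,
                warn_checked: bool = True) -> bool:
    """Decide whether to post a warning after a revert."""
    if not warn_checked:
        return False
    # Dynamic IPs get reassigned, so an old edit may belong to a previous
    # user of the address; skip the warning to avoid collateral damage.
    if is_ip_editor(username) and now - edit_time > MAX_IP_WARNING_AGE:
        return False
    return True


now = datetime(2012, 1, 7, 18, 0, tzinfo=timezone.utc)
recent_ip = should_warn("92.237.169.16", now - timedelta(hours=2), now)
stale_ip = should_warn("117.203.70.73", now - timedelta(days=3), now)
```

Registered accounts are unaffected by the age check in this sketch, because the DHCP concern applies only to IP editors.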

Channelling edits to the most appropriate user

A collaboration opportunity?

I was thinking about how we would channel edits to the most appropriate user, if STiki were to go that way. One thing that occurred to me was that it would make sense to collaborate with the makers of SuggestBot. I looked into it, and it turns out that they are GroupLens, a research lab in the Department of Computer Science and Engineering at the University of Minnesota.

Andrew, maybe there is an opportunity for a collaboration between Minnesota and Pennsylvania.

Yaris678 (talk) 15:03, 10 January 2012 (UTC)[reply]

Red text in diffbrowser under Linux/Windows

When I run STiki under Linux, I see red text in the diff-browser to indicate what changed between the two versions. But when I run STiki under Windows, I don't get any red text in the diff-browser, making it hard to tell the difference between the versions. Can anyone confirm this? I'm wondering if it is a bug, or if it is just something on my computer. Edit: I'm using Windows 7. Arthena (talk) 09:13, 27 January 2012 (UTC)

I get red text on my machine, which is Windows XP. Yaris678 (talk) 13:37, 27 January 2012 (UTC)[reply]
When Mediawiki updated a while back, the devs changed some style-sheet classes about how this was represented, breaking my parser. I pushed an update of STiki to address the issue. Might your STiki version on Linux be newer than the Windows one? If so, see if an update fixes the issue. If not, let me know, I have some Windows machines I can play around on. Thanks, West.andrew.g (talk) 02:20, 28 January 2012 (UTC)[reply]
Ok, I updated to the latest version and it fixed it :). Both STiki_2011_08_01 and STiki_2012_01_17 are "version 2.0", so I thought I already had the latest version, but I didn't. Thanks. Arthena (talk) 16:13, 28 January 2012 (UTC)
Yeah, it's not the most elegant system, but the big version number, i.e., "2.0", is hiding in a lot of tricky places that are not easily updated. Thus, when minor bug fixes are pushed, the versions are just denoted by the build date. Thanks, West.andrew.g (talk) 17:19, 28 January 2012 (UTC)

Flexibility

I use STiki regularly and I must say I find it extremely good and useful. However, I face issues when it comes to warnings. Suppose an editor makes an edit that calls for ((subst:uw-unsourced1)) or any warning other than ((subst:uw-vandalism1)); to issue the appropriate warning, we have to go outside the tool and issue it manually (unless I have misunderstood the functionality). Because of this, a user has to either falsely mark the edit as “Pass” or “Innocent”, identify it as “Vandalism”, or revert it manually. This can be time consuming, and STiki users might look for alternative tools to facilitate these reverts. Another suggestion is that the user should be given an option to load the edits either by time (oldest to newest, but NOT the other way around, or else everyone will review only the most recent edits) or at random. Can someone please consider this? Many thanks and once again, great job in developing this tool. Cheers AKS 18:44, 4 February 2012 (UTC)

My suggestion is that instead of just 3 buttons, 'Vandalism(undo)', 'Pass', 'Innocent', there could be additional buttons like 'Need References(Tag without undo)', 'Need Reference(Undo)', 'Spam(Tag without undo)', 'Spam(undo)'. Also, is it possible to add a button called 'Read Article'? Clicking the button would automatically open the article in the default web browser. This would be very useful if the user is not able to decide without knowing the context. It is because of these problems that I mistakenly thought user AKS was wrongly reverting articles. --Anbu121 (talk me) 14:49, 5 February 2012 (UTC)
I'll address several of these points in greater depth later. In the immediate, I'll note that in the "metadata panel" next to the article name there is a link which leads directly to the article in question. This is the equivalent of your "read article" button request.
For Example, Please see this. The diff seems like that it is test/vandalism, but if you read the article, you will find that it is a perfect edit with appropriate sourcing.--Anbu121 (talk me) 14:59, 5 February 2012 (UTC)[reply]
Yes, there is a universal need to understand the context of an edit, regardless of how you want to tag/revert it. This is why there are humans in the pipeline. If it were truly straightforward, algorithms would revert ALL vandalism autonomously, instead of around 40%. Thanks, West.andrew.g (talk) 16:01, 5 February 2012 (UTC)

Following redirects

Is it possible to have STiki follow redirects, like Twinkle does, when warning users? Here, a message was left on a redirected talk page that probably should have been on the target of the redirect. Logan Talk Contributions 22:14, 12 February 2012 (UTC)[reply]

This can be done, no issue (placed on my TODO list). But out of personal curiosity, I'd also like to know: (1) why, and (2) how prevalent this is? I can understand why a seasoned editor might change user-names and want to redirect their talk page. Why would a vandal (or vandal to be) get involved in such low-level username stuff? Thanks, West.andrew.g (talk) 16:58, 13 February 2012 (UTC)[reply]
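As a rough illustration of what "following redirects" involves, the Python sketch below reads the `#REDIRECT [[...]]` line from a talk page's wikitext and hops to the target before a warning would be posted. The `fetch_wikitext` callable is a stand-in for whatever page-fetching code a tool already has, and the hop limit is an arbitrary assumption. None of this is STiki's or Twinkle's actual implementation.

```python
import re
from typing import Callable, Optional

# Matches "#REDIRECT [[Target]]" at the top of a page; stops at "|", "#" or "]".
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikitext: str) -> Optional[str]:
    """Return the redirect target title, or None if the page is not a redirect."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None


def resolve_talk_page(title: str, fetch_wikitext: Callable[[str], str],
                      max_hops: int = 3) -> str:
    """Follow redirects from `title`, guarding against loops and long chains."""
    seen = {title}
    for _ in range(max_hops):
        target = redirect_target(fetch_wikitext(title))
        if target is None or target in seen:
            break
        seen.add(target)
        title = target
    return title


# Toy page store standing in for live wiki content.
pages = {
    "User talk:Old": "#REDIRECT [[User talk:New]]",
    "User talk:New": "== March 2012 ==\nHello",
}
resolved = resolve_talk_page("User talk:Old", lambda t: pages.get(t, ""))
```

The `seen` set and hop limit are there so that a pair of pages redirecting to each other cannot stall the warning step.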

Multiple warnings

What made STiki do this? SD5 23:50, 14 March 2012 (UTC)[reply]

Took me a second, but I figured it out. Essentially it is because there are two sections named "March 2012". When STiki scans a talk-page it extracts only the section corresponding to the current month, i.e., "March 2012" (if it does not exist, it will simply append a new section). Then it scans that section for the highest warning level issued. The problem here is that when I tell it to get the section named "March 2012" it gets only the first one, where a warn-level 1 has been issued, and decides a warn-level 2 is appropriate (multiple times). It appends this warning to the "March 2012" section (which, it seems, the append action interprets as the "last" section with that name). Moral of the story: don't have sections with the same name. I will try to think of a way to hack around this, but it ultimately comes down to CBNG duplicating section headers. It has had issues with this in the past, and in fact, it screws up its own warning system incrementation when it does this. Thanks, West.andrew.g (talk) 03:01, 15 March 2012 (UTC)
Many thanks for taking the time to write that explanation. SD5 03:10, 15 March 2012 (UTC)[reply]
Perhaps it would be possible for STiki to remove duplicate header sections before deciding what warn-level to give the editor. After using Twinkle, I sometimes find myself, upon reading a talk-page, having to manually remove the duplicate header myself. I'm seriously considering switching to STiki after reading this article and discussion. — Glenn L (talk) 06:47, 15 March 2012 (UTC)[reply]
When you say remove, I assume you mean merge. That would be preferable. That would be a cool feature if STiki did that. It may seem that the ideal solution is for CBNG to sort out their own code. However:
  1. The CBNG team haven't fixed it yet, despite being alerted to the problem a while ago.
  2. If STiki does the fix then we know that STiki isn't going to get confused by this issue, which could (in theory) arise for other reasons.
BTW Glenn, Yes... you should switch to STiki! As a mathematician, I'm sure you will appreciate its use of alternating decision tree methods.
Yaris678 (talk) 17:52, 15 March 2012 (UTC)[reply]
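The duplicate-header problem and the proposed "merge" fix can be shown side by side in a small Python sketch. It assumes that warnings can be recognised by an HTML comment of the form `<!-- Template:uw-vandalism2 -->` left in the wikitext, and that levels cap at 4. Both are assumptions made for this example, as are all function names.

```python
import re
from typing import List

# Level-2 headings only ("== Title =="), not "=== Sub ===".
HEADING_RE = re.compile(r"^==(?!=)[ \t]*(.+?)[ \t]*==[ \t]*$", re.MULTILINE)
# Assumed marker left behind by a warning template, e.g. uw-vandalism2.
LEVEL_RE = re.compile(r"<!--\s*Template:uw-[a-z]+(\d)(?:im)?\s*-->", re.IGNORECASE)


def sections_named(wikitext: str, name: str) -> List[str]:
    """Return the body of every level-2 section titled `name`, in page order."""
    headings = list(HEADING_RE.finditer(wikitext))
    bodies = []
    for i, heading in enumerate(headings):
        if heading.group(1) == name:
            end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
            bodies.append(wikitext[heading.end():end])
    return bodies


def highest_level(bodies: List[str]) -> int:
    """Highest warning level found in the given section bodies (0 if none)."""
    levels = [int(n) for body in bodies for n in LEVEL_RE.findall(body)]
    return max(levels, default=0)


def next_level_first_only(wikitext: str, name: str) -> int:
    """Buggy behaviour: look only at the first section with this name."""
    return min(highest_level(sections_named(wikitext, name)[:1]) + 1, 4)


def next_level_merged(wikitext: str, name: str) -> int:
    """Proposed behaviour: treat all same-named sections as one."""
    return min(highest_level(sections_named(wikitext, name)) + 1, 4)


talk = (
    "== March 2012 ==\n"
    "First warning <!-- Template:uw-vandalism1 -->\n"
    "== March 2012 ==\n"
    "Second warning <!-- Template:uw-vandalism2 -->\n"
    "Third warning <!-- Template:uw-vandalism3 -->\n"
)
```

On the sample page, the first-section-only reading sees only the level-1 warning, while the merged reading sees all three.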

Other Wikipedias

Are there any plans to adapt STiki into other Wikipedias? Can I help adapt it to the Portuguese Wikipedia? Chico Venancio (talk) 16:40, 17 March 2012 (UTC)[reply]

It is not an immediate priority of mine. It would not be terribly difficult to do, though, mostly translation would be involved. Do you have any programming experience? The question becomes where it would be hosted (it needs to always be running, and have a database to store data). Thanks, West.andrew.g (talk) 21:18, 17 March 2012 (UTC)[reply]
I have some programming experience. I can program in Python (see my bot at ptwiki), and do understand a bit of Java. I don't know where to host it though. Where is it hosted now? Can toolserver be used? Chico Venancio (talk) 01:52, 18 March 2012 (UTC)
The current version is hosted on a machine at the University of Pennsylvania. However, my forthcoming dissertation makes that an unreliable place for any new project to get started. The toolserver is a good idea, and I should probably investigate migrating the en.wiki version there sometime soon. Thanks, West.andrew.g (talk) 14:23, 18 March 2012 (UTC)[reply]
So, what do you need translating? Chico Venancio (talk) 16:43, 19 March 2012 (UTC)[reply]
The internationalization has not been formalized. You would just need to take every English string in the STiki source-code and translate it to Portuguese. Most of this would be general translation (and I think calls to the Mediawiki API remain in English) -- but some parts are very specific (for example, comments that indicate where vandalism was undone). This would bring the GUI into Portuguese. The more challenging part is the algorithm, not because of translation, but because it would need a labelled corpus of Portuguese vandalism/non-vandalism over which to learn. What anti-vandal tools currently exist for Portuguese? Does Huggle? Thanks, West.andrew.g (talk) 19:40, 21 March 2012 (UTC)[reply]
West.andrew.g: I've edited the source a bit and sent you an email to the git repository. Almost all of the GUI has been externalised and is ready to be translated. I didn't include any links to the specific project in the language file; I thought that could be a separate option (translation of the tool, and version of WP). Also included were a few fixes for things that were bothering me, mainly persistent user properties and hiding GUI options that were greyed-out. Feel free to use any or all of my edits (AKA I'll let you look over everything and release the language file yourself). –meiskam (talk•contrib) 17:49, 22 March 2012 (UTC)