From Wikipedia, the free encyclopedia

I'm Trey and my current project—as of the Wikimania 2017 Hackathon—is to find and fix mixed-script words on various wikis (across languages and projects). Sometimes—due to keyboard-swapping mishaps, unknowing cut-n-paste, or outright vandalism—there are words with letters in more than one script in articles on various wikis. While "Trey" and "Trеy" may look the same, the second has a Cyrillic е in it. Searching for one will not find the other.
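To make the problem concrete, here is a minimal Python sketch of how mixed-script words could be detected. It is not the user script described on this page. The function names `script_of` and `mixed_script_words` are hypothetical, and using the first word of a character's Unicode name as its script is a rough assumption that works for Latin, Cyrillic, and Greek letters but is not a real script lookup.

```python
import re
import unicodedata


def script_of(ch):
    """Rough script guess: first word of the Unicode name, e.g. 'LATIN'.

    Assumption: this is only a heuristic, not the Unicode Script property.
    Returns None for non-letters and for unnamed characters.
    """
    if not ch.isalpha():
        return None
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return None


def mixed_script_words(text):
    """Return (word, scripts) pairs for words whose letters span several scripts."""
    hits = []
    for word in re.findall(r"\w+", text):
        scripts = {s for s in map(script_of, word) if s}
        if len(scripts) > 1:
            hits.append((word, sorted(scripts)))
    return hits


# "Tr\u0435y" contains U+0435, CYRILLIC SMALL LETTER IE, in place of Latin "e".
for word, scripts in mixed_script_words("Trey and Tr\u0435y"):
    print(repr(word), scripts)
```

Only the second spelling is reported, because the first is entirely Latin. Legitimate mixed-script tokens (for example, product names or linguistic notation) would also be flagged, so the output is a list of candidates to review, not a list of errors.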

I have a user script I've built to find and identify likely candidates, show context, and make edits. It's often easy to see that mistakes have been made. The most common Latin-with-Cyrillic error is some variant of Мoscow/Мoskau/Мoscou/Мoskva/Мoskwa—with a Cyrillic М—in references. I only make such changes in the content namespace. I worry about being labelled a meatbot.
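As an illustration of what a suggested fix for the Cyrillic М case might look like, here is a small Python sketch. It is an assumption-laden example, not the logic of my user script: the table `CYRILLIC_TO_LATIN` is a hypothetical, deliberately tiny list of look-alikes, and `suggest_latin_fix` is a made-up name.

```python
# Hypothetical, hand-picked table: Cyrillic letter -> Latin look-alike.
# A real tool would need a much larger and more carefully reviewed mapping.
CYRILLIC_TO_LATIN = {
    "\u041c": "M",  # CYRILLIC CAPITAL LETTER EM
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def suggest_latin_fix(word):
    """Return an all-ASCII suggestion for a mostly-Latin word, or None.

    None means: nothing to change, the word has no ASCII letters to begin
    with, or some character has no look-alike in the table.
    """
    # Only touch words that already contain plain Latin letters.
    if not any(ch.isascii() and ch.isalpha() for ch in word):
        return None
    fixed = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in word)
    if fixed == word or not fixed.isascii():
        return None
    return fixed


# Cyrillic capital EM followed by Latin "oscow".
print(suggest_latin_fix("\u041coscow"))
```

The sketch refuses to guess when any character is left unconverted, which means it also skips Latin words with accented letters. That is a conservative choice for the example; a suggestion like this still needs a human to check it before an edit is saved.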

I currently work for the WMF, but all my mixed-script edits (including a few I made on my WMF account) and all other edits I make with this account are made in my capacity as a volunteer.

Some Poor Documentation

I've shared my homoglyph script, and I've installed it. If you install it, it shows up under Tools as "Homoglyph Hunter". It is still very rough; it doesn't handle all errors well, and it doesn't show all results (there seem to be some in templates, for example, which I haven't tried to handle). I have only tried it in Chrome, and I always run it with the Developer Console open to watch for errors it doesn't catch. Not all suggestions it makes are good ones, so please look carefully before clicking "FIX". Things can be especially complicated when an article page has a homoglyph in the title. It completely fails to work on some wikis I've tried it on, and I haven't looked into why. Use at your own risk!