Wikipedia:AutoWikiBrowser/Tasks

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Coreyemotela (talk | contribs) at 11:20, 9 June 2014 (→‎Portals). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This page is for AWB'ers who can't find anything to use AWB on, or who would like to help others with this powerful tool.

Below are tasks for which a page-autoloader with various capabilities would be especially useful. That is, the people posting these tasks need the help of AWB'ers! If you have AWB, please lend a hand. Note that Linky may also be useful for completing the tasks on this list (see Wikipedia:OTS#Tabs_and_tabbing).

See also Wikipedia:WikiProject Check Wikipedia.

Portals

Following inversions of redirects motivated by the Manual of Style, and as recommended by Help:Moving a page#Moving a portal, please replace the links to:

Portal:Parliamentary Procedure, Portal:Molecular and Cellular Biology, Portal:Software Testing, Portal:Rock Climbing and Portal:Computer Generated Imagery

with direct links to:

Portal:Parliamentary procedure, Portal:Molecular and cellular biology, Portal:Software testing, Portal:Rock climbing and Portal:Computer-generated imagery

Thanks! Coreyemotela (talk) 19:02, 22 May 2014 (UTC).

These moves are still being discussed at Portal talk:Molecular and cellular biology#Moves, so it might be better to delay updating the portal names in articles. -- John of Reading (talk) 05:09, 23 May 2014 (UTC)
The inversion of redirects to respect the Manual of Style is not really under discussion; it is only questioned by one IP address. But we can still wait a little if you think that is better. Coreyemotela (talk) 06:29, 23 May 2014 (UTC).
Update of the request: please replace the links to:
Portal:Parliamentary Procedure, Portal:Molecular and Cellular Biology, Portal:Software Testing, Portal:Rock Climbing, Portal:Computer Generated Imagery, Portal:Musical Theatre, Portal:Body Modification, Portal:Molecular Anthropology, Portal:Indian Education, Portal:Extinction, Portal:Hazardous Materials, Portal:Human Health and Performance in Space, Portal:Hindu Mythology, Portal:Tamil Cinema, Portal:Pervasive Developmental Disorders and Portal:Aquarium Fish
with direct links to:
Portal:Parliamentary procedure, Portal:Molecular and cellular biology, Portal:Software testing, Portal:Rock climbing, Portal:Computer-generated imagery, Portal:Musical theatre, Portal:Body modification, Portal:Molecular anthropology, Portal:Education in India, Portal:Extinct and endangered species, Portal:Hazardous materials, Portal:Human health and performance in space, Portal:Hindu mythology, Portal:Tamil cinema, Portal:Pervasive developmental disorders and Portal:Aquarium fish
Thank you! Coreyemotela (talk) 06:43, 25 May 2014 (UTC).
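For anyone who wants to check the replacement logic outside AWB, here is a minimal Python sketch of the request above. It is only an illustration: the function name `retarget_portal_links` and the dictionary are hypothetical, only a subset of the requested renames is listed, and it handles plain `[[Portal:...]]` wikilinks only (links made through templates, links with section anchors, and titles written with underscores are left untouched).

```python
import re

# Old portal title -> new title, copied from the request above (subset only).
PORTAL_RENAMES = {
    "Parliamentary Procedure": "Parliamentary procedure",
    "Software Testing": "Software testing",
    "Computer Generated Imagery": "Computer-generated imagery",
    "Indian Education": "Education in India",
}

# Matches [[Portal:Title]] and [[Portal:Title|label]]; titles with '#' are skipped.
PORTAL_LINK = re.compile(r"\[\[Portal:([^\]|#]+)(\|[^\]]*)?\]\]")


def retarget_portal_links(wikitext: str) -> str:
    """Point links at the new portal titles, keeping any piped label."""

    def swap(match: re.Match) -> str:
        new_title = PORTAL_RENAMES.get(match.group(1))
        if new_title is None:
            return match.group(0)  # not in the rename list: leave as-is
        label = match.group(2) or ""
        return f"[[Portal:{new_title}{label}]]"

    return PORTAL_LINK.sub(swap, wikitext)
```

The lookup is case-sensitive on purpose, since most of the requested changes differ from the old titles only in capitalization.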

Lots of articles with typos

If any of you like to use AWB's regex typo fixing feature, I've started a scan of the most recent database dump (2014-05-02) and have identified approx. 18,000 articles with typos in the first 21% of the dump (which is a run rate of ~90,000 articles to fix in total). You're all welcome to take a page or two and help fix them. —Darkwind (talk) 07:56, 23 May 2014 (UTC)

Update: The scan is complete, and there are approx 73K articles to check at User:Darkwind/Typos. —Darkwind (talk) 02:56, 27 May 2014 (UTC)
  • I've run around 2500 of these articles, and I'm finding a name "Don D'Ammassa" that keeps getting erroneously corrected to "Don D'Amassa" (I'm thinking it's trying for "ammass" -> "amass"), and a lot of false positives on "enb..." -> "emb..." on foreign terms. I'm wondering if someone with a lot more experience with regular expressions wouldn't be willing to delve into the regexp typo fixer to see if we can't tighten up those two rules. VanIsaacWScont 22:08, 4 June 2014 (UTC)
Your comment would probably be more effective if you posted it at Wikipedia talk:AutoWikiBrowser/Typos.
For the "enb" issue: a list of the incorrect matches will be needed. There is already a list of excluded matches which will need to be expanded, but the false matches need to be known. Unfortunately, it is not possible for us to run through all non-English language dictionaries to find lists of words.
"Don D'Ammassa" should be resolved. I added an exclusion to the appropriate rule for [Dd]'[Aa]massa and tested it.
Please do report false positives; the rules need to be updated to account for them. In the time between when you find the problem and when the rule is fixed, you could create a Find and Replace rule which changes the text back, and mark that rule to be run "after fixes". Depending on the options you use, you might run into an AWB bug. The workaround for that bug is any of: re-parse each page (F5), don't use "Skip if: only minor replacement made", or don't also mark the rule as minor. However, this suggestion should not be used instead of reporting the false positive issues for typo fixing. If you are having the problem, then someone else probably is too.
Feel free to ping me if you have some specifics for the "enb" issue.— Makyen (talk) 03:19, 5 June 2014 (UTC)
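As an illustration of how an exclusion like the one described above can work, here is a minimal Python sketch. The pattern is a hypothetical stand-in, not the actual AWB typo rule (which is not quoted in this thread): it rewrites words starting with "ammass" to "amass" and uses a negative lookbehind to skip a word that directly follows "D'" or "d'".

```python
import re

# Hypothetical stand-in for the "ammass" -> "amass" rule.
# (?<![Dd]') is the exclusion: no match when the word directly follows D' or d'.
AMMASS_RULE = re.compile(r"(?<![Dd]')\b([Aa])mmass")


def fix_ammass(text: str) -> str:
    """Drop the doubled 'm', keeping the case of the first letter."""
    return AMMASS_RULE.sub(r"\1mass", text)
```

With this sketch, `fix_ammass("He ammassed a fortune")` gives "He amassed a fortune", while "Don D'Ammassa" is left unchanged.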
Thank you for tackling that one name that for some reason keeps popping up on my feed. I'm certainly going through and hand-editing whenever I run into these odd replacements - I'm probably hitting foreign misapplications at about 1-2%, although many of them are just random Spanish and French words that are a couple of letters off from an English word. As for the enb-emb (probably also enm-emm and enp-emp as well), I've really run across these in a lot of different contexts - I know that Japanese, Italian, and French words have all ended up on the receiving end of an improper en->em substitution. So I'm wondering if it might not be more efficient to work from an English dictionary list of positive matches, rather than creating exceptions. VanIsaacWScont 05:01, 5 June 2014 (UTC)
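The "positive match" idea suggested above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: `EMB_WORDS` is a tiny hand-picked sample standing in for a real English dictionary list, and `fix_enb_whitelist` is a hypothetical name. A word starting with "enb" is changed only when its "emb" spelling is a known English word.

```python
import re

# Hypothetical sample; a real list would come from an English dictionary.
EMB_WORDS = {"embankment", "embarrass", "embassy", "embody"}

# Candidate words: "enb" or "Enb" followed by at least one letter a-z.
ENB_WORD = re.compile(r"\b[Ee]nb[a-z]+\b")


def fix_enb_whitelist(text: str) -> str:
    """Change enb- to emb- only when the result is on the known-word list."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        candidate = word[0] + "m" + word[2:]  # replace the 'n' with 'm'
        return candidate if candidate.lower() in EMB_WORDS else word

    return ENB_WORD.sub(swap, text)
```

The trade-off is the reverse of the exception approach: unknown foreign words are safe by default, but any English form missing from the list (including inflected forms such as "embodied" in this sample) goes unfixed.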
Looking at the rule in question, it does appear to be quite broad. Currently, it changes any word beginning with "[Ee]nb" that is 4 or more characters long to start with "[Ee]mb", except those listed in the exceptions. This is because English appears to have few, or no, "enb" words. [Historically there were some, but they appear to have migrated to "emb". This is based on looking in an old, out-of-copyright version of Webster's Unabridged dictionary, and some current sources]. So this rule is attempting to correct a couple hundred different "emb" words which may have been misspelled as "enb".
That, of course, leaves us with the issue of non-English words which start with "enb" and don't contain characters other than [a-z].
I have very little history with the typo list specifically. Prior to making a change like the one you suggest of changing to a list of words to match, I would like to get other editors involved that have more history with the typo list.
Based on what you are seeing, are there a few words which are common false positives?
Rather than leave the question as such, I did a scan for matches of this rule remaining in the 2000 articles contained in the two sets you have marked as completely done. This is on the assumption that any remaining ones would be false positives. There were 5 false-positive matches out of 2,000 pages, or 0.25%. The false positives were:
There were 3 other matches for the regular expression, which were not corrected by the typo rules for other reasons (i.e. they were not false positives).
The above have been added to the exclusion list for this rule. — Makyen (talk) 07:32, 5 June 2014 (UTC)
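To make the behaviour of the broad rule described above concrete, here is a rough Python sketch. It is not the actual AWB rule: `build_enb_rule` is a hypothetical helper, and the exception word used in the example is a placeholder, since the real exclusion list is not reproduced in this thread. It changes any word of 4 or more characters that starts with "enb" and otherwise contains only a-z, unless the word is on the exception list.

```python
import re


def build_enb_rule(exceptions):
    """Compile an enb -> emb rule that skips the listed words (any case)."""
    guard = ""
    if exceptions:
        blocked = "|".join(re.escape(word) for word in exceptions)
        # Negative lookahead: do not match when the whole word is an exception.
        guard = rf"(?!(?i:{blocked})\b)"
    # "enb" plus at least one more letter a-z, i.e. 4 or more characters.
    return re.compile(rf"\b{guard}([Ee])nb([a-z]+)\b")


def fix_enb(text, rule):
    """Replace the leading enb with emb, keeping the case of the first letter."""
    return rule.sub(r"\1mb\2", text)
```

For example, with `rule = build_enb_rule(["enbarr"])` (placeholder exception), `fix_enb("an enbankment", rule)` gives "an embankment", while "Enbarr" and the 3-letter "enb" are left alone. Words containing letters outside a-z never match, which is why only plain a-z foreign words show up as false positives.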
Using {{not a typo}} for uncommon false positives is also an option. GoingBatty (talk) 00:12, 6 June 2014 (UTC)