Wikipedia:Typo Team/moss


The moss project seeks to find and remove the furry green typos that have been growing on Wikipedia articles. It uses a Python script named moss, written by User:Beland, to automatically find misspellings, mistakes in English grammar, violations of the Wikipedia:Manual of Style, and confusing or broken wiki markup.

Dearth to tyops!

QUICK LINK TO THE BEST PAGE FOR NEW PARTICIPANTS

About misspellings

How the lists are made

The moss spell checker is run against a recent set of database dumps, which are generated on the 1st and 20th of every month (but take a few days to process). All the articles in the English Wikipedia are examined. The following are ignored:

  • Text inside references, templates, tables, quotation marks, sections like "External links" and "Works", and some other weird places.
  • Capitalized words (which are presumed to be correctly-spelled proper nouns)
  • Words that appear in titles in the English Wiktionary (which has definitions of all words in all languages, excluding proper nouns and systematic words like chemical names and large numbers)
  • Words that appear in titles in the English Wikipedia (which explains some things that don't appear in the dictionary)
  • Words that appear in titles in the Wikispecies (which has many technical words that don't appear in the dictionary or encyclopedia)
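The ignore rules above can be sketched in Python. This is a minimal illustration, not moss's actual code: the three title sets are tiny hypothetical stand-ins for the lists loaded from the dumps, the function name is invented, and the regular expressions are simplified assumptions that only handle references, non-nested templates, and straight double quotes.

```python
import re

# Hypothetical stand-ins for the title lists loaded from the dumps.
WIKTIONARY_TITLES = {"the", "moss", "grows", "on", "articles"}
WIKIPEDIA_TITLES = {"kagome"}
WIKISPECIES_TITLES = {"bryophyta"}

# Assumed, simplified patterns; real wiki markup needs a proper parser.
SKIP_PATTERNS = [
    re.compile(r"<ref[^>]*>.*?</ref>", re.DOTALL),  # references
    re.compile(r"\{\{[^{}]*\}\}"),                   # non-nested templates
    re.compile(r'"[^"]*"'),                          # straight-quoted text
]


def candidate_typos(text):
    """Return the words that none of the ignore rules would skip."""
    for pattern in SKIP_PATTERNS:
        text = pattern.sub(" ", text)
    known = WIKTIONARY_TITLES | WIKIPEDIA_TITLES | WIKISPECIES_TITLES
    results = []
    for word in re.findall(r"[^\W\d_]+", text):  # runs of letters only
        if word[0].isupper():
            continue  # capitalized: presumed proper noun
        if word in known:
            continue  # appears as a title in a reference project
        results.append(word)
    return results
```

With these sample sets, only the lowercase word that is outside references, templates, and quotation marks, and missing from all three title sets, is reported.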

Many mistakes are not (yet) caught:

  • Improper addition of 's (possessives are not added to Wiktionary, so these are excluded systematically)
  • Incorrect capitalization
  • Incorrect multi-word phrases
  • Wrong word used in context
  • Non-English language words not tagged with {{lang}} or where an English misspelling happens to be the same as a word in another language. (These are counted as correct spellings if they are in the English Wiktionary, which lists words in all languages – only the definitions are restricted to English.)
  • Other situations listed in #False negatives below

2020 statistics

See also: Older statistics

In the year from March 2019 to March 2020, moss volunteers fixed over 94,000 typos! The most impressive progress is in the T1 category (single-letter misspellings), where we eliminated about half from the English Wikipedia. During this period we also started fixing missing spaces (focusing on those around punctuation) and those have dropped by about one-fifth. As we make progress, clear misspellings are increasingly mixed in with unclear cases; I'll be doing some more work on separation algorithms to keep the typo reports useful, so you'll probably see some more changes to typo classifications. Thanks to everyone who has been helping out! -- Beland (talk) 16:54, 28 April 2020 (UTC)

Reporting symbol Explanation Change from 2019-03-01 to 2020-02-20 Instances, 2020-04-01 dump (9f6d726) Instances, 2020-04-20 dump (5ff589d) Instances, 2020-05-01 dump (1a96ded) Instances, 2020-05-20 dump (e511f74) Instances, 2020-06-01 dump (509f79a) Instances, 2020-06-20 dump (825ceb4) Instances, 2020-07-01 dump (db9db23) Instances, 2020-07-20 dump (caa619f) Instances, 2020-08-01 dump (cf76e8c) Instances, 2020-08-20 dump (f104e58) Instances, 2020-09-01 dump (4654d88) Instances, 2020-09-20 dump (a26ccca) Instances, 2020-10-01 dump (686f5db) Instances, 2020-10-20 dump (4f90810) Instances, 2020-11-01 dump (ac54580) Instances, 2020-11-20 dump (6dbd61d) Instances, 2020-12-01 dump (917bcc8) Instances, 2020-12-20 dump (0b3409d)
TS Missing or extra whitespace or dash (or new compound) -39368 (-21%) 145297 144673 331658** 330624 328249 325399 324179 322282 321801 318621 317183 315825 314747 312110 310537 309386 308280 308977
T1 Edit distance 1 from common English word -36192 (-48%) 41090 41081 39967 39452 38783 38379 38436 38271 37803 36783 35976 34036 33539 33764 32347 33097 33559 33427
T2 Edit distance 2 from common English word -7560 (-10%) 64526 63263 60690 60321 59589 58603 58649 58521 58200 58085 57845 57329 57152 57487 57387 57511 57386 57348
T3 Edit distance 3 from common English word -5276 (-7%) 74396 73255 70516 70039 68887 68192 68149 68020 67769 67788 67482 67226 67025 67101 67002 67213 67298 67399
R Regular word (A-Z only) not near a common English word -3525 (-3%) 97726 96916 94793 93855 93252 91537 91489 91746 91521 91729 91513 91613 91339 91813 92329 93246 93377 93493
I Definitely not English (International) due to accents or mixed with punctuation (other than hyphen) -22196 (-24%) 72151 69118 65842 64827 63630 61844 61888 61782 61899 62113 61916 62003 62049 62274 62287 62390 62234 62471
W Not in English Wiktionary, in non-English Wiktionary -6764 (-8%) 75913 74351 86935 85604 83173 81894 81946 82173 81943 82170 81912 81968 81792 81256 81052 81224 81131 81192
L Probable Romanization (transLiteration) +81 (+2%) 4435 4486 4266 4199 4120 4122 4104 4113 4137 4140 4151 4164 4165 4207 4203 4234 4240 4260
ME Probable coMpound, English (with and without dash) +976 (+2%) 52269 48761 47187 47153 46830 46856 46967 47163 47052 47170 47009 47070 47066 47045 47023 47193 47142 47302
MI Probable coMpound, non-English (International) in English Wiktionary (both A-Z and non-ASCII characters, with and without dash) -18475 (-9%) 177646 176929 171484 169592 166216 164828 165140 165351 165605 166016 166208 166499 166572 167349 167961 169044 168953 169409
MW Probable coMpound, found in non-English Wiktionary -5544 (-11%) 46113 45103 43501 42931 40436 41383 41325 41440 41173 41234 40990 40956 40795 40353 40272 40454 40411 40338
ML Probable coMpound, transLiteration -124 (-3%) 3909 3874 3707 3663 3672 3575 3589 3593 3628 3639 3658 3717 3724 3779 3769 3825 3830 3822
C Chemistry words -176 (-9%) 1782 7564 7530 7644 7640 7655 7658 7659 7660 7662 7654 7644 7659 7661 7665 7659 7674 7700
N A-Z plus numbers and hyphens -1391 (-5%) 25209 23813 22650 22511 22290 22020 22052 22053 21971 22009 21960 21923 21879 21856 21885 21898 21893 21943
Z Decimal fraction missing leading Zero - 47* 0* 11405** 11418 11414 11398 11402 11421 11455 11530 11546 11578 11598 11669 11683 11703 11728 11762
P Patterns (e.g. rhyme schemes) -20 (-43%) 27 28 7 9 7 7 3 2 2 4 5 4 5 5 4 5 5 5
H HTML/XML/SGML tag -539 (-15%) 3010 2886 2938 2903 2904 2848 2693 2697 2680 2747 2757 2729 2565 2569 2542 2538 2540 2572
HB Known bad HTML tag, like <font> -1080 (-7%) 14465 14121 12903 13928 12919 14733 14022 11428 11670 11198 10191 8860 8756 8842 9725 11088 10164 10556
HL Bad HTML-like linking, like <http://...> -98 (-19%) 414 418 377 394 394 421 408 425 420 413 373 359 356 329 324 315 318 328
U URL -94 (-7%, from 2019-03-20) 1179 1152 1118 1134 1117 1122 1129 1124 1120 1124 1124 1103 1101 1099 1091 1096 1050 1055
BC Bad characters -12678 (-6%, from 2019-09-01) 192230 190482 186651 186517 185572 178698 175325 166116 159095 124158 112959 112755 112695 112633 112479 110608 110025 109808
BW Bad words -6542 (-5%, from 2019-09-20) 113682 106327 381288** 380259 378710 374982 375107 375206 375431 375306 374622 374740 374560 375010 375008 375557 374989 375663
Total -39115 (-3%, from 2019-09-20) 1207516 instances 1188601 instances 1647413** instances 1638977 instances 1619804 instances 1600496 instances 1595660 instances 1582586 instances 1574035 instances 1535639 instances 1519034 instances 1514101 instances 1511139 instances 1510211 instances 1508575 instances 1511284 instances 1508227 instances 1510830 instances
Parse failure Mismatched punctuation -5145 (-3%) 154084 articles + 40705 MOS:STRAIGHT violations 153033 articles + 40838 MOS:STRAIGHT violations 214365 articles + 37697 MOS:STRAIGHT violations 214463 articles + 37667 MOS:STRAIGHT violations 214101 articles + 37607 MOS:STRAIGHT violations 214465 articles + 37767 MOS:STRAIGHT violations 214732 articles + 37849 MOS:STRAIGHT violations 215081 articles + 37993 MOS:STRAIGHT violations 215447 articles + 38067 MOS:STRAIGHT violations 215915 articles + 38169 MOS:STRAIGHT violations 216227 articles + 38210 MOS:STRAIGHT violations 216472 articles + 38205 MOS:STRAIGHT violations 216738 articles + 38213 MOS:STRAIGHT violations 216991 articles + 38246 MOS:STRAIGHT violations 217192 articles + 38338 MOS:STRAIGHT violations 217660 articles + 38498 MOS:STRAIGHT violations 217861 articles + 38625 MOS:STRAIGHT violations 218207 articles + 38789 MOS:STRAIGHT violations
  • red = Probably need to fix
  • yellow = Unsorted
  • blue = Probably OK (but may need to verify)
  • bold = actively working on fixing

* Identification of Z was broken
** Affected by major bug fix for counting inter-word typos (e.g. involving punctuation)
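The T1, T2, and T3 rows above group words by edit distance from a common English word. A minimal sketch of that idea follows, assuming plain Levenshtein distance (this page does not say which metric moss uses) and a hypothetical three-word list of common words; the function names are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic row-by-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical list of common English words; not moss's actual word list.
COMMON_WORDS = ["because", "government", "encyclopedia"]


def classify(word, common_words=COMMON_WORDS):
    """Return "T1", "T2", or "T3" for a near miss, else None."""
    best = min(edit_distance(word, known) for known in common_words)
    return f"T{best}" if 1 <= best <= 3 else None
```

Under these assumptions, classify("goverment") gives "T1" and classify("becuase") gives "T2", because plain Levenshtein distance counts a swap of two adjacent letters as two edits.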

2021 statistics

Dump (moss version) Parse failures (articles + articles with MOS:STRAIGHT violations) TOTAL (instances) BC BW C H HB HL I L ME MI ML MW N P R T1 T2 T3 TS U W Z D
2021-01-01 (b4af24a) 218317 + 38841 1505808 108661 375875 7705 2550 10726 311 62583 4262 47274 169504 3841 40131 21954 4 93373 32968 56903 66819 306445 1054 81112 11753
2021-01-20 (a249b2d) 218455 + 38930 1506940 108030 376079 7679 2616 11036 298 62746 4298 47044 170234 3885 39960 21959 4 93467 33598 56688 66688 306776 1042 81049 11764
2021-02-01 (8279235) 218833 + 38960 1506004 107000 375979 7677 2595 11729 298 62829 4305 47053 171005 3888 39771 21971 2 93726 33237 56822 66707 305573 1035 81079 11723
2021-02-20 (2f00c51) 218991 + 39035 1504064 106534 375909 7682 2602 11697 275 62942 4342 47036 171313 3897 39732 22009 3 93959 32705 56529 66617 304463 1020 81041 11757
2021-03-01 (248159a) 219198 + 39155 1494162 106421 376305 7669 2624 9291 281 62978 4328 46830 169666 3876 39189 21936 4 92221 32762 56197 66069 302377 1020 80338 11780
2021-03-20 (57aaae7) 219556 + 39371 1492923 106284 375853 7695 2610 9965 278 63055 4331 47064 170453 3880 39172 21998 2 92721 32523 56052 66087 299751 1002 80305 11842
2021-04-01 (d47c725) 219692 + 39478 1484879 105670 375757 7697 2620 8857 205 62842 4309 46966 170369 3884 38886 21964 0 92575 32160 55810 65706 296009 995 79736 11862
2021-04-20 (d169566) 220014 + 39634 1476477 104505 374548 7686 2648 8863 199 62668 4327 47036 170547 3878 38644 21973 4 92336 30560 55284 65191 293170 985 79487 11938
2021-05-01 (7719363) 219292 + 39601 1445819 103253 367236 7661 2387 7682 178 59749 3966 44397 165787 3774 38591 21697 4 91448 30666 56556 65257 283967 980 78634 11949
2021-05-20 (c6359fc) 219284 + 39761 1444570 102794 368258 7678 2271 7878 176 59913 3978 44514 166538 3804 38629 21725 4 91887 29205 56341 65171 282093 983 78651 12079
2021-06-01 (076f14c) 219111 + 39759 1441769 102409 368046 7689 2275 7827 166 59876 3943 44658 166622 3818 38567 21755 5 92077 28507 56157 64919 280645 975 78682 12151
2021-06-20 (ffbc72f) 219625 + 39935 1435330 101926 367522 7694 2276 7108 162 59650 3964 44692 167038 3819 38298 21687 8 92365 28020 55983 64688 276538 955 78621 12316
2021-07-01 (cb3d5e8) 219791 + 39990 1433415 101916 367581 7704 2263 6921 169 59663 3960 44770 167508 3837 38299 21674 8 92600 27369 55755 64301 275024 946 78720 12427
2021-07-20 (5c3b9e9) 220086 + 40132 1429627 101518 367954 7688 2136 6702 137 59995 3955 44805 167818 3824 38179 21646 7 92660 26469 55565 64171 272147 950 78624 12677
2021-08-01 (86e7022) 220338 + 40213 1424448 101229 367552 7708 2123 6252 121 61727 3767 44851 168279 3812 36769 21643 0 93146 26555 55547 64124 271406 953 74189 12695
2021-08-20 (33a14e3) 220370 + 40254 1414854 100973 367172 7719 2047 5736 119 59520 3746 44729 167010 3811 37772 21537 2 92763 24146 54950 63571 266761 960 77075 12735
2021-09-01 (90e0a3b) 220449 + 40268 1411194 100113 367110 7714 2046 5801 120 59567 3733 44623 167222 3824 37710 21525 2 92833 23310 54796 63455 265044 953 76926 12767
2021-09-20 (c71a444) 220781 + 40328 1412140 99635 367286 7713 2040 5650 121 59595 3766 44828 167997 3843 37719 21561 0 93701 22924 54661 63575 264775 948 76966 12836
2021-10-01 (cdd699c) 221094 + 40362 1405448 99065 367498 7683 2060 5774 111 59546 3710 44579 167357 3831 37696 21381 2 93027 22576 54268 63134 261463 952 76883 12851 1

A major upgrade to word categorization was made in October 2021. The same dump is shown on the old and new systems for comparison. R, I, W, MI, MW, and ML were eliminated and sorted by language as TE or TF instead. New categories:

  • A = mAth
  • T/ = Suspected MOS:SLASH violation
  • TE = AI thinks it's trying to be English
  • TF = AI thinks it's trying to be a non-English language (Foreign to English Wikipedia), sorted by language (e.g. TF+el)
Dump (moss version) Parse failures (articles + articles with MOS:STRAIGHT violations) TOTAL (instances) A BC BW C H HB HL L ME N P T/ T1 TE TF TS U Z
2021-10-01 (2ec07e4) 221094 + 40362 1457644 17030 175488 367537 4049 2060 5774 111 5428 237959 2329 37 3237 54108 10076 439099 118822 1649 12851
2021-10-20 (b44e087) 221396 + 40415 1452333 22433 173701 381776 7762 2032 5341 95 5399 219482 2351 6 3252 53679 10151 438103 112265 1613 12892
2021-11-01 (0786728) 221592 + 40396 1476996 22385 97423 481799 7793 1573 5122 97 5399 219638 2297 9 3246 53546 10145 440061 111957 1607 12899
2021-11-20 (34069e9) 153165 + 42992 1491000 23808 99945 497995 7816 1609 5587 111 5688 222435 2340 9 3373 53516 9847 426498 116119 1642 12662
2021-12-01 (0fc2fb3) 153177 + 42994 1489025 23727 99782 496905 7828 1558 5602 104 5702 222571 2346 8 3359 53405 9816 425937 116070 1627 12678
2021-12-20 (d20f520) 153289 + 42902 1488550 23761 99074 496904 7845 1561 5601 108 5715 223063 2351 4 3337 53580 9806 425623 115890 1618 12709

2022 statistics

Dump (moss version) Parse failures (articles + articles with MOS:STRAIGHT violations) TOTAL (instances) A BC BW C D H HB HL L ME N P T/ T1 TE TF TS U Z
2022-01-01 (92506e2) 153265 + 42919 1488043 23730 98949 496872 7872 0 1561 5712 108 5744 222842 2355 8 3337 53020 9801 425923 115845 1608 12756
2022-01-20 (f63dc78) 153371 + 42894 1490532 23729 98433 497315 7875 1 1603 6158 108 5794 223402 2345 5 3325 53057 9667 426560 116722 1594 12839
2022-02-01 (8fbf720) 153444 + 43002 1621627 23804 98366 497551 7934 1 1579 6051 108 6007 240216 2381 13 3334 58724 11652 531477 117630 1599 13200
2022-02-20 (8245233) 153724 + 43135 1622459 23835 98083 497766 7956 1 1604 5177 102 5999 240497 2370 14 3281 59384 11661 531576 118343 1616 13194
2022-03-01 (8245233) 153733 + 43208 1624427 23837 98107 497855 7989 1 1571 5815 102 6027 240789 2371 16 3278 59744 11669 531890 118567 1608 13191
2022-03-20 (fb66b79) 153882 + 43327 1624509 23823 97961 498466 7996 1 1552 4746 106 6059 241192 2363 15 3311 60058 11638 531382 119054 1601 13185
2022-04-01 (fb66b79) 153932 + 43430 1626452 23823 97828 498085 8000 1 1594 4793 105 6063 241718 2375 16 3327 60572 11642 532088 119684 1591 13147
2022-04-20 (fb66b79) 154017 + 43596 1630486 23789 97841 498611 8012 1 1607 4990 105 6065 242940 2374 17 3337 60977 11649 532927 120483 1587 13174
2022-05-01 (fb66b79) 153825 + 43698 1631287 23793 97801 498632 8020 1 1609 5048 104 6073 243306 2384 20 3337 61453 11694 533878 119359 1579 13196
2022-05-20 (cc63e5f) 153870 + 43814 1635174 23851 97718 498090 8043 1 1636 4925 107 6103 243986 2385 19 3337 59550 11866 538310 120406 1574 13267
2022-05-20 (ae346b0)* 164831 + 29862 1620797 23846 92522 487792 8099 1 1631 4930 110 6076 244851 2308 18 3335 60170 11838 538751 119670 1580 13269
2022-06-01 (6090418) 164899 + 29887 1620209 23786 92402 487512 8099 1 1620 4620 113 6090 245017 2309 16 3331 60318 11803 538115 120085 1587 13385
2022-06-20 (97d23b9) 164770 + 29816 1617952 23775 91799 486712 8102 0 1611 4705 116 6087 245190 2319 13 3300 59666 11763 538585 119215 1568 13426
2022-06-20 (1432a2f)** 164877 + 29821 1677855 23781 91816 547534 8102 0 1611 4706 116 6071 245153 2318 13 3297 59659 11764 537643 119292 1554 13425
2022-07-01 (9ab6dad) 164769 + 29855 1674273 23732 91585 547881 8113 0 1644 4657 116 6110 244376 2295 143 3261 59286 11657 535628 118761 1559 13469
2022-07-20 (06d752b) 164636 + 29850 1674512 23605 91172 547558 8111 0 1663 4856 126 6127 244725 2294 144 3272 58857 11659 536841 118429 1550 13523

* ae346b0 dump is the first one where content inside curly quotes is ignored
** 1432a2f added more excluded end sections

Instructions for editors

Just like a regular spell checker, sometimes a highlighted word really is a misspelling and should be changed, but sometimes it is a correct spelling that needs to be added to the spell checker's dictionary (which in this case is the English Wiktionary and Wikispecies). For the lists below, here's how you can help:

  • For spelling mistakes: Click on the links to the individual Wikipedia articles, and edit them to correct the misspelling. Make sure this is actually a misspelling, and not a technical term that needs to be better explained, or an alternate spelling (possibly from a different regional variety of English).
  • For non-English words (including words from Old English and Middle English, since they are pronounced differently): Edit the article and use the {{lang}} or {{transl}} templates to mark all non-English passages. Template contents are ignored, so they will not show up in the next report. If you can define the word, it would still be helpful to add the non-English word to the English Wiktionary or the same-language Wiktionary if you speak that language. As of the March 20, 2019 dump, only words not found in any Wiktionary are reported by moss as misspellings. (The "home" Wiktionary for Old and Middle English words is the modern English one.)
    • If you don't know which language is being used, you can tag it with {{which lang}}. If you add a "reason=" parameter, that will change the pop-up tooltip text readers will see when they hover over "what language is this?". If you have a guess as to which language it might be, or any other question or comment, you can leave that here to help future editors. If you use this tag, you can delete the article from the moss listing; the article will be added to Category:Articles with unidentified words instead, and ignored by future runs of moss until the mystery is solved.
    • For languages that don't have a code (often happens with historical languages), use "mis" and add an HTML comment indicating the language. For example: {{lang|mis|sharbe do kin ratz}}<!-- Old Runish -->
  • For incorrect spellings in direct quotes:
    • These shouldn't be picked up by the spell checker, as text in double quotes "" is ignored. The article probably has incorrect punctuation.
    • Regardless of punctuation problems, you can add {{sic}} around the word or phrase. See Wikipedia:Manual of Style#Quotations for guidance.
  • For correct spellings that belong in the dictionary: Click on the word to add it to the English Wiktionary. Remember the word might not be English (though the definition must be) and be sure to check capitalization!
  • For correct spellings already in the dictionary: Delete from the list. Other editors have added these to the dictionary since the database dump was taken. They do not automatically turn red as internal Wikipedia links do.
  • For correct spellings not appropriate for Wiktionary:
{{chem name|poly(1-phenylethene)}}
This should not be used for chemical formulas such as H2O, for which {{H2O}} or {{chem}} and {{chem2}} may be appropriate. For some common compounds there are specific templates available, such as Template:CO2.
  • Correct or incorrect, when finished delete the entry for the word from the lists on this page (or subpages), so work won't be duplicated. (There is no longer any need for strikethru.)
  • If an article or section has generally bad grammar, and you don't have time to fix the whole thing, just add {{copyedit}} at the top of the article or {{copyedit|section}} at the top of the affected section. If it's just a sentence or two, {{copy edit inline}} or {{incomprehensible inline}} can go at the end of the problem passage.
  • If you see errors being reported from footnotes or bibliographies, check to make sure the section is titled with a standard name following MOS:APPENDIX conventions. Standard end-matter sections like "References" and "Further reading" and "Works" are ignored.
  • If it helps to leave a message on the article's talk page asking if the word is correct or incorrect, you can use Template:Typo help like this when editing the bottom of the talk page (leave the section header blank; it will automatically be added):
{{subst:typo help|PUT WORD HERE}} -- ~~~~
  • If you are uncertain whether a word is spelt correctly or not, you can add {{typo help inline}} immediately after it. If you add a "reason=" parameter, that will change the pop-up tooltip text readers will see when they hover over "check spelling". You can add a specific question or comment that may help identification. If you use this tag, you can delete the article from the moss listing; the article will be added to Category:Articles with unidentified words instead, and ignored by future runs of moss until the mystery is solved.

Don't worry if you miss something; it will reappear in a future report if there are still mistakes.

Suggested edit summaries

If you want to help publicize this project, you can copy-and-paste these into your edit summary, if appropriate.

For Wikipedia edits:

Fix misspelling found by [[Wikipedia:Typo Team/moss]] – you can help!
Tag non-English text found by [[Wikipedia:Typo Team/moss]] – you can help!
Tag correct text as {{not a typo}} for automated spell checkers (including [[Wikipedia:Typo Team/moss]])
Fix mismatched quote marks found by [[Wikipedia:Typo Team/moss]] – you can help!

For Wiktionary edits:

Add word identified by [[w:Wikipedia:Typo Team/moss]] – you can help!

Wiktionary cheat sheet

Need to add a word to Wiktionary? The Wiktionary cheat sheet has copy-and-paste templates that make it easy for the types of words commonly encountered here, even if you've never done it before.

Misspellings - lists of things to fix

Likely misspellings by article (main listing)

The most efficient list to work on if all you want to do is fix misspellings. These listings try to include all the typos from a given article, so they can be fixed all at once, and to show only typos that legitimately need fixing. They are not perfect, so a few of the words found need to be added to Wiktionary or tagged as not English, not a typo, etc. Only a few letters are updated on each run, to avoid stale listings, as the whole list takes far longer than two weeks to work through. (This also avoids duplicating recent work when listings are refreshed.)

See subpages due to length:

Notes:

  • For more cases that require investigation, see Category:Articles with unidentified words.
  • Due to length and an increased number of false positives, typo reports for dumps 2020-05-20 and later don't include T2+, T3+, and TS+BRACKET+.

Possible typos by length

(Updated from 2022-05-20 dump.)

Longest or shortest in certain categories are shown, sometimes just for fun and sometimes because they form a useful group. Feel free to delete articles that are fixed or tagged.

Likely chemistry words

These need to be checked by a chemist and marked as {{not a typo}}.

Chemical formulas

(Updated from 2022-06-20 dump.)

Chemical formulas should be written with HTML subscripts or {{chem2}}; these listings identify those that incorrectly just use regular numbers.

Chemical formulas that use Unicode subscripts (which is against MOS:SUBSCRIPT) will be detected automatically by moss_entity_check.py.

Chemical formulas that use <sub>...</sub> are allowed by MOS:CHEM, but may show up in the main typo listings above. They can be converted to use {{chem2}} to be accepted by the spell checker, and {{chem2}} is also the way to fix listings of partial formulas.

Any "possible" listings that aren't chemical formulas can be cleared from this list by adding a redirect to an appropriate target (like Dy4 Systems). Most "known" listings that aren't chemical formulas can be fixed with {{proper name}}.

Redirects added for strings that are chemical formulas should be added to Category:Chemical formulas.
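As a rough illustration of the kind of string these listings contain, the sketch below flags tokens that look like formulas written with regular digits and wraps one in {{chem2}}. The heuristic is an assumption made for this example (element-like symbols followed by plain digits, with no check against real element symbols) and is not how moss itself identifies formulas; the function names are invented.

```python
import re

# Assumed heuristic: a token made only of element-like symbols (a capital
# letter plus an optional lowercase letter), each with an optional digit count.
TOKEN = re.compile(r"\b(?:[A-Z][a-z]?\d*)+\b")


def possible_formulas(text):
    """Return formula-like tokens that contain at least one plain digit."""
    return [
        match.group()
        for match in TOKEN.finditer(text)
        if any(char.isdigit() for char in match.group())
    ]


def to_chem2(formula):
    """Wrap a plain-digit formula in the {{chem2}} template."""
    return "{{chem2|" + formula + "}}"
```

For the text "The mineral contains Si6O18 rings and CO2." this returns ["Si6O18", "CO2"]. It also flags strings such as "V3R5" and "Dy4", which is why non-formula listings need to be cleared by hand as described above.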

Most chemical articles

Articles with a large number of chemical formulas triggering the spell checker are listed here (manual check on 2022-06-20 dump; counts include potential typos other than formulas, mostly compound names):

Possible chemical formulas that don't use subscripts

Note: These are easier to find by searching with "insource://", for example: insource:/Si6Al2/. -- Beland (talk) 08:04, 8 August 2022 (UTC)
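For anyone working through many of these, a small helper can turn a formula into a search link. This is only a convenience sketch: the function name is made up, and it assumes the formula contains only letters and digits (brackets or parentheses would need escaping inside the search regex).

```python
from urllib.parse import urlencode


def insource_search_url(formula):
    """Build an English Wikipedia search URL for an insource regex query."""
    query = f"insource:/{formula}/"
    return "https://en.wikipedia.org/w/index.php?" + urlencode({"search": query})
```

For example, insource_search_url("Si6Al2") produces a link whose search box is pre-filled with insource:/Si6Al2/.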

  • 15/6 - Si6Al2 → From Ca2[(Mg,Fe)3Al2]Si6Al2O22(OH)2 and its many compositional variations; see Double chain inosilicates
  • 11/7 - Si6O18 → Compound of SiO3; see Silicate and Cyclosilicates. Related to Beryl.
  • 9/8 - Si3O9 → As above. Related to Benitoite.
  • 9/8 - Cu6 → Copper compound
  • 9/8 - Al2Si2O5 → From Al2Si2O5(OH)4; see Kaolinite
  • 9/4 - Fe7C3 → Form of Iron carbide[1][2]
  • 9/1 - Si25O73 → Complex mineral forming chemical compound; see Eudialyte group
  • 8/5 - Mg3Al2 → From silicate mineral Mg3Al2(SiO4)3[3] and its compositional variations; see Pyrope
  • 8/1 - V3R5 → version 3 release 5
  • 8/1 - Fe3Sn2 → Kagome metal
  • 7/7 - Cu5 → Copper compound
  • 7/6 - Si4O11 → See Inosilicates
  • 7/6 - In20 → Unsure — no results from Google or PubChem. Single Wikipedia result from Tin-Indium-Lead alloy 532[4] Sn54Pb26In20; see Solder alloys
  • 7/5 - Mn5 → Manganese compound
  • 7/5 - K3V2 → From K3V2(PO4)3; see Potassium-ion battery § Cathodes
  • 7/1 - Si8O22F2 → Fluorosilicate compound found in minerals
  • 7/1 - Ga2I62 → Related to Gallium halides; see Intermediate halides
  • 7/1 - B3R2
  • 6/5 - S50B32
  • 6/5 - Ge9
  • 6/3 - La2S3
  • 6/3 - H3R17
  • 6/2 - Ga2I3
  • 6/2 - C6R6
  • 6/2 - C3N2H3
  • 6/1 - As8S9
  • 5/5 - Si9O27
  • 5/5 - H3K18
  • 5/5 - Fe5Si3
  • 5/5 - Bi4Ti3O12
  • 5/4 - Zr4
  • 5/4 - V2O7
  • 5/4 - Se4N4
  • 5/4 - Mo6S8
  • 5/4 - Li4Ti5O12
  • 5/4 - K2C8H8
  • 5/4 - Fe4S3
  • 5/4 - C3N3
  • 5/4 - Bi2O2
  • 5/4 - Al63Cu24Fe13
  • 5/4 - Ac2S3
  • 5/3 - V3R6
  • 5/3 - Na2S4
  • 5/3 - Na12
  • 5/3 - H5O2
  • 5/3 - H3R26
  • 5/3 - Cf2O3
  • 5/2 - Sm2Co17
  • 5/2 - N62B44
  • 5/2 - Mn5Si3
  • 5/1 - Si4O13
  • 5/1 - B12Cl11
  • 4/4 - Ti22
  • 4/4 - Si4O10
  • 4/4 - Sb3O6
  • 4/4 - Pb9
  • 4/4 - No17
  • 4/4 - No16
  • 4/4 - Mg3Si4O10
  • 4/4 - Kr2
  • 4/4 - H4R3
  • 4/4 - Gd3Ga5O12
  • 4/4 - Ca3Al2O6
  • 4/4 - C6H5O7
  • 4/4 - C6H3Cl2
  • 4/4 - C2B2
  • 4/4 - B4O5
  • 4/4 - B18B4
  • 4/4 - Al2Si2
  • 4/3 - W18O49
  • 4/3 - Si12O30
  • 4/3 - S6K2
  • 4/3 - Ni6
  • 4/3 - H3R8
  • 4/3 - Ca3Al2
  • 4/3 - B3O3
  • 4/3 - B18C4
  • 4/2 - R2P2
  • 4/2 - Pb2H4
  • 4/2 - P3N3
  • 4/2 - P3K2
  • 4/2 - Ni31Si12
  • 4/2 - H4K8
  • 4/2 - Cu4O3
  • 4/2 - Cu2Cr2O5
  • 4/2 - Cr7C3
  • 4/2 - B5O6
  • 4/2 - As4S5
  • 4/1 - Ti4N3
  • 4/1 - Ta5N6
  • 4/1 - Ta2Cl6
  • 4/1 - Mn12O12
  • 4/1 - Mg3Si2O5
  • 4/1 - Ho5
  • 4/1 - H4H2

Known chemical formulas that don't use subscripts

CO2 A-M
CO2 N-Z
H2O
CS2

(Mostly not carbon disulfide.)

C2H2 zinc finger weirdness

These might be better written as Cys2His2; see Zinc finger#Classes. -- Beland (talk) 01:16, 18 June 2022 (UTC)

CH4
Everything else

Repeating patterns

For rhyme schemes, they probably need to be re-styled to follow Wikipedia:WikiProject Poetry#Style for rhyme schemes. If this ends up making them all-caps, they won't show up here on the next run. For mixed-case rhyme scheme notations, use {{not a typo}} after making sure dashes, commas, and spaces follow the recommended style.

(Updated from 2022-05-20 dump!)

False positives

Is there a word that is correctly used in an article, but which shouldn't be added to Wiktionary? List it here, and Beland will fix the problem.

Archived solutions: Wikipedia:Typo Team/moss/Archive

False negatives

Is there a misspelled word in an article mentioned here that was not reported? Feel free to list it below and Beland will try to improve the code if appropriate.

These are currently over-ignored, but could be used to suggest correct spellings:

  • Wikipedia articles with {{R from misspelling}}, {{R from incorrect name}}, {{R from miscapitalisation}}, and redirects to these templates
  • Wiktionary entries that are known misspellings (e.g. wikt:anticiliary)
  • In cases where there are variant spellings of the same word or phrase, Wikipedia should probably pick one and stick to it except to mention the variants. This happens with:
    • Compound words - whether to use a space, dash, or nothing, as in "junebug" vs. "june bug" or "email" vs. "e-mail".
    • Words with multiple transliterations from another language (often there are multiple systems, no particular system, or a modern system different from historical systems).
    • Redirects with {{R from alternate spelling}} and redirects to that template.
  • Article Ana Recio Harvey | detected misspelling: appoinment | additional, undetected misspelling: enterpreneur
    • Looks like this was because of redirects with "enterpreneur" in the title. I have tagged them all {{R from misspelling}}, but I'll have to change the code to ignore those, as noted above. Thanks for catching that! -- Beland (talk) 23:52, 18 October 2018 (UTC)

Archived notes

See Wikipedia:Typo Team/moss/Archive.

For Wiktionary

Spell-checking Wiktionary itself

A new project has started to do that using moss software, at wikt:Wiktionary:Spell check.

Triaged for Wiktionary

Dictionary writers needed! And speakers of languages other than English!

Many words (English and otherwise) detected as potential typos have been manually triaged as legitimate words that need to be added to Wiktionary, and are listed at Wikipedia:Typo Team/moss/For Wiktionary. (Moved from this page due to length.) Many of the subpages under the misspelling main listing also have long lists of words to add to Wiktionary, which are sometimes bundled up and moved to the "For Wiktionary" subpage.

Wiktionary aims to have definitions for all words in all languages (with some exceptions), and acts as the primary database for the moss spell-checker.

Highest-frequency words missing from dictionary (n-z)

Good candidates for words to add to the English Wiktionary (which provides English definitions for words in all languages, including all compound words), as it seems English Wikipedia readers will frequently encounter them. For each run, only words from half of the alphabet are shown, to avoid duplicate work from when new dumps are being processed.
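The selection described here amounts to counting how often each unknown word occurs and keeping only one half of the alphabet per run. A minimal sketch, assuming the unknown words are already available as a flat list (the function name, its parameters, and the sample words are made up for illustration):

```python
from collections import Counter


def top_missing_words(unknown_words, first="n", last="z", limit=3):
    """Count unknown words whose first letter falls in the chosen range."""
    counts = Counter(
        word for word in unknown_words
        if word and first <= word[0].lower() <= last
    )
    return counts.most_common(limit)


sample = ["sharbe", "ratz", "sharbe", "kin", "do", "ratz", "sharbe"]
print(top_missing_words(sample))  # [('sharbe', 3), ('ratz', 2)]
```

Words starting with a to m ("kin" and "do" in the sample) are left for the run that covers the other half of the alphabet.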

Most of the words are not from English. To get them off this list, you can either add an entry to the English Wiktionary (which provides English definitions for words in all languages) or tag all instances of the word on the English Wikipedia with {{lang}}. Wiktionary does not accept Romanizations for some languages, so those cases must be tagged as {{transl}} or {{lang}}.

Legitimate misspellings are candidates for Wikipedia:Lists of common misspellings. If there is an obvious correction, adding that to Wikipedia:Lists of common misspellings/For machines will help editors who use automated tools to fix cases faster.

Translation and general cleanup

See Wikipedia:Typo Team/moss/not English.

Mismatched markup and punctuation

Errors in punctuation (mostly quotation marks) and wiki markup generally cause confusion for readers, and also prevent the spell checker from running on these articles.

Inches and feet should not use " and ', per Wikipedia:Manual of Style/Dates and numbers#Specific units; use letters instead. (See MOS:UNITS for general guidance.) Where conversions are needed, use {{convert}}, for example: 2 feet 3 inches (69 cm)


WORK IN PROGRESS

  • Integrating these with main listings
  • Filter only unmatched " for now
    • Filter articles with non-ASCII quote marks to a separate list for JWB processing
    • Filter \d" and \d' to a separate sublist for inch/feet style conversion
  • Explain ✂ or skip snippets showing this
  • Bracketbot web UI seems to be down

-- Beland (talk) 19:03, 4 September 2019 (UTC)

Gender-neutral language

Manned

The word "manned" and related forms like "unmanned" are used in many articles, but are not gender-neutral as required by MOS:S/HE and the NASA style guide. Gender-neutral alternatives include:

  • Crewed, uncrewed
  • Staffed, unstaffed
  • Human spaceflight
  • Defended

Not all instances need to be changed.

  • Proper nouns should remain the same, like Manned Orbiting Laboratory
  • Titles of sources and quotes should remain unchanged.
  • Instances where the term itself is being discussed should remain unchanged, for example to say that "manned spaceflight" is another way of saying human spaceflight.
  • There seems to be consensus on unmanned aerial vehicle that this and related phrases (like unmanned aerial system) should remain intact, since it is much more frequent than "uncrewed aerial vehicle" at the moment. However, when using Wikipedia's voice it is preferred to describe a UAV as "uncrewed" when not using the whole phrase.
  • Non-article pages that are retained for historical interest shouldn't be modified if they won't be visible to readers.
  • Redirects with this title should be left alone if they are redirecting readers to a gender-neutral title
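Some of the exceptions above can be approximated mechanically. The sketch below is a hypothetical heuristic, not moss's actual logic: it matches only lowercase forms (so capitalized proper nouns like Manned Orbiting Laboratory are skipped) and masks the UAV phrases before searching. Quotes, source titles, and discussions of the term itself still need human judgment.

```python
import re

# Case-sensitive: only lowercase "manned" / "unmanned" are flagged.
MANNED_RE = re.compile(r"\b(?:un)?manned\b")
# Phrases that consensus says should remain intact.
UAV_PHRASE_RE = re.compile(r"\bunmanned aerial (?:vehicle|system)s?\b", re.IGNORECASE)

def flag_manned(text):
    """Return (offset, word) pairs for instances that may need rewording."""
    # Blank out exempt phrases while keeping character offsets stable.
    masked = UAV_PHRASE_RE.sub(lambda m: " " * len(m.group(0)), text)
    return [(m.start(), m.group(0)) for m in MANNED_RE.finditer(masked)]

sample = ("The Manned Orbiting Laboratory was a manned station; "
          "an unmanned aerial vehicle flew by.")
flagged_words = [word for _, word in flag_manned(sample)]
```

Because the match is case-sensitive, a sentence that begins with "Manned" would be missed; that trade-off keeps proper nouns out of the results.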

If the word is found in the names of articles and categories (except those with names directly related to UAVs), those should be renamed, and the links changed. Many articles have already been renamed, and the links just need to be updated. (Remember that to rename a category, all the articles in that category must be edited to change their pointers.)

Borderline cases

These may need to be discussed before being changed.

  • Manned Venus flyby - Based on the NASA style guide, NASA probably would now refer to this as "human Venus flyby", but historical sources say "manned Venus flyby", so that is what the majority of editors commenting on the talk page currently favor. There is some question as to whether the scope of the article concerns a specific mission or this type of mission in general, which is related to the proper name exception (but then the title would be "Manned Venus Flyby"). Compare Colonization of Venus and Human mission to Mars. -- Beland (talk) 19:41, 21 May 2019 (UTC)
Discussion in progress on Talk:Manned Venus flyby. -- Beland (talk) 09:37, 5 January 2022 (UTC)

Objections in specific cases:

Marriage

Wikipedia:Writing about women § Marriage points out:

Ladies

Wikipedia:Writing about women § Girls, ladies prefers "women" to "ladies" except where part of set phrases or traditional titles (like first lady). Find all lowercase "ladies".

Instructional and presumptuous language

MOS:NOTE says to avoid the following phrases when they address the reader directly. Not all instances are problematic, such as those in direct quotations.

Internationally comprehensible spelling and vocabulary

MOS:COMMONALITY advises the use of vocabulary and spellings that are shared across national varieties of English, where possible. This section collects instances where an unshared term is being used which could be improved. For proper nouns and direct quotes, a translation or re-spelling into another dialect may be helpful.

Looks like it's wrapped up, with jail preferred except in proper nouns. Xurizuri (talk) 15:36, 21 December 2020 (UTC)

Currency style

Per MOS:CURRENCY:

  • For the UK, Irish, Australian, New Zealand, and South African pound, ₤ should be changed to £
  • ₤ is OK to use with Italian lira. Changing e.g. ₤100,000 to [[Italian lira|₤]]100,000 will prevent legitimate uses from showing up in automated reports, and also help readers understand that the amount is not in British pounds. (Mentions of Italian lira are increasingly rare because it has been replaced by the euro.)

Find all problem cases for ₤
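To show why linking the sign keeps legitimate uses out of reports, here is a minimal sketch that counts ₤ signs not wrapped in the [[Italian lira|₤]] link. The function name is hypothetical, and the real report may be generated differently.

```python
LIRA_LINK = "[[Italian lira|₤]]"

def unlinked_lira_signs(wikitext):
    """Count ₤ characters that are not part of the Italian lira link."""
    # Remove the accepted linked form, then count whatever ₤ remains.
    return wikitext.replace(LIRA_LINK, "").count("₤")

sample = "It cost ₤5 in London and [[Italian lira|₤]]100,000 in Rome."
remaining = unlinked_lira_signs(sample)
```

Each remaining sign still needs a human decision: change it to £ for a pound currency, or link it if it really refers to lira.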

Caution: Not all problem pages show up reliably; if you do a search, fix all the pages in the results, and then do another search, you will probably get a fresh batch of problem pages. It may also take a minute or two for fixed pages to disappear from the results, due to lag updating the search index.

Work is in progress on detecting and fixing other MOS-related issues with numbers and currencies.

Small caps

Per MOS:BCE, smallcaps are not to be used for years like "400 BC". Find all instances of known smallcaps issues...

HTML tags

Updated from 2022-05-20 dump.

You can do one of two things for these articles:

  • Remove, repair, or convert the HTML markup to wiki markup yourself.
  • Tag the article {{cleanup HTML}} and it will show up under Category:Articles with HTML markup but not on this list. Use the "tags" parameter to indicate which tags are present on the page; many editors find it hard to locate the offending HTML. For example: {{cleanup HTML|tags=table, cite}}
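For the tagging option, the "tags" parameter can be assembled by scanning the page for tag names. The sketch below is an illustration under assumptions: the function names and the `ignore` parameter are hypothetical, and the pattern is a simplification that does not understand comments or nowiki sections.

```python
import re

# Opening or closing tag; captures the tag name only.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")

def html_tags_in(wikitext, ignore=frozenset()):
    """List distinct lowercase tag names in order of first appearance."""
    seen = []
    for match in TAG_RE.finditer(wikitext):
        name = match.group(1).lower()
        if name not in ignore and name not in seen:
            seen.append(name)
    return seen

def cleanup_html_template(wikitext, ignore=frozenset()):
    """Build a {{cleanup HTML}} tag naming the tags found, or None if there are none."""
    tags = html_tags_in(wikitext, ignore)
    if not tags:
        return None
    return "{{cleanup HTML|tags=" + ", ".join(tags) + "}}"

sample = "<table><tr><td>x</td></tr></table> <cite>y</cite>"
template = cleanup_html_template(sample, ignore={"tr", "td"})
```

With the sample input, `template` matches the example given in the list above.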

How to clean up

See Category:Articles with HTML markup for instructions on how to find the offending tags and what to do about them.

Find all articles by tag

Can't wait for the next database dump? Want to look for or fix all instances of a specific tag? Use the links below!

Additional HTML problems are listed at Special:LintErrors.


Sometimes editors use angle brackets (< and >) for other purposes. Though these are not HTML markup, they often need to be fixed.

<<...>> (find all) can indicate:

  • French quotation marks rendered as <<quoted text>>. These should be normalized to "quoted text" or 'quoted text', even in quotations, per MOS:CONFORM.
  • A broken citation that should be converted to {{cite web}}
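For the first case, the normalization to straight quotes can be sketched as a single substitution. The function name is hypothetical, and the pattern cannot tell a French-style quotation from a broken citation, so it suits previewing a change rather than unattended editing.

```python
import re

# Text wrapped in doubled angle brackets, with optional inner padding.
ANGLE_QUOTE_RE = re.compile(r"<<\s*([^<>]+?)\s*>>")

def normalize_angle_quotes(text):
    """Replace <<quoted text>> with "quoted text"."""
    return ANGLE_QUOTE_RE.sub(r'"\1"', text)

fixed = normalize_angle_quotes("He said << bonjour >> twice.")
```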

Other weirdness:

  • <the> - find all - More French quoting style, bad linking, bad citation style, etc.
  • <blockquote> sometimes shows up on the reports if it is capitalized or all-caps on the article page. It should be all lowercase.

Known bad HTML tags (HB)

These are also included in the main listings.

Bad link formatting (HL)

These are also included in the main listings. Angle brackets are not used for external links (per Wikipedia:Manual of Style/Computing § Exposed URLs); "tags" like <https> and <www> are actually just bad link formatting. See Wikipedia:External links#How to link for external link syntax; use {{cite web}} for footnotes.
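A pattern along the following lines could pick out URLs wrapped in angle brackets. It is a sketch with a hypothetical function name, not the rule moss uses to assign the HL code.

```python
import re

# An angle-bracketed token that starts like a URL.
BAD_LINK_RE = re.compile(r"<((?:https?://|www\.)[^<>\s]+)>")

def find_bracketed_urls(text):
    """Return the URLs that appear inside angle brackets."""
    return BAD_LINK_RE.findall(text)

urls = find_bracketed_urls("See <https://example.org/page> and <www.example.com>.")
```

The captured URL can then be rewritten by hand using external link syntax or {{cite web}}, depending on whether it is an inline link or a footnote.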

Unsorted (H)

Many of these can be replaced by {{var}} (for text to be replaced) or {{angbr}} (e.g. for linguistic notation). Enclose in <code>...</code> for inline software source code.

Need debugging

Notification of new dumps

"Most likely misspellings by articles" should always have work to do (if not, ping Beland to add more from the current dump). Some of the other sections are occasionally waiting for a new dump to get a useful list, either because they are ranked by frequency or a code change has been made to clean up noise in the next run. New runs are generally posted twice a month. The database snapshot from the first day of the month generally takes about 9-13 days to process, and the snapshot from the twentieth day of the month might take 4-6 days until it can be posted.
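The timing above amounts to simple date arithmetic. The sketch below estimates when results from a snapshot might be posted, using only the day ranges stated in this section; the function and constant names are hypothetical, and actual processing times vary.

```python
from datetime import date, timedelta

# Snapshot day of month -> (minimum, maximum) processing days, per the text above.
PROCESSING_DAYS = {1: (9, 13), 20: (4, 6)}

def expected_posting_window(snapshot):
    """Return the (earliest, latest) date results are likely to be posted."""
    if snapshot.day not in PROCESSING_DAYS:
        raise ValueError("snapshots are taken on the 1st and 20th only")
    shortest, longest = PROCESSING_DAYS[snapshot.day]
    return snapshot + timedelta(days=shortest), snapshot + timedelta(days=longest)

window = expected_posting_window(date(2022, 5, 20))
```

For the 2022-05-20 snapshot, this gives a window of 24 to 26 May 2022.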

All that said, if you want to get a ping when results from a new dump are posted, you can add your name to the list below. If you are only interested in a particular section, include a note to that effect.

moss code and data sources

moss is written in Python, and is available on GitHub at: https://github.com/cdbeland/moss

Data is obtained from XML database backup dumps.
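For readers curious how such dumps can be read, here is a minimal sketch that streams page titles and text using only the standard library. It is not taken from the moss source: the function names are hypothetical, the sample document and its namespace URI are illustrative, and real dumps are far larger and usually compressed.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(source):
    """Yield (title, text) for each <page>, without loading the whole file."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release memory held by the finished page

# Tiny illustrative sample in the general shape of a MediaWiki export.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Moss</title><revision><text>Furry green typos.</text></revision></page>
  <page><title>Typo</title><revision><text>A tyop.</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

Matching on the local tag name keeps the sketch independent of the export schema version declared in the file.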
