User:Chris the speller/BioRegEx

Suggested "Find and replace" settings for AWB when working on biographical articles, especially those about political and military people. Make a backup of your .xml settings file first, then splice the rules into it (see the section How to splice below). Then use "File/Open settings..." within AWB to load the file.

These have been well tested.

What the rules do

There are eight Find & Replace rules:

  1. Changes a spaced hyphen to a spaced en dash after a US-style date: "May 1, 1888 - June 1889" → "May 1, 1888 – June 1889"
  2. Changes "He served from 1923-1926." → "He served from 1923 to 1926."   Also handles the en dash character and the "&ndash;" entity.
  3. Changes "He served from 1923-26." → "He served from 1923 to 1926."   Also handles the en dash character and the "&ndash;" entity.
  4. Changes "to 1926 and 1928-1929." → "to 1926 and 1928 to 1929."   Also handles the en dash character and the "&ndash;" entity.
  5. Changes "to 1926 and 1928-29." → "to 1926 and 1928 to 1929."   Also handles the en dash character and the "&ndash;" entity.
  6. Changes "She was inactive between 1933-1941." → "She was inactive between 1933 and 1941."   Also handles the en dash character and the "&ndash;" entity.
  7. Changes "She was inactive between 1933-41." → "She was inactive between 1933 and 1941."   Also handles the en dash character and the "&ndash;" entity.
  8. Changes an unspaced hyphen to an en dash in a 4-digit year range within parentheses: "(1857-1904)" → "(1857–1904)"  The rule skips a range whose closing parenthesis is immediately followed by "]", "|" or "#", so it does not change links (piped or not) where a year range is at the end of the page title, as in [[Cuthbert P. Hipplethwaite (1640-1717)|Skippy Hipplethwaite]]
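
To see the two dash rules (1 and 8) in action outside AWB, here is a minimal sketch in Python. It is an illustration only, not how AWB runs the rules: the patterns are translated from .NET syntax to Python's re module (the replacement "$1" becomes "\g<1>"), and the function name fix_dashes is hypothetical.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Rule 1: a spaced hyphen after a US-style date becomes a spaced en dash.
RULE_1 = re.compile(r"\b(" + MONTHS + r")\x20(\d{1,2},\x20\d{4}\x20)-\x20")

# Rule 8: a hyphen in a parenthesized 4-digit year range becomes an en dash,
# unless the closing parenthesis is followed by "]", "|" or "#".
RULE_8 = re.compile(r"\((\d{4})-(\d{4})\)(?![\]|#])")


def fix_dashes(text):
    """Apply rule 1, then rule 8 (hypothetical helper for illustration)."""
    text = RULE_1.sub(r"\g<1> \g<2>– ", text)
    return RULE_8.sub(r"(\g<1>–\g<2>)", text)


print(fix_dashes("May 1, 1888 - June 1889"))
print(fix_dashes("John Doe (1857-1904) was a soldier."))
# The year range at the end of a link target is left alone:
print(fix_dashes("[[Cuthbert P. Hipplethwaite (1640-1717)|Skippy Hipplethwaite]]"))
```

The negative lookahead at the end of rule 8 is what protects page titles: inside a link, the closing parenthesis is followed by "|", "#" or "]", so the match is rejected.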

Find & Replace rules 2 through 7 are governed by WP:YEAR, which says: "Ranges expressed using prepositions (from 1881 to 1886 or between 1881 and 1886) should not use en dashes (not from 1881–1886 or between 1881–1886)."
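
The preposition rules (2 through 7) can be tried out the same way. The sketch below is a Python translation for illustration, assuming Python's re module in place of AWB's .NET engine; the names DASH, PREPOSITION_RULES and apply_rules are hypothetical. The XML escape "&amp;ndash;" used in a settings file appears here as the literal text "&ndash;".

```python
import re

# Hyphen, en dash character, or the "&ndash;" entity.
DASH = r"(?:-|–|&ndash;)"

# (pattern, replacement) pairs for rules 2-7, in the same order as the list.
PREPOSITION_RULES = [
    (r"\b(F|f)rom\x20(\d{4})" + DASH + r"(\d{4})\b",
     r"\g<1>rom \g<2> to \g<3>"),
    (r"\b(F|f)rom\x20(\d{2})(\d{2})" + DASH + r"(\d{2})\b",
     r"\g<1>rom \g<2>\g<3> to \g<2>\g<4>"),
    (r"\bto\x20(\d{4})(,?)\x20and\x20(\d{4})" + DASH + r"(\d{4})\b",
     r"to \g<1>\g<2> and \g<3> to \g<4>"),
    (r"\bto\x20(\d{4})(,?)\x20and\x20(\d{2})(\d{2})" + DASH + r"(\d{2})\b",
     r"to \g<1>\g<2> and \g<3>\g<4> to \g<3>\g<5>"),
    (r"\b(B|b)etween\x20(\d{4})" + DASH + r"(\d{4})\b",
     r"\g<1>etween \g<2> and \g<3>"),
    (r"\b(B|b)etween\x20(\d{2})(\d{2})" + DASH + r"(\d{2})\b",
     r"\g<1>etween \g<2>\g<3> and \g<2>\g<4>"),
]


def apply_rules(text, rules=PREPOSITION_RULES):
    """Run each rule over the text in order (hypothetical helper)."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


print(apply_rules("He served from 1923-26."))
print(apply_rules("She was inactive between 1933&ndash;1941."))
```

In the two-digit rules, the century is captured separately so that it can be reused when the abbreviated end year is expanded to four digits.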

The code

     <Replacement>
       <Find>\b(January|February|March|April|May|June|July|August|September|October|November|December)\x20(\d{1,2},\x20\d{4}\x20)-\x20</Find>
       <Replace>$1 $2– </Replace>
       <Comment>May 1, 1888 - June 1889 (hyphen to en dash)</Comment>
       <IsRegex>true</IsRegex>
       <Enabled>true</Enabled>
       <Minor>true</Minor>
       <RegularExpressionOptions>None</RegularExpressionOptions>
     </Replacement>
     <Replacement>
       <Find>\b(F|f)rom\x20(\d{4})(?:-|–|&amp;ndash;)(\d{4})\b</Find>
       <Replace>$1rom $2 to $3</Replace>
       <Comment>from 1991-1997</Comment>
       <IsRegex>true</IsRegex>
       <Enabled>true</Enabled>
       <Minor>true</Minor>
       <RegularExpressionOptions>None</RegularExpressionOptions>
     </Replacement>
     <Replacement>
       <Find>\b(F|f)rom\x20(\d{2})(\d{2})(?:-|–|&amp;ndash;)(\d{2})\b</Find>
       <Replace>$1rom $2$3 to $2$4</Replace>
       <Comment>from 1991-97</Comment>
       <IsRegex>true</IsRegex>
       <Enabled>true</Enabled>
       <Minor>true</Minor>
       <RegularExpressionOptions>None</RegularExpressionOptions>
     </Replacement>
     <Replacement>
       <Find>\bto\x20(\d{4})(,?)\x20and\x20(\d{4})(?:-|–|&amp;ndash;)(\d{4})\b</Find>
       <Replace>to $1$2 and $3 to $4</Replace>
       <Comment>to 1997 and 2001-2004</Comment>
       <IsRegex>true</IsRegex>
       <Enabled>true</Enabled>
       <Minor>true</Minor>
       <RegularExpressionOptions>None</RegularExpressionOptions>
     </Replacement>
     <Replacement>
       <Find>\bto\x20(\d{4})(,?)\x20and\x20(\d{2})(\d{2})(?:-|–|&amp;ndash;)(\d{2})\b</Find>
       <Replace>to $1$2 and $3$4 to $3$5</Replace>
       <Comment>to 1997 and 2001-04</Comment>
       <IsRegex>true</IsRegex>
       <Enabled>true</Enabled>
       <Minor>true</Minor>
       <RegularExpressionOptions>None</RegularExpressionOptions>
     </Replacement>
     <Replacement>
       <Find>\b(B|b)etween\x20(\d{4})(?:-|–|&amp;ndash;)(\d{4})\b</Find>
       <Replace>$1etween $2 and $3</Replace>
       <Comment>between 1992-1998</Comment>
       <IsRegex>true</IsRegex>
       <Enabled>true</Enabled>
       <Minor>true</Minor>
       <RegularExpressionOptions>None</RegularExpressionOptions>
     </Replacement>
     <Replacement>
       <Find>\b(B|b)etween\x20(\d{2})(\d{2})(?:-|–|&amp;ndash;)(\d{2})\b</Find>
       <Replace>$1etween $2$3 and $2$4</Replace>
       <Comment>between 1992-98</Comment>
       <IsRegex>true</IsRegex>
       <Enabled>true</Enabled>
       <Minor>true</Minor>
       <RegularExpressionOptions>None</RegularExpressionOptions>
     </Replacement>
     <Replacement>
       <Find>\((\d{4})-(\d{4})\)(?![\]|#])</Find>
       <Replace>($1–$2)</Replace>
       <Comment>en dash in year range</Comment>
       <IsRegex>true</IsRegex>
       <Enabled>true</Enabled>
       <Minor>true</Minor>
       <RegularExpressionOptions>None</RegularExpressionOptions>
     </Replacement>

How to splice

I use the foxe XML editor from firstobject.com, which is a free download of less than a megabyte and is easy to install and use. Open the .xml file that holds your saved settings from AWB, and double-click on "FindAndReplace" in the foxe tree view window (left side). Copy the code from the section The code to the clipboard, then paste it into the foxe text window (right side) just below the <Replacements> tag. Optionally, you can delete replacement rules (before or after you paste the new rules); just expand "Replacements" in the tree view, then expand each replacement, click on any one and then hit the "delete" key. Save the .xml settings file. Then, in AWB, use "File/Open settings..." to load it.
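
If you would rather script the splice than use an editor, the step can be sketched in Python with the standard library's xml.etree.ElementTree. This is only a sketch under stated assumptions, not part of AWB: the function name splice_replacements is hypothetical, and it assumes nothing about the settings file beyond what is described above, namely that it contains a <Replacements> element holding <Replacement> children. ElementTree may rewrite whitespace and namespace declarations when it serializes, so work on a copy and keep your backup.

```python
import xml.etree.ElementTree as ET


def splice_replacements(settings_xml, rules_xml):
    """Insert the <Replacement> elements in rules_xml at the top of the
    first <Replacements> element found in settings_xml (hypothetical helper)."""
    root = ET.fromstring(settings_xml)
    target = root.find(".//Replacements")
    if target is None:
        raise ValueError("no <Replacements> element found in the settings")

    # Wrap the fragment so that several sibling elements parse as one document.
    fragment = ET.fromstring("<wrapper>" + rules_xml + "</wrapper>")
    for index, rule in enumerate(fragment):
        # New rules go first, in their original order, ahead of existing ones.
        target.insert(index, rule)

    return ET.tostring(root, encoding="unicode")
```

To use it on a real file, read the settings file and the copied rules into strings, call the function, and write the result to a new file that you then open in AWB.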