User:Silas S. Brown/Unicode redirects

From Wikipedia, the free encyclopedia

Does Wikipedia want all characters in Unicode to be redirected to the nearest appropriate article?

I noticed a lot of Unicode redirects existed already, but not all. So I tried adding some more. I started by editing the Emoji article and wikifying all the characters (using a regexp search/replace tool in my editor), then manually created redirects for the red links as appropriate (often redirecting lots of similar symbols to the same page, e.g. all the clock faces to Clock face). Note that a lot of the links already existed and were already redirecting to articles before I started this; I just finished off the job.
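The wikifying step was done with a regexp search/replace in an editor; the same idea can be sketched in Python 3 as below. The two ranges used here (Miscellaneous Symbols, U+2600 to U+26FF, and Miscellaneous Symbols and Pictographs, U+1F300 to U+1F5FF, which contains the clock faces) are an assumption for illustration, not necessarily the exact set the Emoji article covers, and the name `wikify_symbols` is hypothetical.

```python
import re

# Assumed ranges for illustration: Miscellaneous Symbols (U+2600 to U+26FF) and
# Miscellaneous Symbols and Pictographs (U+1F300 to U+1F5FF, includes the clock faces).
SYMBOL = r"[\u2600-\u26FF\U0001F300-\U0001F5FF]"

# Match one symbol that is not already wrapped in [[...]], so reruns are harmless.
PATTERN = re.compile(r"(?<!\[\[)(" + SYMBOL + r")(?!\]\])")


def wikify_symbols(text: str) -> str:
    """Wrap each matching symbol in wiki link brackets, e.g. X becomes [[X]]."""
    return PATTERN.sub(r"[[\1]]", text)


if __name__ == "__main__":
    print(wikify_symbols("One o'clock: \U0001F550, sun: \u2600"))
```

The lookbehind and lookahead mean a symbol that is already linked is left alone, so the replacement can be run over an article that is partly wikified.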

After that, I looked at Private Use (Unicode). Some software (e.g. Softbank) has used some of the private-use codepoints for Emoji, and it might have been useful to document this. But other software has of course used the exact same codepoints for completely different purposes (that's the point of private use). So I thought it best if all private-use characters redirect to Private Use (Unicode), even if they had previously been redirected elsewhere, e.g. to Tengwar. (Some private-use characters had redirected to Tengwar but were also used by Softbank's Emoji encoding and for other purposes; I know Wenlin also uses rather a lot of private-use characters for extra Chinese characters.)
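A minimal Python 3 sketch of the "redirect every private-use character" rule. It assumes the three Private Use Areas defined by Unicode (U+E000 to U+F8FF in the Basic Multilingual Plane, plus planes 15 and 16 up to U+FFFFD and U+10FFFD) and cross-checks them against the standard library's `unicodedata.category`, which reports private-use characters as `Co`. The helper names are hypothetical.

```python
import unicodedata
from typing import Optional

# The three Private Use Areas defined by the Unicode Standard.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

TARGET = "Private Use (Unicode)"


def is_private_use(cp: int) -> bool:
    """True if the code point lies in one of the Private Use Areas."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


def private_use_target(cp: int) -> Optional[str]:
    """Redirect target for a private-use character, ignoring any earlier
    target (e.g. Tengwar), since other software reuses the same code points."""
    return TARGET if is_private_use(cp) else None


if __name__ == "__main__":
    # The range check agrees with the general category "Co" (Other, private use).
    for cp in (0xE000, 0xF8FF, 0xF900, 0xFFFFD, 0xFFFFE, 0x10FFFD):
        assert is_private_use(cp) == (unicodedata.category(chr(cp)) == "Co")
    print(private_use_target(0xE000))  # Private Use (Unicode)
```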

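After that was completed, I modified my script to the version below. It goes through each Unicode block listed at the top of the script and, for every character in that block whose page does not already have links (i.e. is not already a redirect or an article), creates a redirect to that block's article: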

# Python 2 (uses print statements and unichr)
import time
import simplemediawiki
wiki = simplemediawiki.MediaWiki('http://en.wikipedia.org/w/api.php',user_agent='Me on the Python command line with simplemediawiki')
assert wiki.login("(my username here)","(my password here)")
# (character range, article to redirect every character in it to)
for theRange,theBlock in [
    (range(0x2800,0x2900),"Braille Patterns (Unicode)"),
    (range(0x370,0x400),"Greek alphabet"),
    (range(0x400,0x530),"Cyrillic script in Unicode"),
    (range(0x530,0x590),"Armenian alphabet"),
    (range(0x590,0x600),"Unicode and HTML for the Hebrew alphabet"),
    (range(0x600,0x700),"Arabic script in Unicode"),
    (range(0x700,0x750),"Syriac alphabet"),
    (range(0x1f00,0x2000),"Greek alphabet"),
    (range(0x2c80,0x2d00),"Coptic alphabet"),
    (range(0x2f00,0x2f00+214),"List of Kangxi radicals"),
    (range(0x3130,0x3190),"Hangul"),
    (range(0x1100,0x1200),"Hangul"),
    (range(0xa960,0xa980),"Hangul"),
    (range(0xac00,0xc687),"Hangul"), # actually ac00 to d800, but script was stopped after c686
    ]:
    for c in theRange:
        time.sleep(1) # I know now this should have been 10 at least
        title = unichr(c).encode('utf-8')
        # Skip characters whose page already exists and links somewhere (e.g. an existing redirect)
        pages = wiki.call({'action': 'query', 'prop': 'links', 'titles': title})['query']['pages']
        page = pages[pages.keys()[0]]
        if 'links' in page and page['links']:
            print "%x already has redirects/links: %s" % (c,repr(page['links']))
            continue
        # Fetch an edit token for this title, then save the redirect
        pages = wiki.call({'action': 'query', 'prop': 'info', 'titles': title, 'intoken':'edit'})['query']['pages']
        page = pages[pages.keys()[0]]
        token = page['edittoken']
        print "Redirecting %x to %s" % (c,theBlock)
        print wiki.call({'action': 'edit', 'title': title,"text":"#REDIRECT [["+theBlock+"]]",'watchlist':'unwatch','recreate':'true','token':token})
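The core logic of the script, minus the network calls, can be sketched in Python 3 as a dry run: walk each (range, article) pair, skip titles that already exist, and produce the redirect wikitext that would be saved. Here `existing` is an assumed stand-in for the `prop=links` check the script makes against the API, and `plan_redirects` is a hypothetical name. Producing a plan like this first would let the edits be reviewed before anything is written.

```python
from typing import Iterable, Iterator, Set, Tuple

Block = Tuple[range, str]


def plan_redirects(blocks: Iterable[Block], existing: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each character page that would be created.

    `existing` is an assumed stand-in for the script's API check: titles that
    already have links (i.e. are already redirects or articles) are skipped.
    """
    for char_range, article in blocks:
        for cp in char_range:
            title = chr(cp)
            if title in existing:
                continue
            yield title, "#REDIRECT [[" + article + "]]"


if __name__ == "__main__":
    blocks = [
        (range(0x2800, 0x2900), "Braille Patterns (Unicode)"),
        (range(0x2F00, 0x2F00 + 214), "List of Kangxi radicals"),
    ]
    plan = list(plan_redirects(blocks, existing={"\u2800"}))
    print(len(plan))  # 469 (255 Braille patterns + 214 radicals)
    print(plan[0])
```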

However, not everyone agrees that these redirects are a good idea, as discussed at Wikipedia:Administrators' noticeboard/Archive241#User talk:Silas S. Brown and his thousands of redirects to Hangul and at User talk:Silas S. Brown#Redirects to Hangul. So what should we do now?

Options:

1. If the community does want the redirects, I can finish the script, make it behave better (for example, waiting at least 10 seconds between edits instead of 1) and ask for bot approval so it can complete the job properly.

2. If the community does not want the redirects, they can be deleted, but how? Does somebody (me??) need to write another script to do that? Such a script would need admin privileges, since only admins can delete pages.
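If deletion is chosen, one precaution such a script might take, sketched here in Python 3 under stated assumptions, is to delete only pages whose entire wikitext is still exactly the redirect the original script saved, so that any page someone has since edited by hand is left for a human to review. The page texts are assumed to have been fetched already, the helper names are hypothetical, and the returned dicts use the MediaWiki API's `action=delete` parameters (`title`, `reason`, `token`); obtaining the token and sending the requests from an admin account are left out.

```python
from typing import Dict, List


def redirects_to_delete(pages: Dict[str, str], target: str) -> List[str]:
    """Titles whose wikitext is exactly the redirect the script created.

    Pages edited since (extra text, categories, a different target) do not
    match and are left alone.
    """
    expected = "#REDIRECT [[" + target + "]]"
    return sorted(title for title, text in pages.items() if text.strip() == expected)


def delete_params(title: str, token: str, reason: str) -> Dict[str, str]:
    """Parameters for one MediaWiki API action=delete request (admin only)."""
    return {"action": "delete", "title": title, "reason": reason,
            "token": token, "format": "json"}


if __name__ == "__main__":
    pages = {
        "\uac00": "#REDIRECT [[Hangul]]",
        "\uac01": "#REDIRECT [[Hangul]]\n[[Category:Example]]",  # hand-edited, kept
        "\u2800": "#REDIRECT [[Braille Patterns (Unicode)]]",    # different target, kept
    }
    for title in redirects_to_delete(pages, "Hangul"):
        print(delete_params(title, token="(token here)", reason="(reason here)"))
```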

3. The community might want the redirects for some Unicode blocks but not others. For example, people might want to keep the Private Use (Unicode) ones but not the Hangul ones. Can this be discussed?
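Option 3 amounts to a per-block decision. As a Python 3 sketch, assuming each existing redirect is known by its title and target article, and that the discussion produces a set of target articles whose redirects should stay (the function name and the example decision are hypothetical), the redirects split into keep and delete lists like this:

```python
from typing import Dict, List, Set, Tuple


def split_by_decision(redirects: Dict[str, str], keep_targets: Set[str]) -> Tuple[List[str], List[str]]:
    """Split redirect titles into (keep, delete) according to their target article."""
    keep: List[str] = []
    delete: List[str] = []
    for title, target in sorted(redirects.items()):
        (keep if target in keep_targets else delete).append(title)
    return keep, delete


if __name__ == "__main__":
    redirects = {"\ue000": "Private Use (Unicode)", "\uac00": "Hangul", "\uac01": "Hangul"}
    keep, delete = split_by_decision(redirects, keep_targets={"Private Use (Unicode)"})
    print(len(keep), len(delete))  # 1 2
```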

Everyone please feel free to add comments below and/or link to this page from anywhere else that's appropriate. Thanks. Silas S. Brown (talk) 15:34, 19 October 2012 (UTC)