
User talk:The Earwig


Foul play at Afc

Hi, I've been working at Afc for a few months, and today I saw a submission that had recently been created with the template still on it. I preloaded the talk page and informed the author. Then, by accident, I clicked on the page history and noticed something: the user who created the page – Richard.darren – was the same one who accepted it! Is there some kind of rule preventing this, and what action should I take? Thanks, Acather96 (talk) 14:06, 15 May 2010 (UTC)[reply]

By the way, the page affected was Audio Secrecy. Acather96 (talk) 14:13, 15 May 2010 (UTC)[reply]
Mmm... yes, I've seen this sort of thing before. It's a way for users to get around New Page Patrol, as you might have guessed, because page moves do not appear in Special:NewPages. I don't think there's a rule against it, but it's obviously inappropriate and generally the best thing to do is to revert the move and put it back in the Articles for creation namespace (don't forget to get the redirect deleted). This allows us to continue the review normally. I notice the submission was on hold at the time it was moved. — The Earwig (talk) 15:55, 15 May 2010 (UTC)[reply]
OK, will do :) Acather96 (talk) 06:11, 16 May 2010 (UTC)[reply]
This sounds like a good job for a bot. Josh Parris 09:26, 16 May 2010 (UTC)[reply]
You think so? It would probably require a person to review, but a bot would be good to find cases where a user might've done it. — The Earwig (talk) 02:20, 24 May 2010 (UTC)[reply]
Agree with Josh. I'll look into it. Acather96 (talk) 09:28, 16 May 2010 (UTC)[reply]

Expand template

Well, the dust is settling, and it appears that the previous decision to delete {{Expand}} has been overturned. I just checked the page and the Appeals template has been removed, but now, again, I cannot see the template on the page. I think the code you added before, which allows it to be seen on its own page even though it's protected, is still there, but the template does not appear. Is more code needed?

Also, I put it on my Talk page to see if I could see it there, and it didn't show up.
 — Paine (Ellsworth's Climax) 17:00, 15 May 2010 (UTC)[reply]

Update. The Expand template now shows up on my Talk page because an editor inquired about it on the template's Talk page, and editor Amalthea restored the visibility. So you might want to check if your previous edit still makes the template invisible on protected pages. Other than that, everything looks copacetic.
 — Paine (Ellsworth's Climax) 20:27, 15 May 2010 (UTC)[reply]

I think I've fixed it, but I don't have time right now to do a careful check. Maybe you could make sure I did it right? — The Earwig (talk) 02:19, 24 May 2010 (UTC)[reply]

Adriana Allen

I am curious why you have this page listed for deletion. Adriana is a published author and noted magazine editor. Can you please clarify the deletion and spam notes. Regards - KR Allen —Preceding unsigned comment added by Adriss24 (talkcontribs) 15:23, 23 May 2010 (UTC)[reply]

Hi there. I had listed the page for deletion because I wasn't able to find any reliable sources that discussed it. Reliable sources – basically, any reputable website, newspaper, etc. that discusses the subject in depth – are required in an article on Wikipedia, not only to prove that our information is accurate, but to make sure that we only include articles about notable subjects. I searched through Google in an attempt to find something, but was unsuccessful, leading me to believe that the company wasn't notable (worthy of inclusion). You might like to check out the notability guideline for organizations and companies and the notability guideline for people. The point of an Articles for Deletion discussion, like the one that took place at Wikipedia:Articles for deletion/Adriana Allen, is to decide, via consensus, whether an article should stay on Wikipedia or not – and the discussion concluded that it shouldn't. However, it has been a while since the article was deleted; you are welcome to recreate it if you have reliable sources supporting the subject. Thanks. — The Earwig (talk) 02:11, 24 May 2010 (UTC)[reply]
FYI: Wikipedia talk:WikiProject Spam#Second opinion requested: adrianaallen.com --A. B. (talkcontribs) 00:12, 25 May 2010 (UTC)[reply]

IRC tool

I tried using your IRC tool at [1] and it will not load; it just shows a blank white screen with no login screen. It worked yesterday. Is this a local error, or is the tool disabled? Thanks --Alpha Quadrant (talk) 00:37, 25 May 2010 (UTC)[reply]

fixed --Alpha Quadrant (talk) 00:38, 25 May 2010 (UTC)[reply]
...? At any rate, I strongly recommend getting a real IRC client if you want to stay in the channel for any length of time. It's really only for AfC submitters, not reviewers. — The Earwig (talk) 22:35, 25 May 2010 (UTC)[reply]

EarwigBot

Heads up, your bot on IRC went down at 22:17 UTC 25-May-10. -- /MWOAP|Notify Me\ 22:35, 25 May 2010 (UTC)[reply]

Should be fixed. Is it okay now? — The Earwig (talk) 22:37, 25 May 2010 (UTC)[reply]
Yep. Thanks. -- /MWOAP|Notify Me\ 22:38, 25 May 2010 (UTC)[reply]

The Wikipedia Signpost: 24 May 2010

data gathering advice

Dear The Earwig,

I recently went to the Wikipedia live help IRC channel to ask for some bot advice. Chzz referred me to you. I am about to begin a research study on the histories and life cycles of Wikipedia policies. In particular, I will be looking into the discussions associated with these histories and life cycles. For this purpose, I need to collect massive amounts of data about each of the policies, guidelines, and essays in Wikipedia. Basically I need to somehow extract all of the discussions that relate to the policy/guideline/essay, so that I can put together the story of how each item came to be what it is. From what I understand, sources of data include, but are not limited to:

  1. The talk page for the essay/guideline/policy
  2. Request for comment/policies
  3. Village pump
  4. Signpost announcements
  5. User talk pages


From what I understand, there are two ways I can go about collecting my data:

  1. Download the entire Wikipedia archive to a hard drive and extract information from it.
  2. Use a bot to collect data from the online Wikipedia website itself, via the Wikipedia API.


I'm not sure which of these two methods I should use. What I do know is that I would like to use Python whenever possible. Which tool(s) do you recommend? Also, how would I go about learning how to use these tools?

Thank you for your time.
--Benjamin James Bush (talk) 17:34, 31 May 2010 (UTC)[reply]

Hi there. Downloading the entire database might be easier, simply because it allows you to get all of the data at once and work with it without making a large number of separate queries to the API. The full list of dumps can be found at http://dumps.wikimedia.org/backup-index.html. The latest complete dump I could find is from March 12, which isn't that old, but it's not fully up to date either. That dump can be found here. You'd probably want either pages-meta-history.xml or pages-meta-current.xml, depending on whether you want old versions of pages or not (old revisions probably aren't necessary for discussions, as most talk pages have archives, but history searches can be useful for finding specific changes to page text).
There are multiple tools you can use to read XML dumps; if you're working with Python, I recommend getting yourself acquainted with the Pywikipedia framework. Designed as a suite of scripts to allow programmers to write Wikipedia-editing bots, it can also be used to retrieve data and process information inside XML dumps. This is the framework I would use for what you're doing. It's probably not the most efficient way of doing it, but it should be able to get the job done, and I'm comfortable using it. The main version of it is not very Pythonic and takes a while to figure out; the rewrite branch is better, but feature-incomplete. The latest version is available through SVN at http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/, but you can also use the nightly build if you prefer. Once you've installed it, configuring it is relatively simple: run the generate_user_files.py script, then login.py. Because you won't be editing anything, it doesn't really matter which Wikipedia account you use for configuration, but "Benjamin James Bush" makes the most sense. http://meta.wikimedia.org/wiki/Pywikipedia contains a much more detailed set of instructions, but keep in mind that you won't be using a good portion of the framework, only the XML-reading part of it.
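To give you a rough idea of what the configuration amounts to, the generated user-config.py is a short Python file along these lines (this is only a sketch from memory; the exact settings generate_user_files.py writes may differ, and the account name is just the example from above):
# user-config.py -- a rough sketch of what generate_user_files.py produces
# (this file is read by the framework's config.py, so it isn't meant to be run on its own)
family = 'wikipedia'        # the wiki family to work with
mylang = 'en'               # the language code, i.e. the English Wikipedia
usernames['wikipedia']['en'] = 'Benjamin James Bush'   # the account that login.py will use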
Now that we have that set up, you can begin processing the actual XML dumps. This is done with the xmlreader.py file, which you can import in your own script. You'd probably have something like:
# my dump-processing script
import wikipedia
from xmlreader import *

dump = XmlDump("pages-meta-history.xml", allrevisions=True) # load the xml dump
gen = dump.parse() # create a generator to handle all pages in the dump
...at the beginning. The generator object yields every page in the dump, with each one having different attributes. You'll probably be able to figure out more about the module by reading through the source code; I don't know if this will work 100%, because I don't normally experiment with this area of Wikipedia. Anyway, once you're done with that, it's up to you to handle the data as you want. For example, to retrieve all of the discussion text pertaining to the policy Wikipedia:Neutral point of view, one might want to retrieve the text from all talk pages of that policy, like so:
dump = XmlDump("pages-meta-history.xml", allrevisions=False) # load the xml dump
gen = dump.parse() # create a generator to handle all pages in the dump

for rev in gen: # Returns every page, I'd think? Would be XmlEntry() objects.
    if rev.title.startswith("Wikipedia talk:Neutral point of view"): # includes archives and the main page
        print rev.text
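As a rough sketch of my own (untested, so treat the details with some skepticism), you could also match several of the venues from your list at once by checking each title against a tuple of prefixes and writing every matching page's text to a file:
# a sketch: save the text of every page matching the discussion venues listed earlier
# (the prefixes are only examples -- adjust them to the policies you're studying)
from xmlreader import XmlDump

prefixes = ("Wikipedia talk:Neutral point of view",  # the policy's talk page and its archives
            "Wikipedia:Requests for comment",        # RfC subpages
            "Wikipedia:Village pump (policy)")       # village pump discussions
dump = XmlDump("pages-meta-current.xml")             # current revisions only
for entry in dump.parse():
    if entry.title.startswith(prefixes):             # startswith() accepts a tuple of prefixes
        filename = entry.title.replace("/", "_").replace(":", "_") + ".txt"
        out = open(filename, "w")
        out.write(entry.text.encode("utf-8"))        # entry.text is unicode, so encode before writing
        out.close()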
I'll briefly touch on the API, because it might be easier in the sense that you'll be able to do it without installing a framework. The API is at http://en.wikipedia.org/w/api.php; that page should provide most of the information you need to know which queries will suit your purposes. For example, a query with action=query, prop=revisions, and rvprop=content will return the text of Wikipedia talk:Neutral point of view, and raising the rvlimit parameter will retrieve the text of multiple revisions, and so on, and so forth. Processing the result can be done by using a format such as JSON (this is what I'd recommend for API queries), and Python has pretty good JSON support. A script that will retrieve the text from Wikipedia talk:Neutral point of view and print it might look like this:
# my api-processing script
import json, urllib

# build the query: fetch the content of the latest revision, as JSON
params = {'action': 'query', 'prop': 'revisions', 'rvlimit': 1, 'rvprop': 'content', 'format': 'json'}
params['titles'] = "Wikipedia_talk:Neutral_point_of_view"
data = urllib.urlencode(params)
raw = urllib.urlopen("http://en.wikipedia.org/w/api.php", data)
res = json.loads(raw.read())
pageid = res['query']['pages'].keys()[0]  # the API keys its results by page ID, so grab the first (only) one
content = res['query']['pages'][pageid]['revisions'][0]['*']  # '*' holds the revision's wikitext
print content
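As a follow-up sketch of my own (double-check the parameters against the api.php documentation), you can also use list=allpages with the apprefix parameter to discover a talk page's archives, and then fetch each one's text with a query like the script above:
# a sketch: list Wikipedia talk:Neutral point of view and its archives via list=allpages
import json, urllib

params = {'action': 'query', 'list': 'allpages', 'apnamespace': 5,   # namespace 5 is "Wikipedia talk"
          'apprefix': 'Neutral point of view', 'aplimit': 50, 'format': 'json'}
raw = urllib.urlopen("http://en.wikipedia.org/w/api.php", urllib.urlencode(params))
for page in json.loads(raw.read())['query']['allpages']:
    print page['title']   # the main talk page plus "/Archive 1", "/Archive 2", and so on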
I don't know how familiar you are with Python or Wikipedia's structure as it is, so unfortunately I don't know how much else I can say. I hope this helps; feel free to come back and ask me more questions or if you need additional clarification, etc. — The Earwig (talk) 18:53, 31 May 2010 (UTC)[reply]
The Earwig,
Thank you for putting the bug in my ear, this will really help me get started. For the time being I think it is not feasible for me to download the Wikipedia dump (but perhaps I will buy a 4 terabyte hard drive at some point in the future). I will therefore need to make a lot of API queries. As you said earlier, the preferred python tool for API queries is JSON. My question is, should I also be using the pywikipedia framework? If so, how do JSON and Pywikipedia fit together? Could I make API queries with JSON, and then process the result with Pywikipedia? Or do you think it would be better to use JSON alone? Thanks!! --Benjamin James Bush (talk) 23:46, 31 May 2010 (UTC)[reply]
Yes, you can definitely use Pywikipedia and JSON together. However, it probably isn't necessary in your case, as you aren't going to be editing pages, and most of Pywiki's functions revolve around that. If you're going to use the API, most of Pywiki's functions will probably seem redundant; e.g., the page.get() function in Pywikipedia returns page text, and you already know how to do that. I introduced the API because it's a lower-level way of accessing page text and other data than Pywikipedia, which seemed better in your case. Pywikipedia does have some interesting functions, though; for example, you can use some of them to create generators for categories or for reading text files, but again, these are things you can still do with the API. Using them in conjunction can be done if you want (I do it for some of my bots), but again, you probably won't find it necessary if you have the full API at your disposal. — The Earwig (talk) 00:10, 1 June 2010 (UTC)[reply]
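For reference, the Pywikipedia equivalent of the API script above is only a few lines. This is a sketch using the trunk framework's wikipedia module (the rewrite branch names things differently), so test it before relying on it:
# a sketch: fetch the current wikitext of a page with the trunk Pywikipedia framework
import wikipedia

site = wikipedia.getSite('en', 'wikipedia')                           # the English Wikipedia
page = wikipedia.Page(site, "Wikipedia talk:Neutral point of view")   # wrap the title in a Page object
print page.get()                                                      # page.get() returns the page's wikitext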

The Wikipedia Signpost: 31 May 2010

Moving to Afc

I remember agreeing to the wording in User:Chzz/test, but if I signed up for carrying out such notices, I missed it. Great job on cleaning up what was a mess, but I do think that if an article is moved to AfC, the original editor should get a notice. I think that is best done by whoever does the moving. (I'll copy Earwig.) I haven't checked all the examples, but I did check one, and I don't see that User:Acklis was notified.--SPhilbrickT 16:08, 3 June 2010 (UTC)[reply]

Earwig, in the interests of keeping in one place, pls -> User_talk:Chzz#Moving_to_AfC ty.  Chzz  ►  16:23, 3 June 2010 (UTC)[reply]

please help me

Hi! I am trying to sort out the conflicting names. There is more than one place named Kendua in different areas (district, country), so I want to link all of them from the list below. Kendua (Bengali: কেন্দুয়া) may refer to the entries at http://en.wikipedia.org/wiki/Kendua, where:

Bangladesh

   * [Kendua Netrokona]

India

   * [Kendua, West Bengal]

Please help with this. Moshiur Rahman Khan 20:20, 5 June 2010 (UTC).