User talk:NicDumZ


Help needed for changing the title of the topic "Chola Dynasty" to "Chola Empire"

Hello Sir, greetings from N.Srinivasan user srirangam99.

To come straight to the point: I have a book by Prof. K. A. Nilakanta Sastri called 'A History of South India: From Prehistoric Times to the Fall of Vijayanagar', printed by the Oxford University Press; its ISBN 'seems to be' 019 560686-8 (followed by another number below the ISBN lines, which is 9 780195606867). The book is in paperback, and the original print is probably 1875, with the current edition being the twenty-first (with Introduction), 2003.

In this book, on page xxiii, the following text appears, which I repeat to you verbatim:

"""To a large extent, however, the Chola state far more out weighted all other South Indian kingdoms both in territorial and maritime control and stable political structures, and lasted much longer as a regional power (four hundred years) than any of them, except that of Vijayanagar and its description as an empire is more justifiable and valid from several points of view."""

On the basis of the above text, I want to request the concerned administrators to consider changing the title of this article from CHOLA DYNASTY to "CHOLA EMPIRE". Let me know your response.

Sir can you help me with this please?

Srirangam99 (talk) 13:25, 1 June 2009 (UTC)

Collateral effects of es:usuario:DumZiBoT

Please, stop the bot. For example: here the title is incorrect; your bot put "Sitio no disponible en este momento. Intente más tarde" ("Site not available at this moment. Try again later"), ha ha. Other bad titles: on es:Mozilla Firefox (see here) the title is very long:

Totalidea Software: Tweak Windows Vista - Windows Vista Tweaks - Vista Tweaks - TweakVI - Tweak-VI - Tweak-Vista - TweakVista - TweakXP - Tweak-XP - Tweak XP - Registry - Regedit - Windows Tuning - Windows XP - Windows Vista - Tweaking - Optimize - T...

On es:Machu Pichu (see here), for http://www.waterhistory.org/histories/machupicchu/ it put the title "WaterHistory.org"; in this case that is not suitable, www.waterhistory.org/histories/machupicchu/ is preferable.

I think that other cases may exist. Regards, es:usuario:Shooke. Shooke (talk) 18:21, 31 May 2008 (UTC)

I have to support the second point. It should also be able to identify that "Nacionalista Party .com" is the same as nacionalistaparty.com (the domain). Suggested code is

if re.sub(r'[^A-Za-z\.\-]', r'', ref.title.lower()) in domain.match(link + redir).group():
    # Should improve url and redirect matching
    repl = ref.refLink()
    new_text = new_text.replace(match.group(), repl)
    wikipedia.output(u'\03{lightred}WARNING\03{default} %s : Title is URL component (%s)' % (ref.link, ref.title))
if self.titleBlackList.search(ref.title):

Dispenser 00:43, 2 June 2008 (UTC)
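Dispenser's check above can be reworked as a self-contained sketch; the helper name and the use of urllib.parse are mine, not reflinks.py's API:

```python
import re
from urllib.parse import urlparse

def title_is_url_component(title, url):
    """Heuristic: does the page title merely repeat the link's domain?

    Strips spaces and punctuation from the title and checks whether the
    result appears in the hostname, so that e.g. the title
    "Nacionalista Party .com" is recognised as nacionalistaparty.com.
    """
    condensed = re.sub(r'[^a-z.\-]', '', title.lower())
    host = urlparse(url).netloc.lower()
    return bool(condensed) and condensed in host

title_is_url_component("Nacionalista Party .com",
                       "http://www.nacionalistaparty.com/about")  # True
```

The `bool(condensed)` guard avoids flagging every link when the title contains no letters at all (an empty string is a substring of any hostname).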

Your bot

Absolutely wonderful idea! A small improvement I'd suggest, though: I've read in your talk page archive that you don't want to use citation templates in general, and I understand your reasons for this. However, if an article already uses citation templates in some or most of its references, wouldn't it be more sensible to have your bot convert bare references to a simple citation template reference instead of the non-template format? —Nightstallion 11:34, 1 June 2008 (UTC)

noreferences.py bug

# Is there an existing section where we can add the references tag?
for section in wikipedia.translate(self.site, referencesSections):
    sectionR = re.compile(r'\r\n=+ *%s *=+\r\n' % section)
 
[ ... ]
 
# Create a new section for the references tag
for section in wikipedia.translate(self.site, placeBeforeSections):
    # Find out where to place the new section
    sectionR = re.compile(r'\r\n(?P<ident>=+) *%s *=+\r\n' % section)

It should be

   sectionR = re.compile(r'\r\n=+ *%s *=+ *\r\n' % section)

and

   sectionR = re.compile(r'\r\n(?P<ident>=+) *%s *(?P=ident) *\r\n' % section)

since headers can have trailing white space at the end. This might explain some of the weird things I've seen the script do.
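A quick standalone check shows the difference (the section name and sample text are illustrative):

```python
import re

section = "References"
# Original pattern: fails when the header line has trailing spaces
old = re.compile(r'\r\n=+ *%s *=+\r\n' % section)
# Suggested fix: tolerate trailing whitespace before the newline
new = re.compile(r'\r\n=+ *%s *=+ *\r\n' % section)

text = "Body\r\n== References == \r\n* item\r\n"
old.search(text)  # no match: the trailing space defeats the pattern
new.search(text)  # matches
```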

Following up on the previous discussion I've tried to reduce the confusion in the documentation about the two scripts being the same. The official release of reflinks.py is running under the reflinks-svn and is almost identical to the svn version. The sources for the online tools are available at http://toolserver.org/~dispenser/resources/sources/. I've also integrated the script with link checker.

I would like to eventually merge the scripts. Could there be an iterator function for generating links? I would like to match unlabeled bullet links or give people the option of converting to citation templates. The way options are passed to the internals is "bulky" once 10 or more options are defined. Additionally, would it be possible for {{dead link}} to include the date parameter? I use the non-locale-portable time.strftime("{{dead link|date=%B %Y}}") in the web script. I'm working out a way to fill in citation templates.

A passing thought: would the pywikipedia community object to AWB-like general fixes? — Dispenser 00:53, 8 June 2008 (UTC)
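On the strftime point: since %B expands to a locale-dependent month name, one locale-independent way to build the tag would be the following sketch (the month table is mine, not the script's):

```python
import datetime

# English month names, independent of the process locale
MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

def dead_link_tag(date=None):
    """Build a {{dead link}} tag with an always-English date parameter."""
    d = date or datetime.date.today()
    return '{{dead link|date=%s %d}}' % (MONTHS[d.month - 1], d.year)

dead_link_tag(datetime.date(2008, 6, 8))  # '{{dead link|date=June 2008}}'
```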

Sorry, sorry, sorry.
I'm very busy these days.
The regex change was a 10-second fix; I committed it on Sunday in r5541, but I sort of ran out of time for longer concerns :/
Thanks for your involvement Dispenser, I *will* take a deeper look at your suggestions, but I just can't do it now.
NicDumZ ~ 20:53, 11 June 2008 (UTC)

Title regex bug

Output from http://www.tldp.org/LDP/sag/html/filesystems.html
 
<HTML
><HEAD
><TITLE
>Filesystems</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK

I have also seen

<head><title id="pagetitle">Title of page</title>

I have redone the method used to match titles in my version. I have a function which looks for tags (title, h1, h2, h3, h4). If the first one is not found, it tries the next one, and so on, before giving up.

Also, if you could implement the transform function as a string cleanup function, I could run it to clean out the extra data (like authors, publisher, etc.). Thanks. — Dispenser 20:24, 25 July 2008 (UTC)
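The fallback idea can be sketched with a regex that tolerates attributes and newlines inside the tag, which is exactly what breaks on the DocBook output above (this is an illustration, not Dispenser's actual function):

```python
import re

def extract_title(html):
    """Try <title> first, then <h1>, then <h2>, tolerating attributes
    and newlines inside the opening tag (as in the DocBook sample,
    where '<TITLE' and '>' land on different lines)."""
    for tag in ('title', 'h1', 'h2'):
        m = re.search(r'<%s\b[^>]*>(.*?)</%s' % (tag, tag),
                      html, re.IGNORECASE | re.DOTALL)
        if m:
            return ' '.join(m.group(1).split())  # collapse whitespace
    return None

extract_title('<HTML\n><HEAD\n><TITLE\n>Filesystems</TITLE\n>')
# 'Filesystems'
extract_title('<head><title id="pagetitle">Title of page</title>')
# 'Title of page'
```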

Reflinks PDF on Windows

Hey NicDumZ, I've been thinking about handling PDF files on Windows with your reflinks bot, and I was wondering if you could use the pdftotext and pdfinfo programs in the same way as you use the other program on the Unix versions (the same programs Zotero uses on Windows). Regards, --Dami (talk) 12:13, 30 July 2008 (UTC)

Why not, but it looks a little bit intricate. Users would have to build the binaries themselves for each platform, which would mean that pywikipedia... would need to include these in its repository?
NicDumZ ~ 13:20, 9 August 2008 (UTC)
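For what it's worth, the metadata side is simple: pdfinfo prints one "Key: value" line per field, including "Title:". A sketch, with the parsing split out so it can be tested without the binary (the helper names are mine):

```python
import subprocess

def parse_pdfinfo(output):
    """Extract the Title field from pdfinfo's 'Key: value' output."""
    for line in output.splitlines():
        if line.startswith('Title:'):
            return line[len('Title:'):].strip()
    return None

def pdf_title(path):
    """Run the pdfinfo binary (poppler/xpdf) and return the PDF title.
    Requires pdfinfo to be installed and on PATH."""
    out = subprocess.run(['pdfinfo', path],
                         capture_output=True, text=True).stdout
    return parse_pdfinfo(out)
```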

Your bot is whitening articles

See here, for example. Or here. Patricio.lorente (talk) 21:42, 30 July 2008 (UTC)

Answered on es
NicDumZ ~ 13:20, 9 August 2008 (UTC)
Hollstein (talk) 14:24, 27 October 2009 (UTC)

Bot report : Found duplicate references !

In the last revision I edited, I found duplicate named references, i.e. references sharing the same name, but not having the same content. Please check them, as I am not able to fix them automatically :)

  • "ancestor" :
    • {{cite book |last=Dawkins |first=Richard |authorlink=Richard Dawkins |title=The Ancestor's Tale |year=2004 |publisher=Houghton Mifflin |chapter=Chimpanzees}}
    • {{cite book |last=Dawkins |first=Richard |authorlink=Richard Dawkins |title=The Ancestor's Tale |year=2004 |publisher=Houghton Mifflin |chapter=Chimpanzees }}

DumZiBoT (talk) 09:52, 8 August 2008 (UTC)

This issue has been addressed. − Twas Now ( talk · contribs · e-mail ) 11:08, 8 August 2008 (UTC)
Looks like it was the same content, with a (marginally) different format. I think the 'bot needs to be upgraded. - UtherSRG (talk) 11:53, 8 August 2008 (UTC)
I just fixed that issue, thanks for the report ;)
NicDumZ ~ 13:16, 9 August 2008 (UTC)
Is it possible to tweak the bot to recognise when the only difference between two references is the accessdate format of the citation, not the content itself (for example here)? Euryalus (talk) 04:17, 9 August 2008 (UTC)
University of Southern California. "The Prophet of Islam - His Biography". Retrieved August 12 2006.  Check date values in: |accessdate= (help)
University of Southern California. "The Prophet of Islam - His Biography". Retrieved August 12, 2006. 
Sorry, this does not produce the same result. I very much understand the pain it can be, it looks really similar. But if I "teach" my bot to consider these as identical, which format should I keep ?
Because this is the very aim of the operation: using <ref name="blablah"/> where possible instead of copying <ref name="blablah">content</ref> multiple times. Otherwise, with time, only one copy gets altered to <ref name="blablah">content2</ref>, which leads to confusion. The idea is to find the common factor in the references and use it as a single point in the article.
If the format differs, the editors have to make a choice, not the bot...
NicDumZ ~ 13:16, 9 August 2008 (UTC)
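The merging NicDumZ describes can be sketched in a few lines; this is an illustration of the idea (keep one full copy, point later identical occurrences at it), not DumZiBoT's actual code. The "autogeneratedN" naming mirrors the names the bot is reported to use:

```python
import re

def merge_duplicate_refs(text):
    """Replace repeated <ref>content</ref> occurrences with a single
    named reference plus <ref name="..."/> pointers."""
    seen = {}        # reference content -> assigned name
    counter = [0]

    def repl(m):
        content = m.group(1).strip()
        if content in seen:
            return '<ref name="%s"/>' % seen[content]
        counter[0] += 1
        name = 'autogenerated%d' % counter[0]
        seen[content] = name
        return '<ref name="%s">%s</ref>' % (name, content)

    return re.sub(r'<ref>(.*?)</ref>', repl, text, flags=re.DOTALL)

merge_duplicate_refs('A<ref>Smith 2004</ref> B<ref>Smith 2004</ref>')
# 'A<ref name="autogenerated1">Smith 2004</ref> B<ref name="autogenerated1"/>'
```

Note that the content comparison is exact (after stripping outer whitespace), which is precisely why the two accessdate formats above do not merge: the editors, not the bot, must choose one.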

Duplicate references: IMDb

Can you teach this bot about the IMDb any better? I see you've flagged some that have links to different subsections of the IMDb page about the same title, like the Trivia page and the Awards page. I think this will happen quite often, and they will all quite often be labelled as just "IMDb". The main page for any title takes the form http://www.imdb.com/title/tt### (and for a person it takes the form http://www.imdb.com/name/nm###), but they can have various subsections after that in the URL like /trivia, /quotes, /plotsummary etc. There are also the various URLs that access the same system or shadow systems. The host name could be www.imdb.com, or just imdb.com, or us.imdb.com, or uk.imdb.com, or various others all listed on the page about the IMDb -- SteveCrook (talk) 10:13, 9 August 2008 (UTC)

Well, if the content is different, you should use two different reference names, obviously: "imdb awards" and "imdb trivia". Using "imdb" for two different contents is confusing, and that's what the bot has flagged. :)
NicDumZ ~ 10:44, 9 August 2008 (UTC)
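A normaliser along the lines SteveCrook describes might look like this sketch (purely illustrative, not what DumZiBoT does): collapse the host variants and drop subsections, keeping the tt/nm identifier:

```python
import re
from urllib.parse import urlparse

def canonical_imdb(url):
    """Collapse IMDb URL variants (www./us./uk. hosts, /trivia and
    other subsections) to one canonical title or name URL."""
    parts = urlparse(url)
    if not parts.netloc.endswith('imdb.com'):
        return url
    m = re.match(r'/(title/tt\d+|name/nm\d+)', parts.path)
    if m:
        return 'http://www.imdb.com/%s/' % m.group(1)
    return url

canonical_imdb('http://us.imdb.com/title/tt0056172/trivia')
# 'http://www.imdb.com/title/tt0056172/'
```

Two references would then count as "same source" only if their canonical forms match, though per NicDumZ's point above, different subsections carry different content and arguably deserve distinct names anyway.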

Refs in nowiki

Based on this error, you may want the bot to ignore references that are encased in nowiki tags. Christopher Parham (talk) 23:08, 11 August 2008 (UTC)

Here's another instance of the bug, resulting in a cite error: [1]. — Emil J. (formerly EJ) 15:18, 12 August 2008 (UTC)

That ugly bug should have been fixed by now.
I'm afraid there's no perfect way to handle this; ignoring references in nowiki/HTML comments might lead to inconsistencies later. (Imagine one reference in nowiki tags and a second one without: which one will be modified? Two choices: both, or the first in the text. And both choices can be erroneous, depending on the text in question.) I might end up skipping pages which include refs in nowiki/HTML comments.
NicDumZ ~ 00:32, 13 August 2008 (UTC)

Id.

Hi, can you not change Id. references to an autoname option. Id. is just shorthand for a referral to the previous footnote, so the result is just wrong. Cheers! bd2412 T 01:26, 12 August 2008 (UTC)

wait wait wait. Can you provide me a diff please ? My bot is editing thousands of articles a day... NicDumZ ~ 01:46, 12 August 2008 (UTC)

Thanks again - in advance

I have added a few more references in Will Young's page, can they be tidied up please?Oyster24 (talk) 05:29, 12 August 2008 (UTC)

Done. But can't you do it yourself? You added the references; you know which references you added, which ones have no titles, and so on.
Using the HTML title is actually a very bad habit. You, as an editor, are able to give any reference a meaningful, clear, and accurate title. Please do not rely on my robot.
NicDumZ ~ 05:34, 12 August 2008 (UTC)

Bot mistake

Your bot got the title wrong for http://www.chipknip.nl/ since that page uses a JavaScript redirect. Bad site design, so it's understandable, but I thought you might want to know. --Apoc2400 (talk) 12:00, 12 August 2008 (UTC)

The titles only containing "Redirect" are now blacklisted.
Thanks for the report :)
NicDumZ ~ 00:38, 13 August 2008 (UTC)

More thanks for DumZiBoT

Just had to say "Merci beaucoup" for your bot. Keep up the good work! Katr67 (talk) 17:46, 12 August 2008 (UTC)

flickr links in French?

Your bot's edits on the LG_Shine_(U970) article have apparently been left unedited over several revisions. Unfortunately it was translated into an incomprehensible dead language ;o)

Example: LG U970 sur Flickr : partage de photos ! ("LG U970 on Flickr: photo sharing!")

Can you train it to use English instead? --Opspin (talk) 20:42, 12 August 2008 (UTC)

I would make it use English, but... I can't reproduce this behavior!?
I suspect flickr remembers IP addresses associated with user accounts (or something along those lines) to try to display the correct language.
(My browser and DumZiBoT share no cookies, of course.)
Any ideas are welcome, I have no obvious fix here to provide.
NicDumZ ~ 00:43, 13 August 2008 (UTC)

Charset problem

DumZiBoT obviously didn't handle properly the charset for the last three references of Pi (instrument). 128.214.205.65 (talk) 23:09, 12 August 2008 (UTC)

True. But there's no magic: no charset specified in the HTML, no charset specified in the HTTP header... My bot then tries divination, but heh, you do know that these things don't work.
No, honestly: here my Firefox gets the charset wrong too, and prints gibberish. I don't think there's any way to get the charset right in this case.
And before you ask: no, I can't "ignore the pages when I can't get the charset", because the main problem with character encoding is that once you've decoded a page successfully, there is no way to tell whether the decoded result is gibberish or meaningful content, unless you speak every Babel tongue.
NicDumZ ~ 00:25, 13 August 2008 (UTC)
I don't follow. The issue isn't whether or not the decoded text is gibberish (that could be intentional), but whether the charset used for decoding was known (from HTML or HTTP headers) or guessed. Can't the bot check if either of those two places actually define a charset? 128.214.205.65 (talk) 17:53, 13 August 2008 (UTC)
Well, DumZiBoT checks both HTML source and HTTP headers to find a charset. (In this case, there was no charset specified in any of these places)
For some specific domains (.ru, .su, .jp, .kr, .zh) it tries the national encoding (because a lot of these pages specify no encoding in the source /have no normalized HTTP server, but use the standard national encoding)
Then, if no charset from these 2 first steps can actually decode the text, it uses chardet to try to detect an appropriate charset. This library is a port from the Mozilla charset detector.
So, in our particular case, there is no charset in the HTML, no charset in the HTTP, and the chardet algorithm fails. Yes, it happens.
As I said, Firefox fails to display the page correctly unless you specify yourself which encoding should be used (windows-874). Charset problems are really hard to handle, and it is not possible to accurately detect the encoding of 100% of webpages, considering that a lot of them comply with few or no charset standards. I don't know any way to detect the encoding of these pages...
NicDumZ ~ 02:43, 14 August 2008 (UTC)
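The cascade NicDumZ describes (HTML meta, then HTTP header, then a national default by TLD, then chardet) can be sketched as follows. The TLD-to-encoding table is an illustrative guess, not DumZiBoT's exact table, and the final chardet step is omitted since it needs a third-party library:

```python
import re

# National fallback encodings by TLD (illustrative values)
TLD_ENCODINGS = {'.ru': 'koi8-r', '.su': 'koi8-r',
                 '.jp': 'shift_jis', '.kr': 'euc-kr'}

def guess_charset(raw_bytes, http_charset=None, domain=''):
    """Detection cascade: HTML meta charset, then the HTTP header,
    then a national default for the domain's TLD.  (The real script
    adds a final chardet step, omitted here.)"""
    m = re.search(rb'charset=["\']?([\w\-]+)', raw_bytes[:2048])
    if m:
        return m.group(1).decode('ascii')
    if http_charset:
        return http_charset
    for tld, enc in TLD_ENCODINGS.items():
        if domain.endswith(tld):
            return enc
    return None  # the Pi (instrument) case: every step fails

guess_charset(b'<meta charset="utf-8"><title>x</title>')   # 'utf-8'
guess_charset(b'<title>x</title>', domain='example.jp')    # 'shift_jis'
```

The real difficulty, as noted above, is the `None` case: a wrong guess still "decodes", so there is no reliable signal that the result is gibberish.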

Bizarre mistake

This bot is doing useful work, but I noticed a strange edit it made when it added a title for a reference to here: http://bridlewoodes.ocdsb.ca/NewsLetters/April2008.pdf from this article. The title it added was "I have started an intermediate drama club", a phrase which I can't find at all in the article or the document referenced. I normally wouldn't bother you with a simple error but this one made me curious. LK (t|c) 02:26, 13 August 2008 (UTC)

Please bother me with errors of my bots, that's the only way I can improve them :)
About this title: this is the title of the PDF document. I don't know about Windows PDF software, but any Linux PDF software shows "I have started an intermediate drama club" as the PDF title :)
NicDumZ ~ 02:28, 13 August 2008 (UTC)

Harbor Beach Light

Thanks for your efforts. Unfortunately, there is something seriously wrong with the form of the changes you made in the references, and the "Notes" section is now a mess. If you could help, that would be appreciated. Thanks. 7&6=thirteen (talk) 15:19, 13 August 2008 (UTC) Stan

The mess was caused by an unclosed ref tag in this edit, which has nothing to do with the bot. 128.214.205.65 (talk) 17:43, 13 August 2008 (UTC)
Thanks for the fix and thank you again. 7&6=thirteen (talk) 22:07, 13 August 2008 (UTC) Stan
The IP is right. It had nothing to do with DumZiBoT. Unclosed reference tag, only. NicDumZ ~ 02:22, 14 August 2008 (UTC)

New feature suggestion - remove duplicate references in same location

Looking at this edit [2] by the bot, I noticed that there are two references to the same source right next to each other. Having the bot merge references helps highlight this, but it made me wonder if the bot could automatically identify such cases (where two copies of the same reference occur not separated by any article text) and remove the duplicate reference entirely, or flag it somehow to encourage human editors to fix it. Don't know how common this sort of error on wikipedia pages is (may be too rare to be worth bothering with), or how hard it would be to implement (and whether it would require further approval), but thought it worth at least tossing the idea out there. Thanks. Zodon (talk) 18:25, 13 August 2008 (UTC)

I really like improvement ideas. Thanks :)
I think, however, that finding such references is not that easy; at least, when I tried to quickly figure out how to implement that check, no obvious idea came to me. I am also unsure that the frequency of these cases justifies spending time on such a feature...
NicDumZ ~ 03:28, 14 August 2008 (UTC)
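One possible starting point for Zodon's suggestion, purely illustrative and not the bot's code: a regex that flags places where the same self-closing named reference appears twice with nothing but whitespace in between.

```python
import re

# Matches two adjacent self-closing refs sharing the same name,
# e.g. <ref name="a"/><ref name="a"/>
ADJACENT = re.compile(
    r'<ref name="(?P<n>[^"]+)"[^>]*/?>\s*<ref name="(?P=n)"')

def adjacent_duplicates(text):
    """Return the names of references duplicated back-to-back."""
    return [m.group('n') for m in ADJACENT.finditer(text)]

adjacent_duplicates('x<ref name="a"/><ref name="a"/>y')  # ['a']
```

This only catches the easy back-to-back case; refs separated by article text, or full `<ref name="...">content</ref>` pairs, would need more work, which is presumably part of why it is "not that easy".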

Concerns about DumZiBoT

Please see here. Thanks. --Kleinzach 01:26, 14 August 2008 (UTC)

Dumzibot = silly : see 42nd (East Lancashire) Division change 07:40 14th August 2008

This bot is wrecking links. e.g. it converted <ref>http://www.army.mod.uk/42bde/history.htm</ref> to <ref>[http://www.army.mod.uk/42bde/history.htm Welcome to the new British Army Website - British Army Website<!-- Bot generated title -->]</ref> : who wants a link announcing "Welcome to the new British Army Website - British Army Website" ?

The people who created the linked external website used a stupid generic page heading and dumzibot has just copied this stupid generic page heading into Wikipedia article. The link was better as it was before.

Creation of links that display meaningful text requires human skill and thought and should not be automated. So I will just revert this bot's edits where they wreck articles I maintain. There are too many of these stupid mass-damage bots. Rcbutcher (talk) 03:40, 14 August 2008 (UTC)

Revert ? sure, I don't care; but a better point would just be to add a title to this reference, you know :)
Yes, "Welcome to the new British Army Website - British Army Website" is a bad title, but on the other hand, no title is... no title. True, webmasters may use stupid, advert-like titles for their pages, and you don't want this in an article, but A) feel free to add a better title yourself; DumZiBoT is, as you said, only a silly robot, trying to help to the extent of its capabilities; B) a *lot* of other page titles do just fine as reference link titles (70%? 90%?).
DumZiBoT is not supposed to find the best title, it is supposed to find A title, because inserting a reference link without any title just doesn't help. It emphasizes the fact that a title is needed. You may not find it useful, but a lot of people think that, overall, DumZiBoT improves the quality of the references (inserting right titles, and prompting people to improve not-so-good inserted titles).
180K edits so far: considering that DumZiBoT adds on average 3-5 titles per edit, that makes *a lot* of untitled references.
Stupid mass-damage bot ? please reconsider it...
NicDumZ ~ 03:58, 14 August 2008 (UTC)
And actually, did you check http://www.army.mod.uk/42bde/history.htm by yourself before coming here to complain ?
There's no content in this page, it just tells you that the pages are being migrated, probably moving them to another website structure, but that in the meantime the old pages are still available at www2.army.mod.uk
NicDumZ ~ 04:09, 14 August 2008 (UTC)
Sure, I saw that the actual content page either no longer exists or has had its name changed. But the issue I'm pointing out here is that I don't think a robot is an appropriate tool for converting bare links to meaningful links. It needs a human. This bot did not and could not detect that the page didn't exist any more. All it did was change a link that was meaningful (it showed the website and page name) to something meaningless. Perhaps this bot should just report these bare links, and from then on a human could fix the link? I'd be happy to contribute my time to help. Rcbutcher (talk) 04:29, 14 August 2008 (UTC)
You can run the tool on pages you come across; I could also generate lists of the pages my bot comes across. If you have any ideas leave them on my talk page. — Dispenser 16:09, 14 August 2008 (UTC) [Updated: 16:43, 14 August 2008 (UTC)]

Check browser settings

On this edit to Sharoe Green, your bot added a title of "Check browser settings". Is there any way your bot can detect an error of this sort and avoid it? --Dr Greg (talk) 17:18, 14 August 2008 (UTC)

Well, I blacklisted that specific title a few days ago, so that shouldn't happen anymore...! NicDumZ ~ 03:43, 15 August 2008 (UTC)

DumZiBoT bare references

Hi, I had a few thoughts on your bot.

When DumZiBoT converts bare URL links into references, I think it would be much more useful to use something like {{cite web}} instead of the current approach. For example:

[http://www.example.net/longramblingtext]

currently becomes

<ref>[http://www.example.net/longramblingtext Example Web Site Title]</ref>

but, it should be something like

<ref>{{cite web|url=http://www.example.net|publication=www.example.net|title=Example Web Site Title|date=|dateaccessed=}}</ref>

While we do not want to show absurdly long URLs in full, showing at least the domain is good, in fact critical. Perhaps you could truncate the URL text after some arbitrary point (maybe use ellipses). Since all this appears in the references section, and not the body of the article, it's OK to have a bit of detail in citations. Wikipedia needs to show where content came from, and web page titles often don't show that. Sometimes they are blatantly misleading. If all the references used in an article are from a particular domain of dubious reliability (like myspace), it's nice for the reader to be able to see that quickly, without having to read the wiki markup or hover over each link.

Also, newbies who see a {{cite web}} tag are likely to fix and improve them, filling in other fields (we could use field names as placeholders, such as the "date" field, so people are aware of them) that a bot could never fill in. --Rob (talk) 23:02, 15 August 2008 (UTC)

As discussed at WP:BON, using {{cite web}} for this seems overkill. However, I do think it would be an improvement (and would allay many of the concerns noted at BON) if the bot were to leave the hostname part of the link visible in the page text, so that e.g. the example URL above would become
<ref>[http://www.example.net/longramblingtext Example Web Site Title<!-- Bot generated title -->], www.example.net</ref>
This provides some information about the publisher of the information, and would also go a long way towards fixing the problem of misleadingly titled web pages mentioned at BON. —Ilmari Karonen (talk) 19:52, 17 August 2008 (UTC)
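Ilmari Karonen's suggestion is mechanical enough to sketch; this is an illustration of the proposed output format, not DumZiBoT's behaviour:

```python
from urllib.parse import urlparse

def make_ref(url, title):
    """Build a reference in the suggested format: bot-generated title
    followed by the visible hostname, so readers can see the publisher
    at a glance."""
    host = urlparse(url).netloc
    return ('<ref>[%s %s<!-- Bot generated title -->], %s</ref>'
            % (url, title, host))

make_ref('http://www.example.net/longramblingtext', 'Example Web Site Title')
```

which produces exactly the example line above.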
Rob/Ilmari Karonen; I wonder where we are with this now? Has NicDumZ responded? If so what was the result? --Kleinzach 04:44, 29 August 2008 (UTC)
I agree with Rob. Using {{cite web}} definitely makes more sense than the ugly stuff this bot produces right now. NicDumZ's explanation of why he thinks {{cite web}} isn't appropriate does not convince me at all. --bender235 (talk) 16:58, 18 March 2009 (UTC)
BTW: A bot could also add something like accessdate=2009-03-18, which actually adds useful information. --bender235 (talk) 17:01, 18 March 2009 (UTC)

DumZiBoT BPL Page revision

Dear DumZi, please accept this expression of my most distinguished sentiments, but your bot ran wild and free on the Barton Paul Levenson page and scattered or deleted some important references. While the effort was noble, and I hold you in the highest regard, I had to roll back to even understand where we started. Your humble servant: Botendaddy (talk) 07:57, 17 August 2008 (UTC)

hehe :)
Damn wild beast, that DumZiBoT!
But it did not delete any important references. It uses named references: copying, for example, 5 times "Art & Prose Magazine, October 2007, p. 40. #61 Showcase Writer, Interview with B.P. Levenson" is not necessary; you can simply name the first occurrence of the reference and refer to that name when you want to reuse it. That's what DumZiBoT did: it just added a name to that reference (autogenerated2), to avoid repeating some content :)
I don't really understand the "scattered" part; the Notes paragraph looks nice here?!
NicDumZ ~ 08:47, 17 August 2008 (UTC)

Bot [ looooooooooooooong paaaaaaaauuuuuuuuuse] generated title

I keep finding the phrase "Bot generated title". Is it impossible for this bot to write "Bot-generated title" with a proper hyphen? Michael Hardy (talk) 14:38, 18 August 2008 (UTC)

You should've spoken up at the BRfA, as changing it now only complicates text matching. So I wouldn't see much point in changing it. — Dispenser 03:16, 20 August 2008 (UTC)

Reflinks.py problem with autogenerated names for refs

Dear NicDumZ! I find your recent improvement of reflinks.py to combine identical citations quite useful, but a bit buggy. It seems to rename named refs for no reason (instead of using the user-given name for them at all occurrences), somehow messing up citations in the form <ref name="a">Text</ref><ref name="a">_</ref>, which works correctly before, but not after, a rename: [3][4]. Another problem is that for some reason the rendering of refs doesn't really work on long pages after an edit with reflinks.py: [5][6]. I hope these can be fixed, otherwise it will get quite tiresome to hand-check every single edit of the bot on the actual wiki for rendering errors of MediaWiki. --Dami (talk) 16:24, 18 August 2008 (UTC)

The first diff is not my fault :p The ref "Fangio nyerte meg a világbajnoki címet 1951-ben" was not closed :p
Anyway, I appreciate that you're currently using reflinks.py, but I have definitely changed it, and I am not running the svn version anymore.
It no longer resolves duplicate references; it flags them on the talk pages of articles, because fixing them automatically involved too many errors :)
Some newly blacklisted titles appeared, plus bugfixes, introducing equivalence between very similar references, and so on.
I need some more time to clean up my code before committing, but I'm not sure you should trust the current SVN version right now, sorry to say this...
NicDumZ ~ 16:34, 18 August 2008 (UTC)

shhh

quiet down —Preceding unsigned comment added by 124.182.209.206 (talk) 08:16, 19 August 2008 (UTC)

Markup tags included in bot-generated title

I noticed that this edit (at line 58) made by the bot included a lot of meta keywords in the title, which were then added as clear text in the footnote. I've reverted that particular portion of the edit. You might want to update the bot to strip out such mark-up from a title before adding it. Keep up the otherwise good work. TJRC (talk) 19:08, 19 August 2008 (UTC)

Who's ever played Eduard Friedrich Mörike's opera "Eduard auf dem Seil" ?

Hi, I'm looking for Eduard Friedrich Mörike's opera "Eduard auf dem Seil" (according to http://en.wikipedia.org/wiki/Silpelit), but I can't find any information on whoever has played it. If you know any link to information about performances of this opera, please drop me an e-mail at fazoo@o2.pl, because so far it seems that nobody has ever played it. It's really important for me, so I'd be thankful for any information. —Preceding unsigned comment added by 83.9.92.72 (talk) 06:11, 25 August 2008 (UTC)

Category redirects

I heard you are working on new code for MediaWiki that will make category redirects work better. Then you might be interested in the discussions we are currently having over at Template talk:Category redirect#New categorization ideas and the section below that named "Soon obsolete?".

--David Göthberg (talk) 03:15, 26 August 2008 (UTC)

Bot edit when references are already broken

I just happened upon this edit by the bot. As you can see, the references were already broken beforehand, and the bot left them broken in a slightly different (and as it happens uglier, but that's neither here nor there) way. Would it be possible for the bot to pick up on this sort of thing, and drop a warning somewhere or add a cleanup template? I don't know what kind of programming that would require, but it would make a big difference (and, from your point of view, avoid a situation where someone might wrongly think the bot has screwed up). Just a thought, Chick Bowen 05:27, 28 August 2008 (UTC)

DZB

On Timeline_of_the_2008_Pacific_typhoon_season, DZB titled http://www.webcitation.org/ URLs, which it should probably skip via its blacklist. Rich Farmbrough, 21:13 7 September 2008 (GMT).

reflinks is broken and kills content

Please stop the tool ASAP: it kills content by deleting whole paragraphs.

And please restore each and every article that was touched by the tool. --h-stt !? 10:54, 10 September 2008 (UTC)

NicDumZ isn't the tool's owner, I am. And I noticed the problem last night and fixed it. — Dispenser 14:28, 10 September 2008 (UTC)

Image copyright problem with Image:INPG.jpg

Thanks for uploading Image:INPG.jpg. You've indicated that the image is being used under a claim of fair use, but you have not provided an adequate explanation for why it meets Wikipedia's requirements for such images. In particular, for each page the image is used on, the image must have an explanation linking to that page which explains why it needs to be used on that page. Can you please check

  • That there is a non-free use rationale on the image's description page for each article the image is used in.
  • That every article it is used on is linked to from its description page.

This is an automated notice by FairuseBot. For assistance on the image use policy, see Wikipedia:Media copyright questions. --FairuseBot (talk) 00:26, 16 September 2008 (UTC)

badtitles update

# Examples:
#   Radio-Locator: Radio Station Finder: Search Results
#   This page has moved!
#   WebCite query result
#   NY Times Advertisement
#   Access Denied
#   Nothing found for  S Subspecies
#   E! Online - Sorry, the page you requested is not available.
#   404&nbsp;-&nbsp;MuslimWays&nbsp;-&nbsp;MuslimWays
#   The resource cannot be found.
#   Yahoo! - 404 Not Found
#   FOXSports.com - Page Not Found
#   ESPN - Sitemap
#   Error!
#   N-Gage | Website Error
#   Error Occurred While Processing Request
#   Missing - New York Post
#   The article you&#39;ve requested is no longer available.
#   The United States Army Error Page
globalbadtitles = r"""
# is
(test|help|JSTOR.[ ]*Accessing[ ]*JSTOR
# starts with
    |\W*(
            register
            |registration
            |(sign|log)[ \-]*(in|on|up)
            |subscribe
            |(untitled|new)[ ]*(document|page)
            |my\b
            |your\b
            |404\b
        ).*
# anywhere
    |.*(404|page|file|story|resource).*(not?([ ]*(be|longer)?)?[ ]*(found|available)|moved).*
# bad words
    |.*\b(error|cookie|advertisement|cart|checkout)[s]?\b.*
# ends with
    |.*(
            register
            |registration
            |(sign|log)[ \-]*(in|on|up)
            |subscribe
            |result[s]?
            |search
            |untitle[d]?
            |account[s]?
        )\W*
)
"""
badtitles = { 'en':'',
              'fr': '.*(404|page|site).*en[ ]+travaux.*',
              'es': '.*sitio.*no[ ]+disponible.*'
            }

It's a bit broader than the original, so there are more false positives, but that's probably good. — Dispenser 23:18, 21 September 2008 (UTC)
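For reference, a minimal sketch of how a pattern in this style might be compiled and applied. The pattern here is trimmed to a few branches of the one above, and the helper name is illustrative, not from the script:

```python
import re

# Trimmed subset of the globalbadtitles pattern above, kept in re.X
# (verbose) style so the layout comments survive compilation.
badtitles_src = r"""
(test|help
# starts with
    |\W*((sign|log)[ \-]*(in|on|up)|404\b).*
# anywhere
    |.*(404|page|file).*(not?([ ]*(be|longer)?)?[ ]*(found|available)|moved).*
# bad words
    |.*\b(error|cookie)s?\b.*
)
"""
badtitle_re = re.compile(r'^%s$' % badtitles_src, re.I | re.X)

def title_is_usable(title):
    """Return False when a fetched <title> looks like an error page."""
    return badtitle_re.match(title) is None
```

With this subset, a title such as "Yahoo! - 404 Not Found" is rejected by the "anywhere" branch, while an ordinary article title passes.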

When adding bad title detection to Checklinks, I misconfigured it so as to get a list of titles that weren't detected. I've posted the titles that weren't caught by the implemented method. By the way, the SVN version is broken. — Dispenser 05:11, 7 October 2008 (UTC)
Thank you Dispenser. I do use your updated version of badtitles in my local version, but I'm not committing it since it contains changes related to the second task (adding talk-page warnings) which I don't consider good enough to commit. In fact I don't yet have any mechanism to avoid posting the same warning on the talk page again; I need to spend some time on it. And I did notice this morning that siebrand broke the latest version, yeah :(
I'm not really in a hurry, since the dumps are still halted.
I'm wondering who's using the SVN version, actually...!
NicDumZ ~ 06:13, 7 October 2008 (UTC)
Wait. Unless you need it repaired and updated quickly ? NicDumZ ~ 06:14, 7 October 2008 (UTC)
No. I just noticed it when I (recently) updated to SVN. I've noticed the duplicate detector doesn't like identically named references that share similar content. One case was a formatting change [7](ref:Eree); in another case an extra word was removed. Perhaps ratio() from SequenceMatcher would be helpful here?
Post Script: I use three versions now: SVN, SVN with a return string option, and webreflinks. Would be nice if I could get rid of the second. — Dispenser 07:42, 7 October 2008 (UTC)
Err, SequenceMatcher isn't a good idea since some references differ only by a page number. A list of characters to be ignored in the comparison would be better. I've thought up and partially created some definitions for better ref naming, using regexes for each language with the current method as a fallback. An inferior implementation now works with my tools. — Dispenser 00:23, 13 October 2008 (UTC)
nameBase = {'en':[r'\|\s*last\s*=(?P<base>\w+)',
                # Short footnote
                r'^(?:\[\[#[^][]+\|)?(?P<base>\w+)',
                r'.*\w+://[a-z0-9\-\.]*?(?P<base>[a-z0-9\-]+)\.[a-z\.]{2,6}\b.*',],
                }
nameSufix= {'en':[r'\|\s*page\s*=(?P<uid>\d+)',
                r'(?:pg?.?|page) +(?P<uid>\d{2,})',
                r'\|\s*year\s*=(?P<uid>\d{4})',
                r'\|\s*date\s*=[^{|}]*?(?P<uid>\d{4})',
                r'\b(?P<uid>1\d{3}|200[0-5])\b',],
                }
refbasename = re.search(nameBase['en'][0], refText)
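A sketch of how the two tables might combine into a suggested reference name such as "smith2004". The tables are restated so the snippet is self-contained, and the helper name is mine, not from the script:

```python
import re

# nameBase / nameSufix as defined above.
nameBase = {'en': [r'\|\s*last\s*=(?P<base>\w+)',
                   # Short footnote
                   r'^(?:\[\[#[^][]+\|)?(?P<base>\w+)',
                   r'.*\w+://[a-z0-9\-\.]*?(?P<base>[a-z0-9\-]+)\.[a-z\.]{2,6}\b.*']}
nameSufix = {'en': [r'\|\s*page\s*=(?P<uid>\d+)',
                    r'(?:pg?.?|page) +(?P<uid>\d{2,})',
                    r'\|\s*year\s*=(?P<uid>\d{4})',
                    r'\|\s*date\s*=[^{|}]*?(?P<uid>\d{4})',
                    r'\b(?P<uid>1\d{3}|200[0-5])\b']}

def suggest_ref_name(refText, lang='en'):
    """First matching base pattern plus first matching suffix pattern."""
    base = uid = ''
    for pat in nameBase[lang]:
        m = re.search(pat, refText, re.I)
        if m:
            base = m.group('base').lower()
            break
    for pat in nameSufix[lang]:
        m = re.search(pat, refText)
        if m:
            uid = m.group('uid')
            break
    return (base + uid) or None
```

For a {{cite book}} with |last=Smith and |year=2004 this yields "smith2004"; for a bare external link the domain name becomes the base.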

Long titles are ridiculous[edit]

<ref name="nfo">[http://www.nfo.net/usa/p1.html#GPaxton Biographies of Oran 'Hot Lips' Page Orch., Walter Page and his Blue Devils Orch., Tiny Parham Orch., Tony Pastor Orch., Eddie Paul and The Paramount Orch., George Paxton Orch., Santo 'Peck' Pecora and his Orch., Paul Pendarvis Orch, Raymond Paige Orch., Louis Panico Orch .,Anthony 'Tony' Parenti Orch., Ray Pearl Orch , Joe Pica Orch., Emile Petti and his Savoy Plaza Orch., Jack Pettis and his Pets., Picou's Independence Band, Teddy Phillips Orch., Merle Pitt & His 5 Shades of Blue Orch., Ben Pollack Orch., Tito Puente Orch., Perez Prado Orch., Paradise Club Orch., The Palace Gardens Orchestra, Cotton Club Orchestra, Andy Preer and His Cotton Club Orchestra, Prince's Orch., Arthur Pryor Orch<!-- Bot generated title -->]</ref>

Enough said? Stevage 04:36, 6 October 2008 (UTC)

Where, when was it ?
This was fixed a while ago and should not happen again, /me thinks. NicDumZ ~ 05:00, 6 October 2008 (UTC)
George Paxton, 13 feb 2008. You see the dangers of massive widespread changes with buggy software...Stevage 01:12, 7 October 2008 (UTC)
So that's old, and fixed since, as I said. NicDumZ ~ 01:32, 7 October 2008 (UTC)
"Fixed" in the sense "it won't happen again". Not "fixed" in the sense there are still crappy links like this all over the place. Stevage 02:09, 7 October 2008 (UTC)
Better than not fixed at all. If you have a way to detect all the long titles, go ahead, code is open source. NicDumZ ~ 02:18, 7 October 2008 (UTC)
\[\w://[^][<>\s"]+ [^]]{200,}<!-- Bot generated title -->\], but that's beside the point. The BRfA was open for a month and a half, during which I was the only one pressing for erroneous-title detection. And months later we get people complaining that the work that we do isn't as good as a human's. — Dispenser 02:44, 7 October 2008 (UTC)
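As a quick sketch of that detection pattern in use: the version quoted above appears to be missing a quantifier after `\w`, so `\w+://` is assumed here.

```python
import re

# Long-title detector sketched from the pattern above; \w+:// is an
# assumed correction of \w://.
longtitle_re = re.compile(
    r'\[\w+://[^][<>\s"]+ [^]]{200,}<!-- Bot generated title -->\]')

# Invented sample: a bot-generated ref whose title runs past 200 characters.
ref = '[http://example.org/x ' + 'A' * 250 + '<!-- Bot generated title -->]'
```

Running `longtitle_re.search` over a dump's wikitext would then flag refs like the George Paxton one.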
I can't speak for the BRfA because I didn't know anything about it. But IMHO a bot of this type should be written and operated with maximum caution: only making changes where it is certain that the change is better than what preceded it. Having maximum and minimum lengths, not breaking ref tags, etc. etc., are all part of that. The code being "open source" is spurious - no one but the bot's owner can change the way it behaves. Stevage 03:19, 8 October 2008 (UTC)
Blablabla... Come on, are you making all this fuss because in some rare cases, a while ago, my bot inserted bad titles ? An error that since then has been corrected, not to mention all the improvements that have been made to the script since February ? And yes, Open Source means anyone can read the source and suggest changes to it. Even better; the latest version is usually here, if you have any patch to suggest, please go on. NicDumZ ~ 03:26, 8 October 2008 (UTC)
(indent) You know, Dispenser is right, it's quite easy to detect long titles. I'm gonna use the soon-to-come dump to catch any of these titles that are too long and truncate them. NicDumZ ~ 03:42, 8 October 2008 (UTC)
Cool. Can I also suggest changing "Bot generated title" to "Automatically generated title" (or failing that, "Bot-generated title".) Stevage 08:13, 8 October 2008 (UTC)
That'll break my things for detecting those titles. If you have a suggestion on how to get people to remove the text after confirming the title is (likely) correct, I'm willing to listen. — Dispenser 14:13, 8 October 2008 (UTC)
If by "break" you mean, "cause you to tweak the regexp", then yes. Anyway how about "-- Automatic title - please verify. --> Stevage 03:38, 9 October 2008 (UTC)

Bot request[edit]

# The Peer Reviewer script for a short time converted URLs into {{Cite web}}, but used the URL as the title
# (?i) makes the match case-insensitive; passing re.I as the 4th positional argument of re.sub would be taken as the count
text = re.sub(r'(?i)\{\{cite web *\|url=http://(?P<title>[^{|}]+) *\|title=(?P=title)<!-- INSERT TITLE -->(\| *accessdate *= *[^{|}]+)?\}\}', r'http://\g<title>', text)
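As an illustration of what the substitution undoes (the sample wikitext is invented):

```python
import re

# Hypothetical sample of the degenerate citation the Peer Reviewer script produced.
text = ('{{cite web |url=http://example.org/a '
        '|title=example.org/a<!-- INSERT TITLE -->}}')

# Same substitution as above, collapsing the template back to a bare link.
fixed = re.sub(r'(?i)\{\{cite web *\|url=http://(?P<title>[^{|}]+) *'
               r'\|title=(?P=title)<!-- INSERT TITLE -->'
               r'(\| *accessdate *= *[^{|}]+)?\}\}',
               r'http://\g<title>', text)
```

The backreference (?P=title) ensures only citations whose title literally repeats the URL are reverted.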

Could you run DumZiBot on these to add titles? I've generated a list from the May dump. Also you might want to have a look at Possible bug with duplicate ref content. — Dispenser 01:13, 20 October 2008 (UTC)

Bot stopped[edit]

What happened? Is this bot out of operation???--Kozuch (talk) 23:33, 1 December 2008 (UTC)

Not really. I'm quite busy at the moment and having trouble catching up. There's quite a lot of code to write before DumZiBoT can start again, and I can't find the time to do it. Moreover, from Asia, where I've been for a few months, connectivity to Wikimedia sites is, I have to say, CRAPPY! Running a bot under these conditions is not really possible.
Hopefully I'll be back in Europe, with some free time; I really hope that I'll be able to restart the little tool soon.
NicDumZ ~ 04:47, 7 December 2008 (UTC)

Hibben ref.[edit]

Hi. One of your refs on the Hibben page is to "Nature (Society for American Archaeology)." I am confused. First, the doi link gives me an error; second, the Nature I know is not put out by the SAA (which puts out American Antiquity and the like). Kdammers (talk) 02:29, 1 January 2009 (UTC)

Hi[edit]

Thought this might interest you: [8]. Hesperian 04:00, 3 January 2009 (UTC)

Continuous bot preventing linkrot[edit]

There are sites which take down links in less than the time between database dumps; Google's AP feed and those listed at User:Peteforsyth/O-vanish are just some examples. Would it be possible to have a bot monitor the external links IRC channel, matching them against regexes or domains on a subpage, and have DumZiBoT process those pages at the end of the day? — Dispenser 23:13, 7 January 2009 (UTC)

Reflinks.py on it.wiki[edit]

Hi, NicDumZ!!! I'm Màrço 27, active principally on the Italian Wikipedia (sorry for my English ^_^). We would like to use your script "Reflinks.py". Can you add these parameters to your script to adapt it to it.wiki as well? I'll write the parameters here:

msg = 'it':u'Bot: Correggo collegamenti esterni senza titolo nelle note (si veda [[:en:User:DumZiBoT/refLinks|la documentazione]])',

deadLinkTag = 'it':u'{{Link non attivo|%s}}',
              
comment = 'it':u'Titolo generato da un bot',

Thanks for all!!! Have a nice day ^_^ !!!--Màrço 27 (msg) on it.wiki: user pagetalk page 13:20, 1 February 2009 (UTC)

Sorry, another thing: are the bad titles the titles of pages that were not found? Are they required? Thanks for all ^_^ !!!--Màrço 27 (msg) on it.wiki: user pagetalk page 13:48, 1 February 2009 (UTC)
Sorted out on IRC. Warned that... reflinks.py is far from being up-to-date and bugless, mostly because of my selective inactivity of the last months. :/ NicDumZ ~ 15:57, 4 February 2009 (UTC)

Category redirects[edit]

Since it looks like your new category redirect code has finally gone live, it might be helpful if you could comment in this discussion to clear up any confusion about how it actually works. Thanks. --R'n'B (call me Russ) 17:28, 19 February 2009 (UTC)

getVersionHistory[edit]

Your regex doesn't work very well with the autocomment :) A few days ago I wrote this regex and it seems to work:

<li class=".*?">(?:\(.*?\)\s)\(.*?\).*?<a href=.*?([0-9]*)" title=".*?">([^<]*)</a> <span class=\'history-user\'><a [^>]*?>([^<]*?)</a>.*?</span></span>(?: <span class="minor">m</span>|) <span class="history-size">.*?</span>(?: <span class=[\'"]comment[\'"]>\((?:<span class="autocomment">|)(.*?)(?:</span>|)\)</span>)?(?: \(<span class="mw-history-undo">.*?</span>\)|) </li>'

Moreover, the param &go=first (in reverse order) is obsolete and doesn't work. You can fix it with &dir=prev. I'm writing to you because I don't have commit access ;) --Mauro742 (talk) 11:02, 21 February 2009 (UTC) If you want to write me go here

Mediawiki has changed again :) This regex works:
<li class=".*?">\((?:\w*|<a[^<]*</a>)\)\s\((?:\w*|<a[^<]*</a>)\).*?<a href=".*?([0-9]*)" title=".*?">([^<]*)</a> <span class=\'history-user\'><a [^>]*?>([^<]*?)</a>.*?</span></span>(?: <span class="minor">m</span>|)(?: <span class="history-size">.*?</span>|)(?: <span class=[\'"]comment[\'"]>\((?:<span class="autocomment">|)(.*?)(?:</span>|)\)</span>)?(?: \(<span class="mw-history-undo">.*?</span>\)|)\s*</li>

Another edit to do:

Replace

if reverseOrder:
   if len(self._versionhistoryearliest) >= revCount:
      path += '&dir=prev'
   else:
      path += '&go=first'

with

if reverseOrder:
   path += '&dir=prev'

because "go" variable is deprecated (See here) --Mauro742 (Talk) 10:59, 27 February 2009 (UTC)

Talk:2007 in Iraq[edit]

I have started a discussion at Talk:2007 in Iraq that I thought you might want to weigh in on. Thanks --jhanCRUSH 00:59, 29 March 2009 (UTC)

DumZiBoT on pl.wiki[edit]

flag, of course, granted :) Masti (talk) 19:34, 20 April 2009 (UTC)

  • and unblocked Masti (talk) 19:36, 20 April 2009 (UTC)

MauritsBot on nl.wiki[edit]

On nl.wikipedia there have been several cases in which a bot called MauritsBot, referring to en:User:DumZiBoT/refLinks in the edit summary, added <references /> when a similar thing was already present.

The bot added == Voetnoten == <references />. The thing is that {{referenties}} and {{appendix}} are templates that already include the heading, together with some other features.

The problem is that the bot still added == Voetnoten == <references />, even though one of those templates was already in use. Of course, that's not strange, since because of the template, <references /> is no longer literally present in the article.

If it doesn't stop soon, it may get the bot blocked on nl.wikipedia.org ....

These are the templates that are most common:

  • {{bron|bronvermelding={{references}}}} = Heading and <references />
  • {{references}} = Just <references /> with option to change fontsize
  • {{referenties}} = Small bolded heading, <references />, and a box around
  • {{appendix}} = Bigger more wiki-like heading, <references />, and box around, and lots of other parameters and options.

Just thought I'd let you know :) . Greetings, Krinkle (talk) 08:08, 22 April 2009 (UTC)

Moved from here on 00:18, 23 April 2009 (UTC)
Great! Thanks for the quick response on my home wiki. If I find anything else I'll report it at the same spot (I wasn't sure whether to report it on your talk page, or somewhere else).
I've got a question though: you mentioned "The owner is responsible for adapting the script completely". Does this mean that, even though you have updated the pywiki-thing, the owner's bot will not automatically use the new version? (In other words: is it necessary/recommended to inform the owner of MauritsBot as well in order to stop this strange adding of references, or does the bot automatically update itself with the new pywiki information?)
Greetings,
Krinkle (talk) 13:45, 22 April 2009 (UTC)
I use a modified version of the script, wherein most of the templates are registered. This one (relatively recent, August 2008) was missing, I just fixed it. Regards, --Maurits (talk) 01:24, 24 April 2009 (UTC)

List of Latin digraphs‎[edit]

Please don't interwiki List of Latin digraphs‎ unless the articles cover the same topic. kwami (talk) 08:51, 24 April 2009 (UTC)

Stop adding inappropriate links, please. kwami (talk) 22:17, 28 April 2009 (UTC)
Hello. DumZiBoT is a robot. I don't add those links deliberately, and the bot cannot know that they are wrong. The links are added because other pages have been linking incorrectly to List of Latin digraphs‎: here, pl:Rz (dwuznak) was incorrectly linked to the English list. The right way to fix it is to change the remote page, as you did. Thanks.
NicDumZ ~ 04:38, 29 April 2009 (UTC)
It's doing it again. One bot will update links to redirects, of which there are over a hundred, and then your bot adds those in as interwikis. And when I do go in and remove the english links in the source page, another bot will revert me, and it starts all over again. Don't you have an exception list? Or would it be better to indefinitely protect the article, and ask that you not accord your bot admin status? kwami (talk) 08:49, 5 May 2009 (UTC)

interlanguage link / es:Talon Karrde[edit]

Hello, NicDumZ and your bot. Thank you for your contributions. Please be informed that DumZiBoT adds interlanguage link to "es:Talon Karrde" on "list of Star Wars characters" pages[9]. "en:Talon Karrde" is a redirect to the list. Best regards, --Kurihaya (talk) 04:22, 27 April 2009 (UTC)

どうもありがと (thank you very much) ^_^ !
Thanks for the report! I corrected the error on all affected wikis. I then added es:Anexo:Personajes de Star Wars to the English page, which should solve the issue.
NicDumZ ~ 04:39, 27 April 2009 (UTC)
Merci beaucoup. Thank you for your prompt action. Every bot will appreciate the link. I hope you enjoy the sakura (cherry blossoms), ja:サクラ, in Tokyo. --Kurihaya (talk) 05:01, 27 April 2009 (UTC)

DumZiBoT[edit]

Hi, DumZiBoT is approved on zh wiki. Best regards. --Mywood (talk) 11:06, 27 April 2009 (UTC)

Dumb bot... ;-)[edit]

Fotolito (ES & PT) is not the same as cel (EN). Sure, they are both transparent sheets, but cels are used in animation, "fotolito" for printing. --Janke | Talk 16:27, 30 April 2009 (UTC)

Hello! :)
My bot is not duuuuuuummmmm... :p
Actually, pt:Fotolito links (wrongly?) to cel and de:Cel, and DumZiBoT just took the links from that page and propagated them to en/fr/it/es/ja/de.
Oh well. I went around those wikis and removed the incorrect links. (Because only removing it on en: doesn't solve the problem; the next bot that walks by will insert the same links :p )
Thanks for the helpful report!
NicDumZ ~ 04:22, 1 May 2009 (UTC)

2090s and tk:2099[edit]

Your bot is adding, among other things, tk:2099 to 2090s. This is incorrect, as tk:2090 also maps to 2090s. It's possible the fix needs to be made on tk (and mk:2065), rather than here, but I thought I'd report it to you. Bots are DumZiBoT (talk · contribs), GrouchoBot (talk · contribs), and Xqbot (talk · contribs). — Arthur Rubin (talk) 00:34, 2 May 2009 (UTC)

mmm. I don't really know what to do here. tk has no 2090s article, and because of this, I believe that linking from the ten tk:209x articles to 2090s is... correct, right?
So the bots come from any tk:209x article, see that they're linked to 2090s... look at the interwikis on 2090s; and because there is no interwiki to tk:, they add the tk: link on the English article.
Maybe you can add a link from 2090s to tk:2090? I know that it is not exactly correct, but... I see no perfect solution here, because we're just combining two correct behaviors, and only the combination is wrong :s
NicDumZ ~ 10:50, 8 May 2009 (UTC)
Hey, forget this, I just saw that tk:2090ýý had been created. Great ;) NicDumZ ~ 10:51, 8 May 2009 (UTC)

Irrelevant links[edit]

Hi, your bot (and other interwiki bots) keeps adding irrelevant links to one article. Do you have any advice for this not to happen? --Eusebius (talk) 09:16, 8 May 2009 (UTC)

Hello!
LIG is a redirect to the Lab' page on en/es/fr. I haven't looked carefully at this, but I suppose that at some point LIG was the main page, was linked wrongly from an acronym page on another wiki, and the situation changed after that.
Oh well, whatever. This should be fixed now, as I removed all the wrong interwikis by hand.
Thanks for the nice report, by the way :)
NicDumZ ~ 10:05, 8 May 2009 (UTC)
I guess I should have done that myself. I'm not very aware of how interwiki bots work. Thanks anyway! --Eusebius (talk) 11:46, 8 May 2009 (UTC)

ceaselessly screwing things up[edit]

Now we're getting the digraph kp in one wiki linked to oe in another wiki 'cuz their wiki-en links have both been redirected to list of digraphs, and the list is being back-linked to both kp and oe. You sure you can't add an exceptions list to this? kwami (talk) 07:11, 10 May 2009 (UTC)

Hi. Please be a bit more precise in your report :) What happened, where, when? Is my bot responsible? Can you show me a wrong diff? Do you have any idea of why it happened?
Thanks.
NicDumZ ~ 09:14, 10 May 2009 (UTC)

Adding categories[edit]

Why does your bot keep adding categories to Trombiculidae that do not have to do with the family, like aoûtat? This is a genus, not a family. Bugboy52.4 (talk) 15:03, 10 May 2009 (UTC)

Stop! Bugboy52.4 (talk) 01:59, 24 May 2009 (UTC)

Wrong interwiki[edit]

Please stop doing such edits: http://en.wikipedia.org/w/index.php?title=New_Oxford_American_Dictionary&diff=290139005&oldid=287715278 --Jakas1 (talk) 14:34, 20 May 2009 (UTC)

Hello, thanks for your report.
I corrected the links on ja, nl and lt, this should not happen anymore. NicDumZ ~ 06:03, 22 May 2009 (UTC)

Wrong portuguese interwiki[edit]

Your bot is making wrong edits on the page Generation Z, adding interwikis not related to this article. For example, the Portuguese version points to the article on a Brazilian ISP. Viniciustlc (talk) 09:47, 24 May 2009 (UTC)

Thanks for your report. It happened because some pt: users linked to the English Internet Generation article (a former name of the ISP), which is a redirect to Generation Z.
I went to pt/ja/lt/en and manually fixed the links.
NicDumZ ~ 13:54, 24 May 2009 (UTC)
Thanks for fixing it! I was trying to fix it, but the bot kept redoing the interwiki. Viniciustlc (talk) 23:01, 24 May 2009 (UTC)

Gzip support in reflinks[edit]

The Wayback Machine now gzips its output by default. The following code will automatically ungzip it:

44a46
> import gzip, StringIO
523c529,533
<                     linkedpagetext = f.read(1000000)
---
>                     if headers.get('Content-Encoding') in ('gzip', 'x-gzip'):
>                          data = gzip.GzipFile('', 'rb', 9, StringIO.StringIO(f.read()))
>                          linkedpagetext = data.read(1000000)
>                     else:
>                         linkedpagetext = f.read(1000000)
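For anyone adapting this to current Python, the same idea as a standalone Python 3 sketch (decompress before truncating; `maybe_ungzip` is an illustrative name, not from reflinks.py):

```python
import gzip
import io

def maybe_ungzip(data, content_encoding):
    """Transparently decompress a response body the server gzipped."""
    if content_encoding in ('gzip', 'x-gzip'):
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data

# In the fetch path, mirroring the diff above:
#     raw = f.read()
#     linkedpagetext = maybe_ungzip(raw, f.headers.get('Content-Encoding'))[:1000000]
```

Note the drawback mentioned below still applies: the whole compressed body has to be read before the 1 MB truncation can happen.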

Also, https://www.glasgow.gov.uk/en/News/Archives/2006/September/GlasgowLahoreTwinning.htm closes its title tag as </title >. — Dispenser 19:52, 6 July 2009 (UTC)

Applied on r7282. Thanks, and sorry for taking so long.
It has a small drawback: we might end up downloading big compressed pages with f.read(). And we can't just truncate the compressed stream.
But that should be a minor issue.
Thanks again,
NicDumZ ~ 10:39, 21 September 2009 (UTC)

AfD nomination of Battle of Bir-el Harmat[edit]

An editor has nominated one or more articles which you have created or worked on, for deletion. The nominated article is Battle of Bir-el Harmat. We appreciate your contributions, but the nominator doesn't believe that the article satisfies Wikipedia's criteria for inclusion and has explained why in his/her nomination (see also "What Wikipedia is not").

Your opinions on whether the article meets inclusion criteria and what should be done with the article are welcome; please participate in the discussion(s) by adding your comments to Wikipedia:Articles for deletion/Battle of Bir-el Harmat. Please be sure to sign your comments with four tildes (~~~~).

You may also edit the article during the discussion to improve it but should not remove the articles for deletion template from the top of the article; such removal will not end the deletion debate.

Please note: This is an automatic notification by a bot. I have nothing to do with this article or the deletion nomination, and can't do anything about it. --Erwin85Bot (talk) 01:12, 28 July 2009 (UTC)

Bugfix[edit]

Hi! Can you apply this bugfix? I've a question. How can I have the commit access? I fixed some bugs in past. I could help you in development. --Mauro742 (Talk) 09:57, 18 September 2009 (UTC)

Your Bot[edit]

Hi, I saw that your bot hasn't been making any edits for some time now. I was interested in you running your reflinks bot when the new dump comes out. Thanks. 72.171.0.139 (talk) 16:18, 19 September 2009 (UTC)

ah? Who are you?
The main reason the bot stopped is that the scripts needed work and adaptation to appease the people criticizing the tool. I don't personally have time to do this. I actually don't even have time to work on the pywikipedia core anymore, or very little.
So that's a no for now, in the actual state of things. However, if you want to submit patches to pywikipedia to solve those little problems, I'd be very grateful. If you are skilled enough to maintain this tool, then of course you'll have my support in running a bot replacing DumZiBoT.
NicDumZ ~ 10:21, 21 September 2009 (UTC)
Actually, I was interested in the bit of your code that tags dead links. Do you think you could strip down the bot to ONLY tag dead links? That way I could avoid the angry mob. Tim1357 (talk) 01:57, 23 September 2009 (UTC)
You probably want to be talking to me, since I helped create the code, based on experience working on Checklinks. — Dispenser 02:41, 23 September 2009 (UTC)
Sure. I want to have a bot that tags dead links as it finds them. I know reflinks does this, and I tried isolating that function, but I am a novice programmer and I ran into problems with 'continue'. If you want to see what I have, I could email the script to you. If you think it would be faster if you did it yourself, that would be fine too. Tim1357 (talk) 21:50, 23 September 2009 (UTC)
Three years later, do you have some time to do this now? Thanks!   — Jeff G. ツ (talk) 19:24, 23 August 2012 (UTC)

Wrong interwiki spanish-german Reelección presidencial[edit]

Hello, in Spanish, Reelección presidencial does not correspond to Amtszeit in German. Please consider correcting it. Thank you. es:Ildefonk. --Ildefonk (talk) 11:57, 10 November 2009 (UTC)

BOT generates foreign language references[edit]

Are you aware that your BOT generates references in what would be foreign-language text for the English Wikipedia? I am sure this is a global repercussion of your BOT, but the example I am referring to is the Virgin of Candelaria article. Those references are essentially unusable to users on the English Wikipedia who are not multilingual. I don't know offhand whether Wikipedia's rules are to delete foreign-language references or not; however, they give a false impression to article writers and readers that there are actual references supporting the article information, as no one (or only a tiny subset of people) can read the cited reference. In fact, the references used may not refer to the specific topic at hand at all, but no one can tell, to verify or dispute this. Hence, the information may be true, or then again it may not be. Isn't that misappropriating the concept of information democracy that Wikipedia was purportedly created for? Stevenmitchell (talk) 14:18, 27 November 2009 (UTC)

haha. Chill down.
My bot is not generating content, but merely formatting what other editors have inserted in the article. Refer to the initial editors if you feel excluded from the "information democracy", not to me or my bot. As for my personal idea on the topic, I would rather have a reference in a foreign language than no reference at all. If you feel excluded, add an English reference instead, or learn the other language ;)
My bot (and I) are currently inactive anyway.
NicDumZ ~ 02:26, 30 November 2009 (UTC)

Getting archive urls with a bot[edit]

Hi NicDumZ, I left a question/request at Wikipedia:Bot requests/Archive 32#Getting links from web.archive.org and, since you seem to use reflinks.py with your bot, I thought you might have some idea about how to retrieve information from external sites like this. If you have some time, would you be able to take a look and let me know if you would be able to offer pointers? Thanks, rʨanaɢ talk/contribs 01:44, 3 December 2009 (UTC)

NicDumZ, I created a bot of sorts that I am trying to get to do this. It hasn't worked yet, but I was hoping that you would give me some suggestions. For example, I see that you say "The hardest part would be determining if the link is dead or not". My solution to this would be to parse the external links beforehand and then run the bot like this:
import urllib2

try:
    urllib2.urlopen(url).read()
except urllib2.HTTPError, e:
    if e.code == 410 or (e.code == 404 and url in textfilewithdeadlinks):
        pass  # the link is dead

which would give the link two tries before assuming that it is dead. Tim1357 (talk) 23:04, 7 December 2009 (UTC)
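For what it's worth, the same two-pass idea in current Python 3 (`dead_links` being the set parsed beforehand; the function name is illustrative):

```python
import urllib.error
import urllib.request

def is_dead(url, dead_links):
    """Dead if the server says Gone, or 404s now *and* failed in the earlier pass."""
    try:
        urllib.request.urlopen(url).read()
        return False
    except urllib.error.HTTPError as e:
        return e.code == 410 or (e.code == 404 and url in dead_links)
```

A 404 on only one of the two passes is treated as transient, so the link is not tagged.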

running a bot from the toolserver[edit]

A while back, when you helped me write this bot, you also very kindly volunteered to run it periodically from the toolserver. Are you still willing to do so? How would you do so: by reading the bot code each time or by using the current code unless notified to change? And would the edits to wikt:Wiktionary:Entries needing topical attention be attributed to your enwikt account? Thanks.—msh210 18:08, 4 January 2010 (UTC)

Deleting interwiki links[edit]

In E-6 process this bot deleted [[zh:E-6沖印處理]] (see http://en.wikipedia.org/w/index.php?title=E-6_process&action=historysubmit&diff=337224336&oldid=337222773) - (I restored it)

- Leonard G. (talk) 18:51, 11 January 2010 (UTC)

AfD nomination of Seat configurations of the Airbus A380[edit]

An editor has nominated one or more articles which you have created or worked on, for deletion. The nominated article is Seat configurations of the Airbus A380. We appreciate your contributions, but the nominator doesn't believe that the article satisfies Wikipedia's criteria for inclusion and has explained why in his/her nomination (see also Wikipedia:Notability and "What Wikipedia is not").

Your opinions on whether the article meets inclusion criteria and what should be done with the article are welcome; please participate in the discussion(s) by adding your comments to Wikipedia:Articles for deletion/Seat configurations of the Airbus A380. Please be sure to sign your comments with four tildes (~~~~).

You may also edit the article during the discussion to improve it but should not remove the articles for deletion template from the top of the article; such removal will not end the deletion debate.

refLinks[edit]

Could your bot work with reflinks like:

<ref>[http://www.google.com http://www.google.com]</ref>
<ref>[http://www.google.com http://www.google.com] {{lang|en}}</ref>

Thanks. Malarz pl (talk) 06:20, 4 April 2010 (UTC)

Or could it enter the lang code for refs in foreign languages? (It might be hard to spot the language; Google's translator does a decent job of it, but I don't know the code.) Lihaas (talk) 22:39, 20 November 2010 (UTC)

Libertapedia[edit]

Just letting you know, I'll be using reflinks over at http://libertapedia.org/ . Thanks for developing that script, that should save a lot of work. By the way, do you need/want access to JSTOR? It could be arranged. Peace out, Tisane (talk) 21:18, 8 April 2010 (UTC)

User:Dispenser/Reflinks[edit]

I am taking you up on your offer, I would like to try your script [10]. Mlpearc powwow 21:23, 30 August 2010 (UTC)

Odebranie flagi bota na pl.wiki / Bot flag removal @ pl.wiki[edit]

According to pl.wiki policy pl:Wikipedia:BOT#Bot_flag_removal, your bot flag has been removed due to inactivity during the last 12 months. If you still want to run a bot on pl.wiki, please reapply here: pl:Wikipedia:Boty/Zgłoszenia Masti (talk) 21:58, 6 September 2010 (UTC)

Due to inactivity over the last 12 months, in accordance with the policy, the bot flag has been removed. If you still want to use it, you can submit a new request for the rights here: Wikipedia:Boty/Zgłoszenia Masti (talk) 21:58, 6 September 2010 (UTC)

Cite web or cite web[edit]

Hi, could you adjust reflinks so that it uses the capitalized version, Cite web, which seems to be the right style? Right now some bots and reflinks are doing it in different ways. Rgds --Typ932 T·C 17:02, 30 September 2010 (UTC)

You're looking for User talk:Dispenser/Reflinks, since I've been maintaining the Toolserver version. — Dispenser 19:45, 30 September 2010 (UTC)
Okay thx --Typ932 T·C 19:57, 30 September 2010 (UTC)

Alexander McQueen[edit]

Can you please kindly run the bot to fix the Alexander McQueen (brand) page's 100+ reference links? I've attempted to do so with the bot but was unsuccessful. Thank you. Reqluce (talk) 18:56, 4 November 2010 (UTC)

reflinks[edit]

I managed to get the automatic version working, but I was wondering if you could get the bot to remove anything before a colon (:)? In some sources the name of the source shows up before the title, which distracts from the title itself. In Al Jazeera sources, for example, the name comes after, which is good, but otherwise we have something like "PressTV : ..." which reflects that as being part of the title. Lihaas (talk) 22:35, 20 November 2010 (UTC)
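Something like the following heuristic could do it, though it will also eat legitimate titles that begin with a short phrase and a colon, so treat it as a sketch (the helper name is invented):

```python
import re

def strip_site_prefix(title):
    """Drop a short leading site name such as 'PressTV : ' from a fetched title."""
    return re.sub(r'^[^:]{1,30}:\s+', '', title)
```

For example, "PressTV : Iran to launch satellite" becomes "Iran to launch satellite", while titles without a colon pass through unchanged.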

Love this option[edit]

Have one concern... I've been using this for years, and recently when I use it I get logged out of secure.wikimedia.org. Anything we can do to help with this? Moxy (talk) 16:16, 8 December 2010 (UTC)

NicDumZ is not much involved with reflinks anymore. However, as I suspect that you're coming from my Toolserver page, you just need to go to the Preferences page and check "Rewrite forms to submit to the secure server". — Dispenser 03:34, 9 December 2010 (UTC)
NicDumZ is not much involved anymore would kind of be enough! Thanks for watching my talk page for me :) NicDumZ ~ 07:01, 13 December 2010 (UTC)
Well, I redesigned the Toolserver landing page for the Signpost article. Since then there's been a spike of people coming to your talk page. I've taken steps that will hopefully reduce the confusion. Have a happy new year. — Dispenser 01:40, 27 December 2010 (UTC)

reflinks

Can the basic version be programmed to add a "dead link" tag? I can't figure out how to use the manual version. (Lihaas (talk) 18:50, 30 December 2010 (UTC)).

Checklinks incorporates both dead link-tagging and reference-combining/naming functionalities. Logan Talk Contributions 07:13, 13 March 2011 (UTC)


cite tag, again

I've read the discussion on why the bot does not use the Cite tag, and I tend to agree with the others in the thread who point to it promoting good citation habits and laying the groundwork for further expansion. I also understand that this is your bot and it's your call how it is implemented.

With all that in mind, would you be willing to share your code for a spinoff version that does use the cite tag?--RadioFan (talk) 13:53, 19 June 2011 (UTC)

Esperanto

Hi, I'd like to run this script on the Esperanto Wikipedia. What should I add in order to run it with our templates and all the messages translated?

So far I have:

stopPage = {
            'da':u'Bruger:DumZiBoT/EditThisPageToStopMe',
            'de':u'Benutzer:DumZiBoT/EditThisPageToStopMe',
            'en':u'User:DumZiBoT/EditThisPageToStopMe',
            'eo':u'Uzanto:Airon90/RefHaltigo',
            'fa':u'کاربر:Amirobot/EditThisPageToStopMe',
            'fr':u'Utilisateur:DumZiBoT/EditezCettePagePourMeStopper',
            'it':u'Utente:Marco27Bot/EditThisPageToStopMe',
            'hu':u'User:Damibot/EditThisPageToStopMe',
            'ko':u'사용자:GrassnBreadRefBot/EditThisPageToStopMe1',
            'pl':u'Wikipedysta:MastiBot/EditThisPageToStopMe',
            'zh':u'User:Sz-iwbot',
}
msg = { 
        'da':u'Bot: Tilføjer beskrivelse til eksterne links, se [[:en:User:DumZiBoT/refLinks|FAQ]]',
        'de':u'Bot: Korrektes Referenzformat (siehe [[:en:User:DumZiBoT/refLinks]])',
        'en':u'Robot: Converting bare references, using ref names to avoid duplicates, see [[User:DumZiBoT/refLinks|FAQ]]',
        'eo':u'[[VP:R|Roboto]]: Aranĝo de notoj kun eksteraj ligiloj sen titolo, vidu [[w:Uzanto:Airon90/refLinks|dokumentadon]]',
        'es':u'Formateando las referencias que no tuvieran títulos (FAQ : [[:en:User:DumZiBoT/refLinks]] )',
        'fa':u'ربات:تصحیح پيوند به بيرون يا عنوان پيوند. [[:en:User:DumZiBoT/refLinks|اطلاعات بیشتر]]',
        'fr':u'Bot: Correction des refs. mal formatées, suppression doublons en utilisant des références nommées (cf. [[Utilisateur:DumZiBoT/liensRefs|explications]])',
        'hu':u'Robot: Forráshivatkozások kibővítése a hivatkozott oldal címével',
        'it':u'Bot: Sistemo note con collegamenti esterni senza titolo ([[Utente:Marco27Bot/refLinks.py|documentazione]])',
        'ko':u'봇: url만 있는 주석을 보강, (영문)[[:en:User:DumZiBoT/refLinks]] 참조',
        'pl':u'Bot: Dodanie tytułów do linków w przypisach (patrz [[Wikipedysta:MastiBot/refLinks|FAQ]])',
        'ru':u'Bot: добавление заголовков в сноски; исправление двойных сносок',
}
deadLinkTag = {
               'da':u'[%s] {{dødt link}}',
               'de':u'',
               'en':u'[%s] {{dead link}}',
               'eo':u'{{404|%s}}',
               'es':u'{{enlace roto2|%s}}',
               'fa':u'[%s] {{پیوند مرده}}',
               'fr':u'[%s] {{lien mort}}',
               'hu':u'[%s] {{halott link}}',
               'it':u'{{Collegamento interrotto|%s}}',
               'ko':u'[%s] {{죽은 바깥 고리}}',
               'pl':u'[%s] {{Martwy link}}',
               }
comment = {
           'ar':u'عنوان مولد بالبوت',
           'da':u'Bot genereret titel',
           'de':u'Automatisch generierter Titel',
           'en':u'Bot generated title',
           'eo':u'Titolo kreata per roboto',
           'es':u'Título generado por un bot',
           'fa':u'عنوان تصحیح شده توسط ربات',
           'fr':u'Titre généré automatiquement',
           'hu':u'Robot generálta cím',
           'it':u'Titolo generato automaticamente',
           'ko':u'봇이 따온 제목',
           'pl':u'Tytuł wygenerowany przez bota',
           'ru':u'Заголовок добавлен ботом',
           }
badtitles = { 
              'fr': '.*(404|page|site).*en +travaux.*',
              'en': '',
              'es': '.*sitio.*no +disponible.*',
              'it': '((pagina|sito) (non trovata|inesistente)|accedi)',
              'ru': u'.*(Страница|страница).*(не[ ]*найдена|отсутствует).*',
            }
autogen = { 
            'da': 'autogeneret',
            'en': 'autogenerated',
            'eo': 'aŭtomate kreita',
            'it': 'autogenerato',
            'pl': 'autonazwa',
            }

I also reordered the items by name :) Please contact me on the Italian Wikipedia --→ Airon 17:32, 24 July 2011 (UTC)
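For reference, per-language tables like the ones above are usually consulted with an English fallback, so a wiki without its own entry still gets a sane default. The `localized` helper here is a hypothetical sketch of that lookup, not code from the script itself:

```python
def localized(table, lang, default_lang='en'):
    """Look up a per-language string, falling back to default_lang
    (English) when the wiki has no entry of its own."""
    return table.get(lang, table.get(default_lang))

# Abbreviated copy of the 'comment' table above, for illustration.
comment = {
    'en': u'Bot generated title',
    'eo': u'Titolo kreata per roboto',
}

localized(comment, 'eo')   # the Esperanto entry
localized(comment, 'nl')   # no 'nl' entry, so the English default
```

Under this scheme, adding the 'eo' entries to each dictionary (as done above) is all that is needed for the bot to pick up the Esperanto strings.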

Peacockish edit

Your bot made an edit to this article a while ago, changing a reference to a newspaper article and adding the comment "Daily Express: The Worlds Greatest Newspaper". Not only is this laughably untrue, it also offends our guidelines on advertising. It's annoying enough when editors put puff items in for products; having a bot running around doing it is bad news. The bot has over 200,000 edits to trawl through; is it likely to have made a habit of this? How do you propose the problem be fixed? Moonraker12 (talk) 14:14, 16 January 2012 (UTC)

It happens. This was a robot; the code and use case were approved by the community.
The idea was simple: fetch the HTML title of the page and use it. In this case, the Daily Express had a stupid title, and my robot copied it. Too bad ;)
Anyway, the bot has stopped, I'm now retired, etc., etc. NicDumZ ~ 21:42, 8 November 2012 (UTC)
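The "fetch the HTML title and use it" approach described above can be sketched in a few lines. This is an illustrative reconstruction, not the bot's actual code, and the function names are assumptions:

```python
import re
import urllib.request

def extract_title(html):
    """Return the contents of the first <title> tag, or None."""
    match = re.search(r'<title[^>]*>(.*?)</title>', html,
                      re.IGNORECASE | re.DOTALL)
    if match:
        # Collapse whitespace; a real bot would also unescape HTML
        # entities and filter known error-page titles (cf. badtitles).
        return ' '.join(match.group(1).split())
    return None

def fetch_html_title(url, timeout=10):
    """Fetch a page and return its HTML <title>, or None."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        html = response.read(100000).decode('utf-8', errors='replace')
    return extract_title(html)
```

The pitfall reported in this thread follows directly: if a site serves a marketing slogan as its `<title>`, the bot copies it verbatim into the reference, which is why the per-language `badtitles` patterns above exist to reject known-bad titles.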

Question

Hi NicDumZ, can I put some sort of template on an article so that reflinks cannot be run on that particular page?

Thanks -- Marek.69 talk 04:19, 28 January 2012 (UTC)

None of my bots run reflinks anymore. Please check with other bot owners as I'm now retired. NicDumZ ~ 21:42, 8 November 2012 (UTC)