User talk:NicDumZ: Difference between revisions

Question re identifying refs with blank parameters vs refs with the parameter name also missing

== Blank fields in refs ==
Is there a way the bot can avoid reporting (or possibly even fix) apparent differences between refs where the only difference is one has a blank field (eg. issn=|) and the other does not even include the parameter? An example is [http://en.wikipedia.org/w/index.php?title=Talk:Sun_tanning&diff=230778337&oldid=230743939 here]. The refs are identical, other than the single blank field at the end of the second one. Content identical, mild format difference - similar to the issue raised above.

I appreciate the answer may be no, but the watchlist spam is getting substantial and little things like this might make a difference. [[User:Euryalus|Euryalus]] ([[User talk:Euryalus|talk]]) 10:57, 9 August 2008 (UTC)

Revision as of 10:57, 9 August 2008

Collateral effects of es:usuario:DumZiBoT

Please stop the bot. For example, here the title is incorrect: your bot inserted "Sitio no disponible en este momento. Intente más tarde" ("Site not available at the moment. Try again later"), ha ha. Other bad titles: at es:Mozilla Firefox (see here) the title is very long:

Totalidea Software: Tweak Windows Vista - Windows Vista Tweaks - Vista Tweaks - TweakVI - Tweak-VI - Tweak-Vista - TweakVista - TweakXP - Tweak-XP - Tweak XP - Registry - Regedit - Windows Tuning - Windows XP - Windows Vista - Tweaking - Optimize - T...

In es:Machu Pichu (see here), for http://www.waterhistory.org/histories/machupicchu/ the bot put the title WaterHistory.org. In this case that is not suitable; www.waterhistory.org/histories/machupicchu/ would be preferable.

I think there may be other cases. Regards, es:usuario:Shooke, Shooke (talk) 18:21, 31 May 2008 (UTC)

I have to support the second point. It should also be able to identify that Nacionalista Party .com is the same as nacionalistaparty.com (the domain). Suggested code:
if re.sub(r'[^A-Za-z\.\-]', r'', ref.title.lower()) in domain.match(link+redir).group():
    # Should improve url and redirect matching
    repl = ref.refLink()
    new_text = new_text.replace(match.group(), repl)
    wikipedia.output(u'\03{lightred}WARNING\03{default} %s : Title is URL component (%s)' % (ref.link, ref.title))
if self.titleBlackList.search(ref.title):

Dispenser 00:43, 2 June 2008 (UTC)[reply]
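
For readers who want to try the idea outside the bot, here is a minimal standalone sketch of the same check. It is not the reflinks.py code: the function name `title_is_domain` is made up, and it uses Python 3's `urllib.parse` instead of the script's own `domain` regex. Like the suggestion above, it drops everything except letters, dots and hyphens from the title before comparing.

```python
import re
from urllib.parse import urlsplit


def title_is_domain(title, url):
    """Return True if the title is just (part of) the link's host name.

    Hypothetical helper, for illustration only.
    """
    host = urlsplit(url).netloc.lower()
    if host.startswith('www.'):
        host = host[4:]
    # Keep only letters, dots and hyphens, as in the suggested snippet.
    squeezed = re.sub(r'[^a-z.\-]', '', title.lower())
    # An empty string is "in" every string, so guard against it.
    return bool(squeezed) and squeezed in host
```

Very short titles (a single letter, say) would also count as "part of the domain" here, so a real check would probably want a minimum length.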

Your bot

Absolutely wonderful idea! A small improvement I'd suggest, though: I've read in your talk page archive that you don't want to use citation templates in general, and I understand your reasons for this. However, if an article already uses citation templates in some or most of its references, wouldn't it be more sensible to have your bot convert bare references to a simple citation template reference instead of the non-template format? —Nightstallion 11:34, 1 June 2008 (UTC)[reply]



Bot

Hey, can you run your bot through Roy Hibbert again? Thanks! Bash Kash (talk) 21:37, 1 June 2008 (UTC)[reply]

Zimbabwean dollar

Howdy! Please could you fix Zimbabwean dollar plz? 129.215.48.99 (talk) 01:00, 5 June 2008 (UTC)[reply]

You might want to try using web reflinks, which automatically converts numbered references into ref tags. — Dispenser 04:31, 12 June 2008 (UTC)

noreferences.py bug

# Is there an existing section where we can add the references tag?
for section in wikipedia.translate(self.site, referencesSections):
    sectionR = re.compile(r'\r\n=+ *%s *=+\r\n' % section)

[ ... ]

# Create a new section for the references tag
for section in wikipedia.translate(self.site, placeBeforeSections):
    # Find out where to place the new section
    sectionR = re.compile(r'\r\n(?P<ident>=+) *%s *=+\r\n' % section)

It should be

   sectionR = re.compile(r'\r\n=+ *%s *=+ *\r\n' % section)

and

   sectionR = re.compile(r'\r\n(?P<ident>=+) *%s *(?P=ident) *\r\n' % section)

since headers can have trailing whitespace at the end. This might explain some of the weird things I've seen the script do.
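
A small sketch of the difference, using plain `re` outside pywikipedia. The section name `References`, the sample wikitext and the helper `matches` are invented for the illustration; only the three patterns come from the discussion above.

```python
import re

section = "References"  # hypothetical section title, for illustration only

# Pattern as it stands in the script: no room for trailing spaces.
old_r = re.compile(r'\r\n=+ *%s *=+\r\n' % section)
# Proposed pattern: allow spaces between the closing '=' signs and the newline.
new_r = re.compile(r'\r\n=+ *%s *=+ *\r\n' % section)
# Proposed pattern for the second loop: the closing run of '=' must be
# identical to the opening run, which is captured as the group "ident".
ident_r = re.compile(r'\r\n(?P<ident>=+) *%s *(?P=ident) *\r\n' % section)

# A header followed by two stray spaces before the line break.
text = "Some text.\r\n== References ==  \r\nMore text.\r\n"


def matches(pattern, wikitext):
    """Return True if the pattern finds a header in the wikitext."""
    return pattern.search(wikitext) is not None
```

With this sample, the old pattern misses the header while both proposed patterns find it.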

Following up on the previous discussion I've tried to reduce the confusion in the documentation about the two scripts being the same. The official release of reflinks.py is running under the reflinks-svn and is almost identical to the svn version. The sources for the online tools are available at http://toolserver.org/~dispenser/resources/sources/. I've also integrated the script with link checker.

I would like to eventually merge the scripts. Could there be an iterator function for generating links? I would like to match unlabeled bullet links or give people the option of converting to citation templates. The way options are passed to the internals gets "bulky" if 10 or more options are defined. Additionally, would it be possible to include the date parameter with {{dead link}}? I use the non-language-portable time.strftime("{{dead link|date=%B %Y}}") in the web script. I'm working out a way to fill in citation templates.

A passing thought: would the pywikipedia community object to AWB-like general fixes? — Dispenser 00:53, 8 June 2008 (UTC)

Sorry, sorry, sorry.
I'm very busy these days.
The regex change was a 10 second fix, I committed it on Sunday in r5541, but I sort of ran out of time for longer concerns :/
Thanks for your involvement Dispenser, I *will* take a deeper look at your suggestions, but I just can't do it now.
NicDumZ ~ 20:53, 11 June 2008 (UTC)[reply]

Inadvertent advertising

The HTML title from webpages belonging to magazines, which might be perfectly appropriate links in articles, can make highly inappropriate references, because they are used to give promotional messages about the magazine. For example, there are few more comprehensive and well updated sources in English for following competitive cycling than Cycling Weekly, and so it was a good source for somebody to use to show that two teams had been offered a late entry into the 2008 Giro d'Italia, but it is not the place of Wikipedia to declare, as DumZiBoT did, this publication to be "Britain's biggest-selling cycling magazine, delivers an exciting mix of fitness advice, bike tests, product reviews, news and ride guides for every cyclist". Maybe the Bot needs a filter to change its action when it comes across boastful superlatives, or maybe the automatic editnote should more explicitly invite editors to check the suitability of the results. Just a thought. Kevin McE (talk) 11:28, 1 June 2008 (UTC)[reply]

Sorry for the archiving, true for the advertising.
A different edit summary would fit, yes, but boastful superlatives are not that easy to detect. In this case, only biggest-selling and exciting are problematic, and I don't think that I can blacklist a title because it contains two superlatives, can I? Moreover, this check has to be language independent, and... no... it's not that easy :(
NicDumZ ~ 20:58, 11 June 2008 (UTC)[reply]

Overenthusiastic archiving

Since your last edit on this page, MiszaBot III has moved 10 items onto the archive page, thus leaving some pertinent issues about this bot apparently unaddressed. This does not seem a very satisfactory response to the comments of other editors. Is it possible to disable the archiving bot's trawls of this page while you are not active (only one edit in more than a week)? Kevin McE (talk) 14:48, 8 June 2008 (UTC)

Yep, archiving disabled. Thanks for the report :/ NicDumZ ~ 20:44, 11 June 2008 (UTC)[reply]

Hello

When I turn on "Hide bot edits" in my watchlist, this bot's edits still show up. What is going wrong? Thanks. 189.162.18.70 (talk) 03:29, 9 June 2008 (UTC) eswiki

Bot - a minor point

Hi, and well done on your work with the bot - it's a fantastic idea.

Just a tiny thing..

Would it be possible to make the bot not insert a title, when the title is "Untitled" or "Untitled Document"? It won't happen a lot, but if it is easy to do, it's probably worth it.

Don't waste your own time if it'll be a big job. Many thanks, Drum guy (talk) 15:40, 9 June 2008 (UTC)[reply]

I believe my bot should ignore these. If, recently, it inserted such a title into an article, please give me a diff of such an edit, so I can track a possible bug. NicDumZ ~ 20:45, 11 June 2008 (UTC)[reply]

HTML in title

The bot made an incorrect edit, adding HTML as a title. Most likely, this was because the HTML was incorrectly nested within the title tag in the original page. However, the bot should know to either ignore such titles, or strip out the HTML. Superm401 - Talk 20:34, 10 June 2008 (UTC)[reply]

Yes, I could do that.
However that edit occurred during another of those times when DumZiBoT was broken.
This lengthy title, containing " the page you requested could not be found." is supposedly blacklisted. I tested it again, and now the page is being ignored.
NicDumZ ~ 20:49, 11 June 2008 (UTC)[reply]

References in templates

Your bot added <references/> to a transcluded template which it shouldn't do. I have wrapped <references/> in <noinclude>s there for now. (Just letting you know). – sgeureka tc 02:17, 13 June 2008 (UTC)[reply]

Thanks for the report.
I added a namespace check before adding <references/> :)
NicDumZ ~ 07:47, 29 June 2008 (UTC)[reply]

Love Bot

I hate the line noise that is required by Cite.php. I've hated on it elsewhere, and I don't intend to mellow with (r)age. Your bot, however, turns ordinary cites into line noise cites in a way that I can't object to. So now I hate you for coming up with something so utterly wonderful that I can't get my hate on.

In case that's not clear, I'd just like to say, for the record, that I completely and utterly hate you :-)

chocolateboy (talk) 16:51, 14 June 2008 (UTC)[reply]

Small problem

Your bot seems to have done something slightly weird here [1]. Maybe you need to strip out formatting from within the title. I have corrected the article page. Keep up the good work. 82.1.57.47 (talk) 05:39, 20 June 2008 (UTC)[reply]

How about duplicate refs?

This bot is doing awesome work, still. I get a sense of pride since I feel like I "commissioned" this work in the first place. ;-)

I was just editing a page that the bot had "fixed up", but I realized that the people who cited references didn't understand how to cite the same ref more than once. So the page had a bunch of...

<ref>[http://somewhere Somewhere<!--auto bot gen-->]</ref>
<ref>[http://somewhere Somewhere<!--auto bot gen-->]</ref>
<ref>[http://somewhere Somewhere<!--auto bot gen-->]</ref>

... instead of ...

<ref name="somewhere">[http://somewhere Somewhere<!--auto bot gen-->]</ref>
<ref name="somewhere" />
<ref name="somewhere" />

Is there a way for the bot to check for duplicates while it does its work? Could the bot check for duplicates on pages it has already done? I realize this isn't an easy task (what would the refname be, for example?) but I just thought I'd throw it out there.

Timneu22 (talk) 13:36, 22 June 2008 (UTC)[reply]

Yes, I've been asked to do this for a while :)
I wrote something to handle this (when no refname is specified in the article body, refname is "autogenerated#" where # is an id)
I'm testing it on fr:, where I won't get beaten this much if I make some small mistakes :)
I will then apply it to en :)
NicDumZ ~ 10:20, 29 June 2008 (UTC)[reply]
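
A rough sketch of the merging described in this thread, under simplifying assumptions: it only looks at unnamed `<ref>` tags whose contents are character-for-character identical, and it does not check whether an "autogenerated#" name is already used on the page. The function name `merge_duplicate_refs` is invented here, and this is not the code that was being tested on fr:.

```python
import re

# Matches an unnamed <ref>...</ref> pair; named refs are left alone here.
REF_R = re.compile(r'<ref>(?P<content>.*?)</ref>', re.DOTALL)


def merge_duplicate_refs(text):
    """Name the first copy of each repeated unnamed ref, shorten the others."""
    # First pass: count how often each reference content appears.
    counts = {}
    for m in REF_R.finditer(text):
        content = m.group('content')
        counts[content] = counts.get(content, 0) + 1

    names = {}  # reference content -> generated name

    def repl(m):
        content = m.group('content')
        if counts[content] < 2:
            return m.group(0)  # unique reference: leave it untouched
        if content not in names:
            # First occurrence keeps its content and receives a name.
            names[content] = 'autogenerated%d' % (len(names) + 1)
            return '<ref name="%s">%s</ref>' % (names[content], content)
        # Later occurrences become short self-closing references.
        return '<ref name="%s" />' % names[content]

    # Second pass: rewrite the references in document order.
    return REF_R.sub(repl, text)
```

Run on three refs where the first two are identical, this names the first "autogenerated1", turns the second into `<ref name="autogenerated1" />` and leaves the third alone.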

Combines references that aren't the same and orphans references

Something isn't quite right about how it detects/combines duplicate references. See this edit [2]. Specifically the part where it changed: <ref name="update1">IUDs—An Update. [http://www.infoforhealth.org/pr/b6/b6chap1.shtml#top Chapter 1: Background].</ref> into <ref name="population" />

This was wrong in that name=population refers to a different chapter in the referenced item. It also broke other places where name="update1" was referred to.

  • If it is going to rename a ref, it needs to rename all instances, not just the defining instance.
  • Something is wrong in how it decides that two references are the same. The ref with name=population has a different (although similar) URL and a different displayed name, so somewhere it made a mistake by deciding to combine them.

(The other combination on the page worked okay.) I repaired the page. Hope this helps improve the bot. Thanks. Zodon (talk) 19:02, 22 July 2008 (UTC)[reply]

Hello !
First of all, thanks for the kind (calm) report. These really help to improve the bot, indeed.
However, I think that the page was buggy before my bot :) See this, the ref name "population" referred to two different references, hence the strange behavior.
You are right, however, about your first point (not renaming all the instances), this has been fixed in my script, per this
Thanks a lot, Zodon, this was a very useful report.
NicDumZ ~ 10:27, 23 July 2008 (UTC)[reply]
Glad it helped. I missed the bit about there being two items with name=population. Assume you also augmented the bot so that it will detect when there are multiple different references with the same name, and fix the naming issue if it can (e.g. 2 definitions of name with no other uses of it), or flag it if it can't. (I imagine this isn't the only page with that error.) Thanks. Zodon (talk) 05:11, 24 July 2008 (UTC)[reply]
True, it was a bit tricky, but I just implemented it: the second duplicate "population" gets changed into autogenerated1 :)
Thanks again !
NicDumZ ~ 09:54, 24 July 2008 (UTC)[reply]
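
The detection step discussed here (one ref name defined with two different contents) can be sketched as follows. This is an illustration under assumptions, not the bot's implementation: the regex and the function name `conflicting_names` are made up, and contents are compared as plain strings after trimming.

```python
import re

# A named ref that carries content; the name may be quoted or unquoted.
# Self-closing uses such as <ref name="x" /> do not match.
NAMED_REF_R = re.compile(
    r'<ref\s+name\s*=\s*(?P<quote>["\']?)(?P<name>[^"\'>/]+?)(?P=quote)\s*>'
    r'(?P<content>.*?)</ref>', re.DOTALL | re.IGNORECASE)


def conflicting_names(text):
    """Return {name: [contents]} for names defined with different contents."""
    seen = {}
    for m in NAMED_REF_R.finditer(text):
        name = m.group('name').strip()
        content = m.group('content').strip()
        contents = seen.setdefault(name, [])
        if content not in contents:
            contents.append(content)
    # Only names with more than one distinct definition are a problem.
    return {name: c for name, c in seen.items() if len(c) > 1}
```

Whether a conflict is then repaired (renaming the second definition) or only reported is a separate decision; the sketch stops at finding the conflict.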

A new named extlinks bot

Greetings. My name's Quadell, and I run Polbot. I'm considering creating a bot to perform various improvements in external links and references, and I'm a fan of DumZiBoT's task of naming links. I was wondering if you'd be willing to share your code, or at least the regexps you use, for DumZiBoT. That way I'm only partially recreating the wheel. (I'll be working on specific categories in real time, rather than using a database dump, by the way.) Thanks so much! – Quadell (talk) (random) 00:20, 29 June 2008 (UTC)[reply]

Sure ! The code is available in the pywikipedia repository, under GPL. Let me know how successful you are in your quest :p
NicDumZ ~ 07:38, 29 June 2008 (UTC)[reply]
Thanks for the link! Very helpful indeed. But I can't see where, or how, you merge duplicate refs. At User:DumZiBoT/refLinks, you say "When duplicate references are found (i.e. references having the exact same content) only the first is kept, and a refname is added to the others ( example )" That's the last part I can't figure out how to do, without making bad errors. Where is the code for this function? By the way, I'm requesting approval at Wikipedia:Bots/Requests for approval/Polbot 8. Thanks again, – Quadell (talk) (random) 01:39, 7 July 2008 (UTC)[reply]
That's because it has not been committed yet, for it's untested. (see the above section :) )
From the few pages I modified, it seems to work, but I'd like to test it on the next French dump, and receive feedback on it, before committing it.
NicDumZ ~ 08:03, 7 July 2008 (UTC)[reply]

Loading several pages at once with the wikipediabot

Hi, I'm in the process of creating a script that checks for a message that is supposed to be above the interwiki in every article on the nn wikipedia. Now, what I don't get is how to load several pages at once. The getall function in wikipedia.py doesn't return anything; so I don't know how to use it, if I in fact should. Thanks in advance. --Harald Khan Ճ 19:40, 6 July 2008 (UTC)[reply]

I apologize if I asked the wrong guy ;-) --Harald Khan Ճ 17:30, 31 July 2008 (UTC)[reply]

Changing refs in HTML comments

Re this edit. There is no need to go and edit ref tags when it is in a HTML comment. It adds nothing of value to the article. Mikemill (talk) 20:28, 21 July 2008 (UTC)[reply]

Sorry for the disturbance :)
This case is a bit particular, but it usually does no harm to format the references inside HTML comments, while checking for a wrapping HTML comment can become really, really complicated.
I just upgraded my script to ignore blank (whitespaces only) references.
Thanks for your report ;)
NicDumZ ~ 20:36, 21 July 2008 (UTC)[reply]
Thanks for the change. IIRC HTML comments are not allowed to be nested inside of each other so it should be a matter of looking for the <!-- token and the --> token and removing the stuff in between. However, I'm not sure what format you are looking at the page code in so if you believe it is too hard then so be it ;) Mikemill (talk) 13:20, 22 July 2008 (UTC)
Sure, nested comments are not allowed. However, I work with regular expressions to detect different types of references, and the thing is: I can't "exclude" the commented wikitext from these matches. I can remove all the commented text before doing any work, but then I would have to re-add it after processing the references, and that might be tricky. Also, when working on a specific part of the text, testing whether it is nested in an HTML comment is easy, and I could test every reference one by one and exclude the ones included in comments. However, the last part of my script is basically: "for each found reference, replace it with the processed one". Now imagine two identical references, the first being inside a comment, the latter not being commented out. If I want to ignore the first one but not the latter one, it gets really hard, while processing all the identical references the same way is really easy.
There are ways, of course, to take special care of these cases, but honestly, I don't think that the very rare cases where it causes problems justify spending so much time on code :)
NicDumZ ~ 13:31, 22 July 2008 (UTC)[reply]
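
One possible way around the "two identical references" problem, sketched under assumptions: instead of replacing by text, replace by position with a callback, and skip any match that starts inside a comment. The regex for bare references and all function names below are invented for this illustration and are much simpler than what the script actually handles.

```python
import re

COMMENT_R = re.compile(r'<!--.*?-->', re.DOTALL)
# Simplified pattern for a bare reference, for illustration only.
BARE_REF_R = re.compile(r'<ref>\s*\[?(?P<url>https?://[^\s\]<]+)\]?\s*</ref>')


def comment_spans(text):
    """Return the (start, end) offsets of every HTML comment."""
    return [m.span() for m in COMMENT_R.finditer(text)]


def in_comment(pos, spans):
    """Return True if the offset lies inside one of the comment spans."""
    return any(start <= pos < end for start, end in spans)


def process_refs(text, make_ref):
    """Rewrite bare references, except those that sit inside a comment."""
    spans = comment_spans(text)

    def repl(m):
        # m.start() refers to the original text, so the spans stay valid.
        if in_comment(m.start(), spans):
            return m.group(0)
        return make_ref(m.group('url'))

    return BARE_REF_R.sub(repl, text)
```

Because `re.sub` calls the callback once per match, the commented-out copy and the live copy of the same reference are handled independently.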

Hey

In the word "dumb" the letter 'b' is silent. Cheers. 89.243.32.74 (talk) 12:04, 22 July 2008 (UTC)[reply]


Messed up sources in Hamas article

Hello. On the last run of the bot on the Hamas article all references were messed up from ref. no 37 and onwards. Can you please recommend a solution? Thanks. Tkalisky (talk) 04:11, 23 July 2008 (UTC)[reply]

Sorry... I just fixed this here !
I have fixed my bot, thanks for the report :)
NicDumZ ~ 10:42, 23 July 2008 (UTC)[reply]

Edit summary?

Hey NicDumZ. I noticed DumZiBoT is removing duplicate citations, like (s)he did at Coronary artery bypass surgery here. It took me a second, because the edit summary (s)he left was "(Bot: Converting bare references, see FAQ)". Otherwise, on a cursory glance, it looked like s(he) had just deleted two references... After a second, I realized they were duplicates, when I noticed that DumZiBoT left the "ref name" in place. I see that's mentioned briefly under the "features" list in the FAQ... Is it possible for DumZiBoT's edit summary to say something like "Bot: References cleanup. Converting bare refs, removing and adding ref name for duplicates, adding missing ref lists; see FAQ." I guess that's kind of long, but it's just that s(he) isn't just converting bare references, and that can cause confusion (at least for me?)...   user:j    (aka justen)   15:45, 23 July 2008 (UTC)[reply]

Thanks for this comment. The summary was of course the old summary, from the first version of my script.
I just changed it, hoping that it's better.
There was a huge bug, preventing any new titles from being added, it has also just been fixed.
Again, thanks for this one, it has been very useful.
NicDumZ ~ 16:15, 23 July 2008 (UTC)[reply]

Bot error

The bot did bizarre things at Mongol Empire, changing valid ref names to "autogenerated" and messing up some other tags,[3] so I went ahead and reverted. This was a few hours ago, so I don't know if there might be other damaged articles, but I would recommend checking. --Elonka 18:02, 23 July 2008 (UTC)[reply]

True, this is buggy. The bot has stopped since, so no need to worry about it, but I will make sure to fix that one before restarting it, once back home.
Thanks for the accurate report ;)
NicDumZ ~ 19:11, 23 July 2008 (UTC)[reply]
Eventually, I fixed it !
Thanks again :)
NicDumZ ~ 09:07, 24 July 2008 (UTC)[reply]

Bot messed up Warren Buffett page

Categories were not showing up after bot's visit. I reverted the changes. —Preceding unsigned comment added by Kaka2008 (talk • contribs) 18:34, 23 July 2008 (UTC)

I believe this had nothing to do with my bot, I reverted back :)
NicDumZ ~ 18:57, 23 July 2008 (UTC)[reply]

Dup refs

It looks like DumZiBot is fixing duplicate refs and doing a great job at it. Huzzah! – Quadell (talk) 19:10, 23 July 2008 (UTC)[reply]

Well, this particular edit is broken :P (see above) There's no need to replace existing reference names with "autogenerated". It has something to do with quotes. The script works with name="blah" but replaces name=blah by name="autogenerated#". Minor mistake, yes, but easy fix; I'll look into this tomorrow.
But still, yay ! :)
NicDumZ ~ 19:26, 23 July 2008 (UTC)[reply]

Bot is substituting bad names for working references

Please see this diff: [4]

I believe the bot is doing this because it does not see "quotes" around the name. Quotes are not necessary for single-word names in references. As in

<ref name=WORD>

"Quotes" are only needed if there is more than one word. As in

<ref name="WORD1 WORD2"> --Timeshifter (talk) 22:28, 23 July 2008 (UTC)[reply]
Yes, you are right, even if it has been reported to me above :p
I just fixed it, sorry for the inconvenience. My next move will be to modify my script, not to add quotes when those weren't there, to avoid what's happening in that last diff :)
NicDumZ ~ 09:21, 24 July 2008 (UTC)[reply]

Bot is combining 2 different references with the same name

Casualties of the Iraq War. Please see this diff: [5]

The bot combined 2 different references into one. If one looks at the page at the above diff link and searches for

<ref name=LAtimes>

one sees that there were 2 references mistakenly using the same name. The easy way for the bot to tell might be to compare the URLs. The bot gave the 2 references the same name. One of the references was then no longer used.

Maybe the bot could give one of the references a different name. Casualties of the Iraq War may be a good beta workout for the bot since it has over 150 references. --Timeshifter (talk) 13:36, 24 July 2008 (UTC)[reply]

And if you look a little bit closer, you'll see that the first reference using LAtimes ""War's Iraqi Death Toll Tops 50,000". Louise Roug and Doug Smith. Los Angeles Times. June 25, 2006." is kept, while the second, ""Poll: Civilian Death Toll in Iraq May Top 1 Million". By Tina Susman. Sept. 14, 2007. Los Angeles Times." is converted into "ORB2" :)
This matter had been raised earlier in the day, and has been fixed since.
NicDumZ ~ 13:46, 24 July 2008 (UTC)[reply]

(unindent) OK, I figured it out, I think. The bot tried to separate the 2 references and saw that one of the references had been given the name "ORB2" elsewhere (probably by me long ago).

But the bot got mixed up sometimes and put the "ORB2" reference where the "LAtimes" reference was.

It is easiest to see by looking for this line in the article:

A June 25, 2006 ''[[Los Angeles Times]]'' article, "War's Iraqi Death Toll Tops 50,000",<ref name=LAtimes/>

The bot mistakenly substituted the "ORB2" reference at the end of that line. I don't know how the bot could have known what reference name to use there though.

The bot made a mistake at another location too. Look for

A June 25, 2006 ''[[Los Angeles Times]]'' article<ref> [http://www.commondreams.org/headlines06/0625-03.htm "War's Iraqi Death Toll Tops 50,000"]. Louise Roug and Doug Smith. ''[[Los Angeles Times]].'' June 25, 2006.</ref>

The bot substituted the "ORB2" reference for that reference. The bot maybe should have looked at the URL and given it the "LAtimes" reference name instead.

Maybe when there are 2 references using the same name, the bot should be instructed not to do anything unless it can also see the URL. Otherwise the bot is guessing. Since the bot can't think, it can't do some tasks. Or the bot should put BOTH names. At least that way the readers might be able to figure out which one is the reference.

Better yet, the bot should flag the mistake somehow, so that the article editors can fix it. It would be nice if the bot could leave a note on the talk page. Edit history comments might not get seen by most editors if a later edit occurs soon after the bot edit. --Timeshifter (talk) 15:36, 24 July 2008 (UTC)[reply]

Okay, here is a fixed reference fix on that same article, copied in my fr: userspace, and here is the message added on the talk page of this article.
Better, isn't it?
NicDumZ ~ 18:03, 24 July 2008 (UTC)[reply]
That is great! I don't see the PDF URLs though on the talk page notice:
I see the PDF URLs in the diff above the talk page notice, though. --Timeshifter (talk) 19:38, 24 July 2008 (UTC)[reply]
Because it uses Tl:PDFlink, which probably does not do the same thing as on en: :p If not, please explain to me again what you're saying :)
NicDumZ ~ 19:41, 24 July 2008 (UTC)[reply]
That link goes to a redirect page with a link. Clicking the link goes to here:
http://fr.wikipedia.org/wiki/Mod%C3%A8le:Pdf
What is it supposed to do exactly? Hide the PDF? I can't figure out what it is. --Timeshifter (talk) 02:39, 25 July 2008 (UTC)[reply]
We don't use it the same way. Pdf does not take any parameter on fr, it is just used to "flag" pdf documents, as Template:Fr flags French documents. Anyway we don't care, do we? What's important is the wiki text; if I copy it back to en:, will it work? :)

(unindent). OK. Here is the same wikicode here on an English talk page:

Great! It is working fine here. --Timeshifter (talk) 15:23, 25 July 2008 (UTC)[reply]

Yup, I have opened a BRFA for that particular new task :)
NicDumZ ~ 15:25, 25 July 2008 (UTC)[reply]

I think your bot may be misbehaving :-)

Hi, four pages that I watch (because they all relate to Buckinghamshire) have just had their bare refs given an automatic title by your bot, which is fine, except that the name given to them was "Check Browser Settings" ([7] [8] [9] [10]). The external link in all cases was The Office of National Statistics, which I have been able to get into without problem, and have amended the refs accordingly. I recognise it may be a problem with the website itself as the bot doesn't appear to have done the same for other refs, however I'm wondering which other articles it has done this to? -- roleplayer 14:32, 24 July 2008 (UTC)[reply]

Bot Philosophy 101

This bot is fantastic when it is working right. It is a very needed bot. Many people just leave URLs as references without filling out the reference details.

I believe though that references are important above almost all else. So I suggest that the bot philosophy should be to err on the side of leaving incomplete references in the articles if the alternative is that the bot in effect removes some references in its efforts to fill in the details on all the references.

In other words when in doubt leave the reference in question as it is. For example; when one name is used for 2 references as described in a previous talk section.

<ref name="WORD or PHRASE">

This way the bot is almost always helping, and never hurting. Otherwise it may be creating many errors that may never get noticed. It already has created such errors. There is almost no way to know how many. I hope you keep bot runs short in order to allow people to comment, and to allow time to fix both the bot and the page errors. --Timeshifter (talk) 15:48, 24 July 2008 (UTC)[reply]

My bot only runs when told to, and when running, I'm constantly watching my talk page. It is stopped, and I'm working on an upgrade to leave dubious references alone, and to notify the editors on the talk page. Don't worry ;)
NicDumZ ~ 15:54, 24 July 2008 (UTC)[reply]

Removing ref content

[11] In the diff, the bot removed the content for the named ref. Perhaps it was confused by the other uses of the name, some of which come before the definition in the text? Gimmetrow 14:43, 25 July 2008 (UTC)[reply]

Ah, this was a strange one. If you look at the wiki text after the diff, there IS a definition of the ref name WWEBio before the first use of <ref name="WWEBio" />, so I could not understand why the references were broken.
But... the parameter "billed height" did not exist :) so I replaced it with "height", and everything is repaired :)
NicDumZ ~ 14:59, 25 July 2008 (UTC)[reply]

Title regex bug

Output from http://www.tldp.org/LDP/sag/html/filesystems.html

<HTML
><HEAD
><TITLE
>Filesystems</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK

I have also seen

<head><title id="pagetitle">Title of page</title>

I have redone the method used to match titles in my version. I have a function which looks for tags (title, h1, h2, h3, h4). If the first one is not found, it tries the next one, up to h2, before giving up.

Also, could you implement the transform function as a string cleanup function, so that I could run it to clean out the extra data (like authors, publisher, etc.)? Thanks. — Dispenser 20:24, 25 July 2008 (UTC)
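
To make the two failing cases above concrete, here is a hedged sketch of a more tolerant title match with the fallback order described (title, then h1, then h2). It is neither Dispenser's version nor the reflinks.py regex; the function name `first_heading` is made up, and a real HTML parser would be more robust than a regex.

```python
import re


def first_heading(html, tags=('title', 'h1', 'h2')):
    """Return the text of the first non-empty tag found, or None."""
    for tag in tags:
        # Allow attributes and line breaks inside the opening tag, and
        # whitespace before the '>' of the closing tag, in any letter case.
        tag_r = re.compile(r'<%s\b[^>]*>(.*?)</%s\s*>' % (tag, tag),
                           re.IGNORECASE | re.DOTALL)
        m = tag_r.search(html)
        if m and m.group(1).strip():
            # Collapse runs of whitespace, including newlines.
            return ' '.join(m.group(1).split())
    return None
```

Both samples quoted in this section are handled by the first tag in the list; the h1 and h2 entries only come into play when the title is missing or empty.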

Reflinks PDF on Windows

Hey NicDumZ, I've been thinking about handling PDF files on Windows with your reflinks bot, and I was wondering if you could use the pdftotext and pdfinfo programs in the same way as you use the other program on the Unix versions (the same programs Zotero uses on Windows). Regards, --Dami (talk) 12:13, 30 July 2008 (UTC)[reply]

Your bot is whitening articles

See here, for example. Or here. Patricio.lorente (talk) 21:42, 30 July 2008 (UTC)

Minor bot issue

First off, awesome bot. Second, there seems to be a small error in the code for when it posts on the talk page - see Talk:Chrono_Trigger. The bot made a link to "Chrono Trigger&redirect=no&oldid=228589053 the last revision I edited", which, since there is a space between Chrono and Trigger, made a link that pointed to "Chrono" and had the title of "Trigger&redirect=no&oldid=228589053 the last revision I edited". Just letting you know! --PresN (talk) 22:46, 2 August 2008 (UTC)

Thanks for the kind report :)
I already saw the error, and already corrected it yesterday. I checked all my edits and corrected the spaces into underscores manually, I must have missed that one ;)
NicDumZ ~ 04:22, 3 August 2008 (UTC)[reply]
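
The cause is that external-link syntax ends the URL at the first space. A minimal sketch of the fix follows; the function name `revision_link` is hypothetical, and the index.php URL layout is assumed from the diff links quoted elsewhere on this page.

```python
def revision_link(title, oldid, label):
    """Build an external link to a specific revision of a page."""
    # In [URL label] syntax the URL stops at the first space, so spaces in
    # the page title are replaced by underscores, which MediaWiki treats
    # as equivalent in titles.
    url = 'http://en.wikipedia.org/w/index.php?title=%s&redirect=no&oldid=%d' % (
        title.replace(' ', '_'), oldid)
    return '[%s %s]' % (url, label)
```

Titles containing other special characters (`&`, `?`, `#` and so on) would also need percent-encoding, which this sketch leaves out.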

Your recent bot approvals request has been approved. Please see the request page for details. When the bot flag is set it will show up in this log. All the best, – Quadell (talk) 13:42, 6 August 2008 (UTC)[reply]

Bot report : Found duplicate references !

In the last revision I edited, I found duplicate named references, i.e. references sharing the same name, but not having the same content. Please check them, as I am not able to fix them automatically :)

  • "ancestor" :
    • {{cite book |last=Dawkins |first=Richard |authorlink=Richard Dawkins |title=The Ancestor's Tale |year=2004 |publisher=Houghton Mifflin |chapter=Chimpanzees}}
    • {{cite book |last=Dawkins |first=Richard |authorlink=Richard Dawkins |title=The Ancestor's Tale |year=2004 |publisher=Houghton Mifflin |chapter=Chimpanzees }}

DumZiBoT (talk) 09:52, 8 August 2008 (UTC)[reply]

This issue has been addressed. − Twas Now ( talk • contribs • e-mail ) 11:08, 8 August 2008 (UTC)
Looks like it was the same content, with a (marginally) different format. I think the 'bot needs to be upgraded. - UtherSRG (talk) 11:53, 8 August 2008 (UTC)[reply]
Is it possible to tweak the bot to recognise when the only difference between two references is the accessdate format of the citation, not the content itself(for example here)? Euryalus (talk) 04:17, 9 August 2008 (UTC)[reply]

"Check Browser Settings"

In a recent edit the hard-working DumZiBoT generated the automatic title "Check Browser Settings". I think perhaps it should check for this and not use it. Otherwise, keep up the good work. David Underdown (talk) 13:04, 8 August 2008 (UTC)

Thanks for the report ;)
I just added that title to the blacklist, and restarted the bot.
NicDumZ ~ 13:07, 8 August 2008 (UTC)[reply]

Home Page

Surely changing http://www.sheringhammuseum.co.uk/ to Home Page as was done with Sheringham is not good practice? Your BOT should ignore changes to Home Page. --palmiped |  Talk  15:57, 8 August 2008 (UTC)[reply]
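
The unhelpful titles reported on this page ("Untitled", "Untitled Document", "Check Browser Settings", "Home Page") could be caught by a whole-title blacklist along these lines. This is only a sketch: the pattern below is not the bot's actual blacklist, and the names `BAD_TITLE_R` and `is_usable_title` are invented.

```python
import re

# Hypothetical blacklist holding only the titles mentioned on this page.
# It matches the whole title, so "Home Page of the Museum" still passes.
BAD_TITLE_R = re.compile(
    r'^\s*(untitled( document)?|check browser settings|home page)\s*$',
    re.IGNORECASE)


def is_usable_title(title):
    """Return False for empty titles and for blacklisted placeholder titles."""
    return bool(title.strip()) and BAD_TITLE_R.match(title) is None
```

When a title is rejected, the bare link would simply be left as it was.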

Bot now working perfectly

Casualties of the Iraq War. The last bot diff: [12] and talk page report:

The bot cleaned up some duplicate references, and added some titles to references that were missing titles. In its bot report the bot left the duplicate reference links using the same URL, but having differences in the reference description. I took care of those duplicates manually. Thanks, Super Bot Man! --Timeshifter (talk) 04:46, 9 August 2008 (UTC)[reply]

Heh... I spent like 3 minutes trying to figure out what was the problem, but meh... There's no problem :p
Fine, then =]
NicDumZ ~ 04:48, 9 August 2008 (UTC)[reply]

Thanks

Thank you for tidying up the Will Young page. I can find the references but don't know how to get them like you did. Oyster24 (talk) 05:49, 9 August 2008 (UTC)[reply]

Duplicate references: IMDb

Can you teach this bot about the IMDb any better? I see you've flagged some that have links to different subsections of the IMDb page about the same title, like the Trivia page and the Awards page. I think this will be done quite often and they will all quite often be labelled as just "IMDb". The main page for any title takes the form http://www.imdb.com/title/tt### (and for a person it takes the form http://www.imdb.com/name/nm###) but they can have various subsections after that in the URL like /trivia, /quotes, /plotsummary etc. There are also the various URLs that access the same system or shadow systems. The host name could be www.imdb.com, or just imdb.com, or us.imdb.com, or uk.imdb.com, or various others all listed on the page about the IMDb -- SteveCrook (talk) 10:13, 9 August 2008 (UTC)

Well, if the content is different, you should use two different reference names, obviously. "imdb awards", and "imdb trivia". Using "imdb" for two different contents is confusing, and that's what the bot has flagged. :)
NicDumZ ~ 10:44, 9 August 2008 (UTC)[reply]

Blank fields in refs

Is there a way the bot can avoid reporting (or possibly even fix) apparent differences between refs where the only difference is one has a blank field (eg. issn=|) and the other does not even include the parameter? An example is here. The refs are identical, other than the single blank field at the end of the second one. Content identical, mild format difference - similar to the issue raised above.

I appreciate the answer may be no, but the watchlist spam is getting substantial and little things like this might make a difference. Euryalus (talk) 10:57, 9 August 2008 (UTC)[reply]
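
One way such format-only differences could be ignored is to compare references through a normalized key rather than their raw text. The sketch below is an assumption about how that might look, not a description of the bot: `normalize_ref` is a made-up name, and the regexes only cover simple citation templates (piped wikilinks or nested templates inside a value could be mangled, which matters less for a key that is only used for comparison).

```python
import re


def normalize_ref(content):
    """Return a comparison key that ignores spacing and blank template fields."""
    # Collapse every run of whitespace into a single space.
    key = ' '.join(content.split())
    # Remove spaces around pipes and equals signs, and before closing braces.
    key = re.sub(r'\s*([|=])\s*', r'\1', key)
    key = re.sub(r'\s*\}\}', '}}', key)
    # Drop parameters with an empty value, such as "|issn=" right before
    # the next "|" or the closing "}}".
    key = re.sub(r'\|[^|={}]+=(?=\||\}\})', '', key)
    return key
```

Two references would then be treated as duplicates when their keys are equal, which covers both the trailing-space case reported in the "ancestor" example and the blank issn= field.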