User talk:DASHBot/Dead Links


Comments from JimWae

The bot is inserting |archivedate= parameters into pages even when the URL is still live (see Ah Louis and Agnosticism). It also does not check which date format the article already uses for access dates. --JimWae (talk) 20:34, 23 March 2012 (UTC)

Regarding your first comment: The bot has a tough time checking the status of URLs, because some websites automatically serve a "dead" response when a request arrives with a non-browser HTTP User-Agent. Basically, when the bot requests the status of a URL, it tells the server that it is a bot. Some servers always return 404 to bots, to prevent automatic downloading of their content, even though the page is still very much alive. I decided that it would be best to skip validating URLs altogether and assume that they are all alive. The bot preemptively adds an archive to {{Citation}} templates so that it will show up in the reference. Example:
  • Google.com, retrieved June 2 1970
turns to
  • Google.com, archived from the original on June 3 1970, retrieved June 2 1970
Notice that the original URL is still first, because the |deadurl= parameter is set to no.
Regarding your second comment: You're right. The bot should be trying to maintain the date format of the article. I actually had this idea a while ago, but I never seem to have implemented it. It's now on my to-do list.
Thanks for the comments! Tim1357 talk 23:16, 27 March 2012 (UTC)
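For illustration, the preemptive archiving described above can be sketched roughly as follows. This is only a sketch and not DASHBot's actual code: it assumes the citation's parameters are already available as a plain dict, the function names add_preemptive_archive and render_citation are hypothetical, and the URLs are placeholders. The parameter names url, accessdate, archiveurl, archivedate and deadurl are the ones mentioned in this discussion.

```python
def add_preemptive_archive(params, archive_url, archive_date):
    """Return a copy of the citation parameters with archive fields added.

    Hypothetical helper: the link is assumed to be alive, so deadurl is
    set to "no" and the original URL stays first in the rendered reference.
    """
    updated = dict(params)  # do not mutate the caller's dict
    if "archiveurl" in updated:
        return updated  # already archived, leave the citation alone
    updated["archiveurl"] = archive_url
    updated["archivedate"] = archive_date
    updated["deadurl"] = "no"
    return updated


def render_citation(params):
    """Render the parameters as {{Citation}} wikitext (simplified)."""
    body = " |".join(f"{key}={value}" for key, value in params.items())
    return "{{Citation |" + body + "}}"


# Placeholder values for illustration only.
citation = {
    "url": "http://www.example.org/page",
    "title": "Example page",
    "accessdate": "2 June 2012",
}
archived = add_preemptive_archive(
    citation,
    "https://web.archive.org/web/2012/http://www.example.org/page",
    "3 June 2012",
)
print(render_citation(archived))
```

Because deadurl stays at no, a reader still sees the original link first, and the archive copy only takes over if someone later changes the value.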
Could you implement a whitelist of websites that are known to block bots and not add archive links to those? --Flex (talk/contribs) 13:19, 4 June 2012 (UTC)
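A rough sketch of how such a host list might work, assuming the bot sees each cited URL as a string. The names BOT_BLOCKING_HOSTS and should_add_archive are hypothetical, and the hosts listed are placeholders, not sites actually known to block bots.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts known to answer bots with bogus 404s.
# The entries are placeholders for illustration.
BOT_BLOCKING_HOSTS = {"example.org", "example.net"}


def should_add_archive(url, skip_hosts=BOT_BLOCKING_HOSTS):
    """Return False when the URL's host, or a parent domain, is listed."""
    host = urlsplit(url).hostname or ""
    for listed in skip_hosts:
        # Match the host itself and any subdomain such as www.example.org.
        if host == listed or host.endswith("." + listed):
            return False
    return True
```

With this check in front of the archiving step, a citation pointing at www.example.org would be left untouched, while links to unlisted hosts would still get an archive link.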

Comments from Hoverfish

I am not sure how this is supposed to work, but the same thing probably just happened in Aguas Corrientes. The article has two live links to DOC files, and the bot added the archived versions to the ref templates with a deadurl= no parameter. This doesn't hurt this article (though it may overload longer ones), but doing it for all similar links adds tons of needless kilobytes. Hoverfish Talk 20:54, 23 March 2012 (UTC)

Does my answer to Jim's comments above satisfy you? Tim1357 talk 23:21, 27 March 2012 (UTC)

Thank you Tim, the tech part goes a bit over my head. But yes, I am aware that the original links remain intact. In practical terms: I follow only some main Geography of Uruguay articles that use such links to DOC files in this domain, and so far I have only received the one notice I mentioned. If I notice a large number of such double links adding too much code to articles, I will be back. Cheers. Hoverfish Talk 16:21, 28 March 2012 (UTC)

Suggestion regarding date format for the archivedate parameter

Currently the bot seems to store the archivedate parameter in the dmy format, regardless of the date format preferred in the article. This does not follow the WP:MOSDATE guidelines. Maybe the bot could be amended so that it scans for either {{use dmy dates}} or {{use mdy dates}} to determine the correct date format to use and store. If neither template is found, the bot could store the parameter in the 20yy-mm-dd format, which, according to the same guidelines, is acceptable in both cases. HandsomeFella (talk) 18:43, 10 June 2012 (UTC)

When it does insert the dates in mdy format (e.g. this edit), could the bot include the comma between the day and year per WP:DATEFORMAT? Thanks! GoingBatty (talk) 16:29, 2 October 2012 (UTC)
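The two suggestions in this section could be combined along these lines. This is a sketch under assumptions: the bot is assumed to have the article wikitext as a string, and the names detect_date_style and format_archivedate are hypothetical. It looks for {{use dmy dates}} or {{use mdy dates}}, falls back to yyyy-mm-dd when neither is present, and puts the comma between day and year in mdy dates.

```python
import re
from datetime import date

# Template names are matched case-insensitively, e.g. {{Use dmy dates|...}}.
DMY_TEMPLATE = re.compile(r"\{\{\s*use dmy dates\b", re.IGNORECASE)
MDY_TEMPLATE = re.compile(r"\{\{\s*use mdy dates\b", re.IGNORECASE)

# Spelled out so the result does not depend on the system locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def detect_date_style(wikitext):
    """Return "dmy", "mdy", or "iso" when neither template is present."""
    if DMY_TEMPLATE.search(wikitext):
        return "dmy"
    if MDY_TEMPLATE.search(wikitext):
        return "mdy"
    return "iso"


def format_archivedate(day, style):
    """Format a datetime.date for the archivedate parameter."""
    month = MONTHS[day.month - 1]
    if style == "dmy":
        return f"{day.day} {month} {day.year}"
    if style == "mdy":
        return f"{month} {day.day}, {day.year}"  # comma between day and year
    return day.isoformat()  # yyyy-mm-dd fallback


style = detect_date_style("{{Use mdy dates|date=October 2012}} Article text")
print(format_archivedate(date(2012, 10, 2), style))
```

An article that carries neither template gets the numeric fallback, so the bot never has to guess between the two written-out styles.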

Stopped

I've stopped the bot, as the West Midland Bird Club link in this edit (since reverted) was - and is - working, with no bogus response to bots (I'm the webmaster of that site). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:27, 15 November 2012 (UTC)

Pigsonthewing, note the parameter "deadurl= no". DASHBot never claimed your website was offline. – sgeureka tc 15:01, 16 November 2012 (UTC)

Internet Archive and HTTPS

When adding a Wayback Machine link to archiveurl=, could you please use https://? The Internet Archive has recently turned on HTTPS support and encourages people to use it. Thanks. --bender235 (talk) 21:15, 10 November 2013 (UTC)
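A minimal sketch of the requested change, assuming the bot builds the archiveurl= value as a string before writing it. The function name to_https_wayback is hypothetical. Only the scheme of the outer Wayback Machine address is changed; the original URL embedded in the path is left as it is.

```python
WAYBACK_HTTP_PREFIX = "http://web.archive.org/"


def to_https_wayback(archive_url):
    """Switch a Wayback Machine link to https://, leaving other URLs alone."""
    if archive_url.lower().startswith(WAYBACK_HTTP_PREFIX):
        # Replace only the leading scheme; the archived URL stays untouched.
        return "https://" + archive_url[len("http://"):]
    return archive_url


# Placeholder snapshot path for illustration.
print(to_https_wayback("http://web.archive.org/web/2013/http://example.org/"))
```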