Wikipedia:Bots/Requests for approval/WebCiteBOT

Revision as of 04:59, 24 January 2009

Operator: ThaddeusB (talk)

Automatic or Manually Assisted: Automatic, with only occasional supervision

Programming Language(s): Perl

Function Summary: Combat link rot by automatically submitting newly added URLs to WebCite.org

Edit period(s): Continuous

Already has a bot flag (Y/N): N

Function Details: This BOT has not actually been coded yet. I am submitting this proposal to test for consensus and iron out the exact details of the bot's operation. I have posted notices in several locations to encourage community participation in evaluating this request.

Background: Link rot is a major problem on all major websites, and Wikipedia is no exception. WebCite is a free service that archives web pages on demand. Unlike archive.org, it archives a submitted URL immediately and returns a permanent link to a copy of the cited content. It is intended for use by scholarly sites (like Wikipedia) that cite other online resources. I am proposing a bot (coded by me) that will automatically submit URLs recently added to Wikipedia to WebCite and then supplement the Wikipedia link as follows:

Original:

*Last, First. [http://originalURL.com "Article Title"], ''Source'', January 22, 2009. 

Modified:

*Last, First. [http://WebCiteURL.com "Article Title"], ''Source'', January 22, 2009. [[WebCite]]d on January 23, 2009; Originally found at: http://originalURL.com
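To illustrate the archiving step itself, here is a minimal sketch in Perl (the stated language for the bot). It assumes WebCite offers an on-demand archive endpoint at http://www.webcitation.org/archive taking url and email parameters and returning XML with the new archive link; the exact endpoint and the <webcite_url> element name below are assumptions to be confirmed against WebCite's actual documentation during development.

  #!/usr/bin/perl
  # Minimal sketch: submit one URL to WebCite and extract the archive link.
  # Endpoint and XML element name are assumptions, not confirmed API details.
  use strict;
  use warnings;
  use LWP::UserAgent;
  use URI::Escape qw(uri_escape);

  sub webcite_archive {
      my ($url, $email) = @_;
      my $ua = LWP::UserAgent->new( agent => 'WebCiteBOT/0.1', timeout => 30 );
      my $request = 'http://www.webcitation.org/archive?url=' . uri_escape($url)
                  . '&email=' . uri_escape($email);
      my $resp = $ua->get($request);
      return undef unless $resp->is_success;
      # Pull the archive URL out of the XML reply (hypothetical element name)
      my ($archive_url) = $resp->decoded_content =~ m{<webcite_url>(.*?)</webcite_url>};
      return $archive_url;    # undef means the page was refused or not archived
  }

  my $archived = webcite_archive('http://originalURL.com', 'botowner@example.com');
  print defined $archived ? "Archived at $archived\n" : "Archiving failed\n";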

Operation: WebCiteBOT will monitor the URL addition feed at IRC channel #wikipedia-en-spam. It will note the time of each addition but take no immediate action. After 24 hours have passed, it will go back and check the article to make sure the new link is still in place and that it is used as a reference (i.e. not merely listed as an external link). These precautions help prevent the bot from archiving spam or otherwise unneeded URLs.
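As a rough sketch of that delayed-processing loop, again in Perl: the IRC parsing itself is abstracted away, and both helper subs are hypothetical stand-ins for code yet to be written.

  #!/usr/bin/perl
  # Sketch of the 24-hour holding queue: URL additions noted from
  # #wikipedia-en-spam are acted on only after the delay has passed and
  # the link has been re-verified as a still-present reference.
  use strict;
  use warnings;

  use constant DELAY => 24 * 60 * 60;    # 24 hours, in seconds

  my @queue;    # FIFO of { time, article, url } entries, in arrival order

  # Called for each URL addition seen on the IRC feed (parser not shown)
  sub note_addition {
      my ($article, $url) = @_;
      push @queue, { time => time(), article => $article, url => $url };
  }

  # Called periodically from the bot's main loop
  sub process_due {
      while ( @queue && time() - $queue[0]{time} >= DELAY ) {
          my $item = shift @queue;
          # Confirm the link is still present and used as a reference,
          # not merely listed as an external link (hypothetical helper)
          next unless still_used_as_reference( $item->{article}, $item->{url} );
          handle_link($item);    # hand off to the pipeline described below
      }
  }

  # Stubs so the sketch compiles; the real bot would implement these.
  sub still_used_as_reference { return 1 }
  sub handle_link             { }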

After 24 hours have passed and the link has been shown to be used as a reference, the bot will take the following actions:

  • Check to make sure the URL is functional as supplied. If it's not, it will mark the link with {{dead link}} and notify the person who added it, in the hopes that they will correct it.
  • Make sure the URL is not from a domain that has opted out of WebCite's archiving; the list of such domains will be built up as the bot encounters them.
  • Submit the URL to WebCite.
  • Check if the submission was successful. If not, figure out why and update the bot's logs as needed.
  • If all was successful, update the Wikipedia article. This will be accomplished through a new parameter to {{cite web}}, and possibly a few other templates, titled "WCarchive". If the original reference was a bare reference (i.e. <ref>[http://ref.com title]</ref>) the bot will convert it to {{cite web}}, filling in whatever fields it can and leaving a hidden comment to note that it was filled in by a bot (a rough sketch of this pipeline appears below).
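A sketch of how those five steps might chain together, once more in Perl; every helper marked "stub" is a hypothetical placeholder for real MediaWiki or WebCite calls, and the parameter handling is illustrative only.

  #!/usr/bin/perl
  # Sketch of the per-link pipeline outlined above.
  use strict;
  use warnings;
  use LWP::UserAgent;

  my %opted_out;    # domains found to refuse archiving, grown as the bot runs

  sub handle_link {
      my ($article, $url, $adder) = @_;
      my $ua = LWP::UserAgent->new( agent => 'WebCiteBOT/0.1', timeout => 30 );

      # 1. Check the URL still works; if not, tag it and notify the adder
      unless ( $ua->head($url)->is_success ) {
          tag_dead_link($article, $url);           # stub: insert {{dead link}}
          notify_adder($adder, $article, $url);    # stub: talk-page notice
          return;
      }

      # 2. Skip domains already known to have opted out of archiving
      my ($domain) = $url =~ m{^https?://([^/]+)};
      return if $domain && $opted_out{ lc $domain };

      # 3 & 4. Submit to WebCite and check whether the submission worked
      my $archive_url = webcite_archive($url);     # stub: see the first sketch
      unless ($archive_url) {
          $opted_out{ lc $domain } = 1 if $domain && site_opted_out($url);  # stub
          log_failure($article, $url);             # stub: record why it failed
          return;
      }

      # 5. Update the article: fill the proposed WCarchive parameter, or
      # convert a bare <ref>[http://...]</ref> into a filled cite template
      update_citation($article, $url, $archive_url);    # stub
  }

  # Stubs so the sketch compiles; the real bot would implement these.
  sub tag_dead_link   { }
  sub notify_adder    { }
  sub webcite_archive { return undef }
  sub site_opted_out  { return 0 }
  sub log_failure     { }
  sub update_citation { }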


Discussion

Policy Comments:

(I.e. Is this a good idea?)

As per my comments at the bot request, I believe the bot should not supplant the main URL. I suggest a system where {{citeweb}} is modified so that the bot fills in the normal archive parameters but also adds a parameter like |archive=no, which stops the archive URL from becoming the primary link in the citation. This parameter can be removed once the normal URL is confirmed to be dead (404), at which point the archive URL becomes the primary link.
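For example, a citation written under this scheme might look something like the following wikitext; |archiveurl= and |archivedate= are existing {{citeweb}} parameters, while |archive=no is the new parameter being proposed here:

  {{citeweb |url=http://originalURL.com |title=Article Title |archiveurl=http://WebCiteURL.com |archivedate=2009-01-23 |archive=no}}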

I suggest something like what is shown below when the |archive parameter is set to no:

"My favorite things part II". Encyclopedia of things. Retrieved on July 6, 2005. Site mirrored here automatically at July 8, 2008

and then the normal archived look when |archive is not set. Foxy Loxy Pounce! 04:20, 24 January 2009 (UTC)


  • I think it's a great idea. Who knows how many dead links we have. Probably millions. This should solve that, mostly. I just wonder if the real URL should show up first in the reference. - Peregrine Fisher (talk) (contribs) 04:20, 24 January 2009 (UTC)

Technical Comments:

(I.e. Will the bot work correctly?)

On the technical side of things, <s>I don't believe you have provided a link to source code</s> since programming has not commenced yet, I can't evaluate if it will work correctly. Foxy Loxy Pounce! 04:20, 24 January 2009 (UTC)

After I feel consensus is forming, I'll start the actual programming and post the code at that time. --ThaddeusB (talk) 04:59, 24 January 2009 (UTC)

I understand that WebCite honors noarchive instructions contained on web sites, and there are opt-out mechanisms. Is one of the requirements for the bot that it will deal gracefully with a refusal by WebCite to archive a page? --Gerry Ashton (talk) 04:43, 24 January 2009 (UTC)

Yes, if a page isn't archived (for whatever reason), no Wikipedia editing will be done. If the site has opted out, it will be added to a list of opted out sites that the bot will pre-screen to avoid doing unnecessary work (submitting URLs that are sure to fail). --ThaddeusB (talk) 04:59, 24 January 2009 (UTC)