Wikipedia:Bots/Requests for approval/GreenC bot 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Green Cardamom (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 04:40, Thursday, November 3, 2016 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): GNU awk
Source code available: https://github.com/greencardamom/WebArchiveMerge
Function overview: TfM consensus to merge 4 templates into a 5th template. The bot will merge two of them, and I will merge the other two manually.
Links to relevant discussions (where appropriate): Wikipedia:Templates_for_discussion/Log/2016_October_24#Template:Wayback
Edit period(s): Periodic batch runs until complete.
Estimated number of pages affected: 100,000
Exclusion compliant (Yes/No): No
Already has a bot flag (Yes/No): Yes
Function details: Merge 2 templates into {{webarchive}}. The 2 templates are {{wayback}} and {{webcite}}. The TfM also includes merger of {{memento}} and {{cite archives}}, but for various reasons I'll be doing these manually. About 95% of the merger is {{wayback}}; the other 5% is {{webcite}}.
A typical merger looks like this:
- old:
{{wayback|url=http://example.com|date=20160901010101|df=y}}
- new:
{{webarchive|url=https://web.archive.org/web/20160901010101/http://example.com|date=1 September 2016}}
- old:
The bot checks that a |date= argument exists and, if it is missing, adds one by decoding the date from the archive URL; WebCite IDs use a base62 encoding of Unix time. It preserves the iso, dmy, mdy and ymd date formats, interprets positional arguments and converts them to named arguments, and converts short-form WebCite URLs to long-form (per RfC) using the API.
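The date handling described above can be illustrated with a small Python sketch. It is not the bot's GNU awk code: the helper names (`parse_params`, `wayback_to_webarchive`, `webcite_id_to_datetime`) are hypothetical, the template parser is deliberately naive, and two details are assumptions rather than facts stated in this request: that `df=y` selects day-first dates (month-first otherwise), and that a WebCite ID is a base62 number over the alphabet `0-9A-Za-z` counting microseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# English month names, hard-coded so output does not depend on the locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Assumed base62 alphabet ordering: digits, then upper case, then lower case.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def format_date(dt, fmt):
    """Render a date in one of the iso, dmy, mdy or ymd formats."""
    month = MONTHS[dt.month - 1]
    if fmt == "iso":
        return f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d}"
    if fmt == "dmy":
        return f"{dt.day} {month} {dt.year}"
    if fmt == "mdy":
        return f"{month} {dt.day}, {dt.year}"
    if fmt == "ymd":
        return f"{dt.year} {month} {dt.day}"
    raise ValueError(f"unknown date format: {fmt}")


def parse_params(template):
    """Split '{{name|k=v|...}}' into (name, {k: v}).

    Naive: assumes no nested templates or literal pipes inside values.
    Positional (unnamed) arguments are ignored in this sketch.
    """
    body = template.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("not a template")
    name, *parts = body[2:-2].split("|")
    params = {}
    for part in parts:
        # partition on the first '=' so '=' inside a URL value survives
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return name.strip(), params


def wayback_to_webarchive(template):
    """Rewrite a wayback template call as an equivalent webarchive call."""
    name, params = parse_params(template)
    if name.lower() != "wayback":
        raise ValueError(f"expected a wayback template, got {name!r}")
    url = params["url"]
    stamp = params["date"]  # Wayback timestamp, YYYYMMDDhhmmss
    # Assumption: df=y (or yes) means day-first, otherwise month-first.
    fmt = "dmy" if params.get("df", "").lower() in ("y", "yes") else "mdy"
    archived = datetime.strptime(stamp[:8], "%Y%m%d")
    archive_url = "https://web.archive.org/web/" + stamp + "/" + url
    return ("{{webarchive|url=" + archive_url
            + "|date=" + format_date(archived, fmt) + "}}")


def base62_encode(n):
    """Encode a non-negative integer with the assumed BASE62 alphabet."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def base62_decode(text):
    """Decode a base62 string back to an integer."""
    n = 0
    for ch in text:
        n = n * 62 + BASE62.index(ch)
    return n


def webcite_id_to_datetime(webcite_id):
    """Assumption: the decoded ID counts microseconds since the Unix epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=base62_decode(webcite_id))
```

For the typical merger given earlier, `wayback_to_webarchive("{{wayback|url=http://example.com|date=20160901010101|df=y}}")` returns `{{webarchive|url=https://web.archive.org/web/20160901010101/http://example.com|date=1 September 2016}}`. Keeping date rendering in its own `format_date` step reflects the stated goal of preserving whichever of the four formats an article already uses.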
Discussion
- Green Cardamom Please review this request - there is one conflict between the summary and the description - I don't think you mean to touch {{citeweb}}? — xaosflux Talk 10:58, 3 November 2016 (UTC)
- Fixed. Definitely don't want to merge citeweb :) -- GreenC 14:12, 3 November 2016 (UTC)
- I think the proposer meant {{WebCite}}. Also, the overview says "merge 4 templates", but the bot appears to merge two templates into a third. Slightly confusing. – Jonesey95 (talk) 12:53, 3 November 2016 (UTC)
- Yeah the other two I'm doing manually. -- GreenC 14:12, 3 November 2016 (UTC)
- Approved for trial (100 edits, 50 of each template). Please provide a link to the relevant contributions and/or diffs when the trial is complete, and post the results below. — xaosflux Talk 15:20, 3 November 2016 (UTC)
- Trial complete. The trial covers 50 articles containing {{wayback}} and 50 containing {{webcite}}. Some articles contain both templates, but there are 100 articles in total.
- Webcite: [1] 50 edits (Migration to Xinjiang to Les Valses de Vienne)
- Wayback: [2] 50 edits (Fetal rights to Barat College). Actually 51 edits because Fetal rights was done twice to fix garbage data.
- The new template {{webarchive}} has tracking categories for error checking, so problems will usually show up there; those categories are clean after the trial. I also manually checked each edit, and they seem OK. -- GreenC 17:33, 4 November 2016 (UTC)
- @Green Cardamom: Why are these getting encoded in different formats? When possible, : is preferable to %3A. — xaosflux Talk 02:11, 5 November 2016 (UTC)
- Ok, that's in the query portion of the string (following the "?"), which requires encoding. I'm following RFC 3986. In section 2.3 the ':' is not listed as unreserved (i.e. among the characters that should not be percent-encoded). According to section 3.4 on query strings, the '/' and '?' should be encoded, but because the "value is [often] a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters." Thus only the ':' needs to be encoded. See similar behavior with IABot.[3] -- GreenC 03:45, 5 November 2016 (UTC)
- I started a Village Pump discussion to see if anyone has more thoughts: Wikipedia:Village_pump_(technical)#URL_encoding_colon_and_slash -- GreenC 04:51, 5 November 2016 (UTC)
- Thanks - I just noticed it looked a bit odd, ping me back after the VPT discussion runs its course. — xaosflux Talk 13:34, 5 November 2016 (UTC)
- @Xaosflux: There was a good answer there, and I will go ahead and not encode the : or / for webcitation.org queries, unless something else comes up. But this question is likely even more relevant to User:Cyberpower678's IABot which is doing thousands of new webcitation.org URLs encoding : and / (example). -- GreenC 15:43, 5 November 2016 (UTC)
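The encoding options debated in this thread can be compared with Python's standard `urllib.parse.quote`, whose `safe` argument lists characters to leave unencoded. This is a sketch, not the bot's code: `QUERY_PREFIX` and the helper names are hypothetical, and the exact long-form webcitation.org query format is not given in this discussion. RFC 3986's query grammar (`query = *( pchar / "/" / "?" )`, where `pchar` includes ':' and '@') lets ':' and '/' appear literally, which fits the decision to leave them unencoded.

```python
from urllib.parse import quote

# Hypothetical long-form prefix, for illustration only.
QUERY_PREFIX = "https://www.webcitation.org/query?url="


def encode_query_value(value, keep=":/"):
    """Percent-encode a URL so it can sit inside a query string value.

    quote() never encodes letters, digits or '_.-~' (the RFC 3986
    unreserved set); characters listed in `keep` are left as-is too.
    Everything else, including '?', '&' and '=', is percent-encoded.
    """
    return quote(value, safe=keep)


def webcite_query_url(target, keep=":/"):
    """Append the encoded target URL to the hypothetical prefix."""
    return QUERY_PREFIX + encode_query_value(target, keep)


target = "http://example.com/page?id=7"
print(encode_query_value(target, keep=""))   # http%3A%2F%2Fexample.com%2Fpage%3Fid%3D7
print(encode_query_value(target, keep="/"))  # http%3A//example.com/page%3Fid%3D7
print(encode_query_value(target))            # http://example.com/page%3Fid%3D7
```

The sketch keeps '&' and '=' encoded on purpose: left raw inside the value, they would be read as separators of the outer query string.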
- Approved for extended trial (500 edits) with updated parameters. Please provide a link to the relevant contributions and/or diffs when the trial is complete. — xaosflux Talk 17:48, 5 November 2016 (UTC)
- Trial complete.
- Wayback (250): [4] (Talk:Flag of Northern Ireland to List of districts in Kerala)
- Webcite (250): [5] (Richard B. Teitelman to Big Sandy Creek (Cheat River))
- — Preceding unsigned comment added by Green Cardamom (talk • contribs)
- Thank you. I'd like to let this sit for 48 hours in case editors raise any issues; barring any, this will be approved. — xaosflux Talk 16:32, 6 November 2016 (UTC)
- Approved. Task approved. — xaosflux Talk 23:21, 8 November 2016 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.