Wikipedia:Bots/Requests for approval/ZhBot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
Operator: Rjanag
Supervised or unsupervised: Supervised
Automatic or Manually assisted: Automatic
Programming language(s): Pywikipediabot
Source code available: Should be available in the pywikipediabot pages (meta:Pywikipediabot). Specifically, uses the replace.py extension.
Function overview: Replace superseded zh-related templates in Category:Chinese multilingual support templates with master template {{zh}}, which has the functionality of all of them.
Edit period(s): One-time run
Estimated number of pages affected: 12,162, judging by the WhatLinksHere lists compiled on AWB for the various templates in that category. That may be an overestimate, though, as some of the templates in the category transclude one another and thus many pages are cross-listed at more than one WhatLinksHere.
Exclusion compliant (Y/N): N: since it's a one-time run through article namespace, I don't see how a nobots tag would be relevant, and I imagine none of these articles have one anyway. If it's an issue, though, I can work with someone to add exclusion compliance.
Function details: To run the bot, I would type the following command in the python command line:
replace.py -transcludes:<templatename> <templatename> {{zh|
where <templatename> is replaced by the individual template to be replaced. For instance, for {{zh-stp}}, it would be
replace.py -transcludes:Zh-stp {{zh-stp| {{zh|
This would essentially change the name of the template call while leaving all the template parameters (for instance, c=中国|p=zhōngguó}}) intact. The template {{zh}} has already been coded in such a way that this won't cause any templates to break; it is compliant with every parameter in every one of those specific templates. A full list of all the templates that will be replaced is commented out below:
The only potential point of contention is that the new template puts simplified characters before traditional by default, and doesn't reverse them unless |first=t is added. I have verified, though, that all the templates left in the category are already ones that put simplified first (for ones that put traditional first, I have already done them manually using AWB, adding the |first=t parameter as I go). It is true that there are some articles here and there which should have traditional first but don't (for example, there are some Taiwan-related articles using {{zh-stp}}), but those are already wrong; it's not like the bot would be creating a problem where there was none. In those cases I believe it's individual editors' responsibility to fix them during the process of copyedits, content reviews, etc. (there is no deadline!) and we can never expect a bot to be able to fix them.
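For illustration only, the literal substitution described in the function details can be sketched in plain Python. This is not the replace.py code itself; the helper name `rename_template_call` is hypothetical, and the sketch only shows why the parameters survive: just the opening `{{name|` is rewritten.

```python
def rename_template_call(wikitext: str, old_name: str, new_name: str = "zh") -> str:
    """Rewrite the literal opening '{{old_name|' to '{{new_name|'.

    Everything after the first pipe (the template parameters) is left
    untouched, because only the opening of the call is matched.
    """
    old_open = "{{" + old_name + "|"
    new_open = "{{" + new_name + "|"
    return wikitext.replace(old_open, new_open)


if __name__ == "__main__":
    sample = "China ({{zh-stp|c=中国|p=zhōngguó}})"
    print(rename_template_call(sample, "zh-stp"))
```

Because the match is an exact string, a capitalized call such as `{{Zh-stp|` or a call with a space before the pipe is left unchanged by this sketch.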
Discussion
A few minor points:
- I think you need quotes around the replacement parameters, like replace.py -transcludes:Zh-stp "{{zh-stp|" "{{zh|".
- Is it possible that an article has more than one of these templates? If so, will the bot edit these articles more than once?
- You probably need to check for the capitalized version of the templates also (for the example above, "{{zh-stp" and "{{Zh-stp").
- There might be whitespace between the template name and the | , like "{{zh-stp |". You could account for this with regular expressions, or, if there aren't many, manually replace them.
- Will the bot just work in mainspace, or in other namespaces too? (Asking partly because you mentioned that some of the templates transclude other templates on the list)
Shubinator (talk) 04:52, 25 September 2009 (UTC)
- About the quotes...that's what I thought at first, but it seems to also work without them (I tried it out on User:Rjanag/test). I think the quotes are necessary if it's done in regex, though.
- Yeah, a lot of articles have multiple templates; in that case the bot will edit the article once per template there. So, for instance, if an article has {{zh-stp}}, {{zh-c}}, and {{zh-tsp}} all in it, it would get hit three times; I didn't see any easy way to have the bot do multiple replacements in a single edit.
- The capitalization is a good point; capitalized versions are rare but I have seen them pop up sometimes. That's probably another good reason to do regex. Same with the whitespace issue. Although what I'm thinking is, after running the bot I will check the WhatLinksHere for each of the templates before I redirect them to the main one, and if there are things left over (that missed replacement due to caps, whitespace, etc.) they will probably be so few that I can replace them manually.
- About mainspace: that's a good point, I should probably limit it to mainspace. In most instances it doesn't matter if the bot makes replacements in other namespaces (for instance, some people have these things copied in their userpages or sandboxes, and it's not a big deal if the bot changes that), but there are some instruction pages like Template:Zh/doc and WP:CHINESE that use the old templates explicitly, to show the difference between the new one and the old ones, and it wouldn't make sense to replace those. (Most of them are actually {{tlx|zh-stp}} in the actual wikitext, though, so they wouldn't be replaced anyway; still, no need to take the risk, I suppose.) rʨanaɢ talk/contribs 05:03, 25 September 2009 (UTC)
- m:Talk:Pywikipediabot/replace.py#More than one thing to replace shows the syntax for multiple replacement. It would be an ugly command line, but it's possible. Everything else looks good. Shubinator (talk) 05:38, 25 September 2009 (UTC)
- Thanks, that looks doable. I can also add -namespace:0 to limit it to mainspace. Other than that, I guess the only thing left to consider is whether to write a regular expression to handle the whitespace issue...but like I said above, I think those things will be so rare they should be doable by hand after the bot run is over, and that leaves one less thing to worry about the bot messing up on :) rʨanaɢ talk/contribs 05:51, 25 September 2009 (UTC)
- As for the multiple replacement...I think it's probably best if I do no more than two per line (i.e., the lowercase and capital version of each)...while technically possible to type all 20 replacements (40 if you count capital as well) in one command line, I think I'd be more likely to make a typo, or to lose count and therefore replace stuff with the wrong thing (for instance, if I ended up getting off by one, I could end up replacing every instance of {{zh| with {{zh-stp|, the opposite of what I want to do). So it would probably be best to just run it 20 separate times, with two replacements in each. rʨanaɢ talk/contribs 05:58, 25 September 2009 (UTC)
- Ok, sounds good. Shubinator (talk) 20:36, 25 September 2009 (UTC)
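As a rough sketch of the "two replacements per run" idea discussed above (lowercase and capitalized spelling of each template), the pairs can be generated rather than typed, which avoids the off-by-one risk mentioned in the thread. The function names `literal_pairs` and `apply_pairs` are hypothetical and this is plain string handling, not replace.py's implementation.

```python
def literal_pairs(template_names, target="zh"):
    """Build (old, new) pairs for the lowercase and capitalized spelling
    of each template name, e.g. '{{zh-stp|' and '{{Zh-stp|' -> '{{zh|'."""
    pairs = []
    for name in template_names:
        capitalized = name[0].upper() + name[1:]
        for variant in (name, capitalized):
            pairs.append(("{{" + variant + "|", "{{" + target + "|"))
    return pairs


def apply_pairs(wikitext, pairs):
    """Apply every literal replacement in order, so a page containing
    several different templates is rewritten in a single pass."""
    for old, new in pairs:
        wikitext = wikitext.replace(old, new)
    return wikitext


if __name__ == "__main__":
    text = "{{zh-stp|s=明}} {{Zh-c|c=清}} {{zh-tsp|t=漢}}"
    print(apply_pairs(text, literal_pairs(["zh-stp", "zh-c", "zh-tsp"])))
```

Since each pattern ends with the pipe, `{{zh-c|` does not accidentally match inside `{{zh-cp|`.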
As I understand the assorted templates have been superseded by this one master template, so this would be a clean-up operation? If that's the case, looks fine. Certainly the task should be done by a bot, rather than by hand. I think you've considered instances where problems may arise, and how to deal with them. I think coding conservatively and running multiple times is fine, particularly with a finite number of articles. I appreciate you are just doing the substitutions, not attempting to wade into "fixing" anything in the traditional/simplified mess. I don't see particular problems and have no concerns about this bot; also, Shubinator has raised good issues, and they are being dealt with appropriately. That's my opinion. --69.225.5.4 (talk) 21:35, 25 September 2009 (UTC)
- Yep, this is just a clean-up run. Actually, it won't even have any effect whatsoever on how things display in articles themselves (except that all pinyin will become italicized; this was unevenly implemented in the other templates); all it will really do is empty out the WhatLinksHere for all the old templates. rʨanaɢ talk/contribs 21:39, 25 September 2009 (UTC)
- Thanks. Looks good to me. --69.225.5.4 (talk) 21:49, 25 September 2009 (UTC)
- The display also looks good. Overall it seems fine, except that I'm not happy with having the international norm (traditional characters) being secondary by default. That's perhaps a policy issue, but retaining the multiple templates provides context if we ever decide to switch to international norms. With the templates that we would want to specify 'simplified=first' if we were to change the default order (for specifically mainland China articles and the like), might it not be a good idea to add that parameter now, since those articles will be much more difficult to identify once the conversion erases the difference? It wouldn't actually do anything, of course, unless we change the Zh template and add that functionality, but could potentially save an awful lot of slog. kwami (talk) 19:01, 26 September 2009 (UTC)
- That's true; adding a vacuous |first=s now could save work later if the default is changed. The main thing preventing me from doing it before is that it's extra cluttersome wikitext, but I suppose it's no more cluttersome than it is for the first=t cases. There are a lot of templates that don't have the first=s yet (since I replaced several hundred by hand before I started fiddling with bots), so if there is ever a policy change to make traditional the default then what will probably need to happen is I'll have to run this bot again in the future (after getting re-approval) using some regex to add first=s to all templates that don't have any first= parameter specified.
- So anyway, long story short, adding first=s now won't totally solve the problem, but I suppose it doesn't hurt to get a head start. (On the other hand, I suppose someone could argue that if the bot's going to have to make another run anyway to add that, then doing them now won't save time anyway.) Personally I guess I feel ambivalent; the end result should be the same either way. rʨanaɢ talk/contribs 19:07, 26 September 2009 (UTC)
- I agree with Rjanag that doing it now equals doing it later (or so I think you mean), so why add code if it's not required? I'm concerned about doing it without community input, even though it's code, not action, and raising the issue of traditional versus simplified might simply put the bot off forever. The current task is straightforward. In my opinion it's enough. This is based on the potential for raising discussion on a heated issue, rather than disagreement with Kwamikagami's proposal, which is insightful, if there were not such passion about the underlying issue on Wikipedia. --69.225.5.4 (talk) 07:04, 27 September 2009 (UTC)
Has this been run past the folks at Template talk:Lang or anywhere else? I ask because when I visit there I always discover new depths to my ignorance. Rich Farmbrough, 02:15, 13 October 2009 (UTC).
- It hasn't, although I don't see why it would matter to them; this run won't have any effect on how {{lang}} is actually used. (Although, of course, I can always leave a message there and maybe they'll think up some effect that I wasn't aware of.) rʨanaɢ talk/contribs 02:59, 13 October 2009 (UTC)
- Rjanag did raise the issue also at Template:Zh [1]; although the Chinese languages templates are also discussed at Template:Lang, it seems editors most concerned, w/ the technical and political, from the Chinese dialects and languages, watch the Zh template, not only to weigh in on the trad/simp issue. Caution doesn't hurt with so many articles, but a larger random trial (100 edits?) through a variety of articles might be a good way for bot operator and community to see what is going on, if further discussion is needed. The trial results could be posted at both template discussion pages, if deemed useful. --69.225.5.183 (talk) 06:48, 18 October 2009 (UTC)
- I agree that a trial run is a good idea; I'll put a template here to get a BAG's attention. rʨanaɢ talk/contribs 15:15, 22 October 2009 (UTC)
- What is/are the final command line(s) you are intending to use? Anomie⚔ 23:33, 23 October 2009 (UTC)
replace.py -transcludes:Zh-stp "{{zh-stp|" "{{zh|" "{{Zh-stp|" "{{zh|"
- For each template (so, each time replacing zh-stp with zh-cp, zh-cwtp, etc. etc.). rʨanaɢ talk/contribs 23:35, 23 October 2009 (UTC)
- Any particular reason not to do something like
replace.py -transcludes:Zh-stp -regex '{{\s*(?:(?i)Template\s*:\s*)?[zZ]h(?:-c|-cp|-cpcy|-cpl|-cpw|-cw|-cpwl|-p|-s|-sp|-st|-stp|-stpw|-t|-tp|-tpw|-ts|-tsp|-tspj|-tspw)\s*\|' '{{zh|'
Of course, adjust the list to contain exactly the ones you intend to be replacing. Anomie⚔ 00:54, 24 October 2009 (UTC)
- That would also work; mainly I just figured I'd minimize the chance for making a typo at the command line if I did several short commands, rather than one big one.
- (Also, a side note... my regex has gotten a little fuzzy, can you remind me what ?: does?) rʨanaɢ talk/contribs 03:34, 24 October 2009 (UTC)
- Perl introduced all sorts of (?...) constructs for various special effects, which were adopted by many other languages. (?:...) is a non-capturing group; unlike (...), it doesn't create a backreference. The other unusual construct, (?i), makes just the group containing it be case-insensitive (Perl also has (?i:...) as a shortcut for (?:(?i)...), but it seems Python 2.5 didn't pick that one up). More info is available in Perl's documentation.
- Anyway, let's give it a trial. Approved for trial (35 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Anomie⚔ 04:13, 24 October 2009 (UTC)
- (Ah right, I remember now...backreferences are one of the things I skimmed over carelessly ;). )
- Also, before I start the trial, just wondering... even with the regular expressions, the current command line you've written will still have to be run several times, because it only hits -transcludes:Zh-stp (rather than hitting transclusions of all 20-odd templates). I assume there's some way to list multiple things within the -transcludes parameter, but maybe it would be easier to just run one replacement at a time (at least for the trial run)? rʨanaɢ talk/contribs 04:19, 24 October 2009 (UTC)
- There may be a way, but I didn't see one in the quick glance I gave the replace.py instructions while testing my post. While the existing command line would require multiple runs with differing -transcludes values, the bonus is that if a page does contain multiple different zh-* templates then they will all be replaced the first time the page is visited rather than having to be progressively removed over multiple edits. Anomie⚔ 04:23, 24 October 2009 (UTC)
- That's a good point. Do you know if there's a way to copy & paste things in at the command line? (Generally when I do it, I only get ^V, unless it's in IDLE--but I don't know how to run pywikipediabot from IDLE.) If I can copy and paste, that would remove the risk of me making a typo from inputting a long regex. rʨanaɢ talk/contribs 15:09, 24 October 2009 (UTC)
- If you're on Linux in X11, you can normally middle-click to paste into the console. On Windows, try right-clicking in the console window or its title bar and looking in the popup menu for a paste option. No idea about OS X. At worst, you could always use any program that will save a plain text file to create a shell script or batch file. Anomie⚔ 16:00, 24 October 2009 (UTC)
- Ah, yes, right-clicking works fine on Windows for me. I'll run the bot with your regex for 35 trials, and see how it goes. rʨanaɢ talk/contribs 18:18, 24 October 2009 (UTC)
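The regex proposed in this thread can be tried out offline before any bot run. The following is a minimal sketch in present-day Python, not the command actually used: it assumes Python 3.6 or later, where the scoped form `(?i:...)` is available (a bare `(?i)` in the middle of a pattern is rejected by recent Python versions). The names `SUFFIXES`, `ZH_CALL` and `unify_zh_templates` are hypothetical, and the suffix list is copied from the command line quoted above and would need adjusting to the templates actually being replaced.

```python
import re

# Suffixes taken from the regex quoted in the discussion.
SUFFIXES = [
    "c", "cp", "cpcy", "cpl", "cpw", "cw", "cpwl", "p", "s", "sp",
    "st", "stp", "stpw", "t", "tp", "tpw", "ts", "tsp", "tspj", "tspw",
]

ZH_CALL = re.compile(
    r"\{\{\s*"                      # opening braces, optional whitespace
    r"(?:(?i:Template)\s*:\s*)?"    # optional 'Template:' prefix, any case
    r"[zZ]h-"                       # template name, first letter either case
    r"(?:" + "|".join(re.escape(s) for s in SUFFIXES) + r")"
    r"\s*\|"                        # optional whitespace, then the pipe
)


def unify_zh_templates(wikitext: str) -> str:
    """Rewrite every matching '{{zh-xxx|' opening to '{{zh|' in one pass."""
    return ZH_CALL.sub("{{zh|", wikitext)


if __name__ == "__main__":
    page = "{{zh-stp|s=明}}, {{Zh-c |c=清}} and {{tlx|zh-stp}}"
    print(unify_zh_templates(page))
```

The groups are non-capturing because the replacement never refers back to what was matched; documentation-style calls such as `{{tlx|zh-stp}}` are not touched, since the pattern requires the template name directly after the opening braces.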
- Test run
Just completed some test runs, see Special:Contributions/ZhBot. I tried a couple different replacements (zh-stp, zh-c, and zh-cp). The only real thing I noticed is that the regex didn't seem to work--the command line tried to execute -cp as a bash command, rather than interpreting it as part of the regex, so I kept getting an "unrecognized command" syntax error. (Could be a missing quotation mark or bracket or anything, I don't know.) So for these tests I ran it old-school, with the plain text replacements listed above, instead of using the regex replacement. rʨanaɢ talk/contribs 19:05, 24 October 2009 (UTC)
- I haven't really used Windows in a while, maybe it doesn't like the single-quotes? The trial edits look ok, but it would still be better if for example Ming Dynasty could be edited just once instead of 3 times. Anomie⚔ 20:39, 24 October 2009 (UTC)
- I can give it another try with double quotes instead of single...if that doesn't work, I'll experiment in my sandbox a bit to see if I can't figure out what's messing up the regex. rʨanaɢ talk/contribs 20:42, 24 October 2009 (UTC)
Ok, I ran another test (using double quotes instead) and it worked. I also tried editing Ming Dynasty, and it replaced multiple templates at once, as hoped. rʨanaɢ talk/contribs 05:32, 25 October 2009 (UTC)
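The quoting problem reported in the test run is consistent with how the Windows command prompt treats single quotes: they are not quoting characters there, so an unprotected `|` inside the regex is likely read as a pipe and the text after it (`-cp`) as a command. One way to sidestep shell quoting altogether is to pass the arguments as a list, for example from a small Python launcher. The sketch below only builds such a list; the helper name `build_replace_argv` is hypothetical, the option names are the ones quoted in this discussion, and their order is an assumption.

```python
def build_replace_argv(transcluded, pattern, replacement, namespace=0):
    """Return an argument list for replace.py.

    Passed as a list to subprocess.run() (without shell=True), no shell
    parses the arguments, so characters such as '|' in the regex are
    handed over literally.
    """
    return [
        "replace.py",
        "-transcludes:" + transcluded,
        "-namespace:" + str(namespace),
        "-regex",
        pattern,
        replacement,
    ]


if __name__ == "__main__":
    argv = build_replace_argv("Zh-stp", r"\{\{\s*[zZ]h-(?:c|cp|stp)\s*\|", "{{zh|")
    print(argv)
    # To actually launch the script (not done here), something like:
    #   import subprocess, sys
    #   subprocess.run([sys.executable] + argv, check=True)
```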
Approved. Anomie⚔ 11:27, 28 October 2009 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.