User:Monkbot/Task 9: Ship infobox lists
Monkbot task 9 was created to standardize lists in ship article infoboxen and operates primarily on the content of Category:WPSHIPS:Infobox list errors.
In the past, the WP:SHIPS infobox usage guide required unbulleted lists for reasons of limited available space and for aesthetics. Editors used a variety of other methods to create lists in infoboxes. These included <br /> line break HTML tags and the use of {{br}}, {{plainlist}} and {{unbulleted list}} templates. Problems with these methods are:
- use of <br /> and {{br}} makes visually 'correct' lists that are not correct for those who use screen readers. See MOS:ACCESS §Vertical lists.
- limitations in MediaWiki:Common.css prevent {{plainlist}} and {{unbulleted list}} from correctly rendering multi-level lists
description
Ship infoboxen are wiki-tables that contain, at a minimum, two and usually more specialized templates that provide formatting and header data for the infobox. The templates are {{infobox ship begin}} (required), {{infobox ship image}}, {{infobox service record}}, {{infobox ship career}}, {{infobox ship characteristics}}, and {{infobox ship class overview}}. For the time being, this task only operates on the last three of these, though it is expected that it will eventually operate on {{infobox service record}} as well.
standardization
The task begins by standardizing the names of the infobox templates to sentence case and, where redirects are used, to their canonical names.
setup
To constrain operation of this task to the limited area of the infobox table, task 9 hides certain characters and templates. The first step is to hide equal signs (=) in templates that are neither ship infobox templates nor one of the list templates {{plainlist}} or {{unbulleted list}} (or their redirect aliases) by replacing the equal signs with the text string __3QU4L__. Similarly, the equal sign in <ref <param>=...> and the pipes in templates and wikilinks are hidden using __3QU4L__ and __P1P3__ respectively.
Templates that are not infobox or list templates are hidden by replacing the opening and closing curly-brace pairs with __0P3N__ and __CL0S3__ respectively. Finally, list templates are hidden with __0P3N_PL_ and __CL0S3_PL_ for {{plainlist}}, and __0P3N_UB_ and __CL0S3_UB_ for {{unbulleted list}}.
All of this hiding makes subsequent rules simpler.
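The hide-and-restore idea can be sketched on its own for the wikilink case. This is a minimal illustration and not the task's code: the class and method names (HideSketch, HideWikilinkPipes, UnhidePipes) are hypothetical, and only the __P1P3__ token and the wikilink pattern are taken from this page.

```csharp
using System.Text.RegularExpressions;

public static class HideSketch
{
    // Placeholder token as described in the setup section.
    const string PipeToken = "__P1P3__";

    // Replace every pipe inside [[...]] with the placeholder so that any pipe
    // still visible afterwards can be treated as a parameter separator.
    public static string HideWikilinkPipes(string text)
    {
        string pattern = @"(\[\[[^\|\]]*)\|([^\]]*\]\])";
        // Each pass hides the first remaining pipe of a wikilink; repeat until none is left.
        while (Regex.IsMatch(text, pattern))
            text = Regex.Replace(text, pattern, "$1" + PipeToken + "$2");
        return text;
    }

    // Restore the pipes once list processing is finished.
    public static string UnhidePipes(string text)
    {
        return text.Replace(PipeToken, "|");
    }
}
```

Calling HideSketch.HideWikilinkPipes on a parameter such as |Ship builder=[[Harland and Wolff|Harland & Wolff]] leaves the leading parameter pipe alone and hides only the pipe inside the link; UnhidePipes reverses the change.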
line-break lists
Line break lists are the most common form of list in ship infoboxen. These lists usually use some form of <br /> but occasionally use {{br}}. The latter are first converted to <br />. Similarly, the various forms <br>...</br>, <BR>...</BR>, </br> etc. are converted to the canonical form <br /> and, where more than one of these tags is present in succession, the duplicates are removed.
When the first text in an infobox template parameter is <br />, the tag is removed. Task 9 inserts an asterisk at the start of the parameter value and then replaces each occurrence of <br /> with \n*.
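The steps above can be sketched for a single parameter value. This is a simplified illustration under stated assumptions: it works on a bare value rather than on a whole article, the names BrListSketch and ToUnorderedList are hypothetical, and it uses one permissive tag pattern in place of the task's separate rules. A trailing tag at the end of the value is not handled.

```csharp
using System.Text.RegularExpressions;

public static class BrListSketch
{
    // Convert one parameter value that uses line-break tags into * list markup.
    public static string ToUnorderedList(string value)
    {
        // {{br}} and the common tag variants all become the canonical <br />
        value = Regex.Replace(value, @"\{\{\s*[Bb][Rr]\s*\}\}", "<br />");
        value = Regex.Replace(value, @"<\s*/?\s*[Bb][Rr]\s*/?\s*>", "<br />");
        // collapse runs of tags (and the white-space around them) into one tag
        value = Regex.Replace(value, @"(?:\s*<br />\s*)+", "<br />");
        // a tag that leads the value is dropped
        value = Regex.Replace(value, @"^\s*<br />", "");
        if (!value.Contains("<br />"))
            return value;   // not a line-break list; leave it untouched
        // asterisk at the start, then newline + asterisk for every remaining tag
        return "*" + value.Trim().Replace("<br />", "\n*");
    }
}
```

For example, the value 2 boilers<br>1 engine<BR/>{{br}}3 shafts becomes a three-item list, with the doubled break collapsed before conversion.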
plainlist
Because {{plainlist}} templates were hidden during setup, task 9 first unhides them by replacing __0P3N_PL_ and __CL0S3_PL_ with {{ and }}. {{plainlist}} supports the named parameters |class=, |style=, and |indent=. These parameters are not supported by unordered lists in ship infoboxen.
Except for white-space, {{plainlist}} templates must be the only text in the parameter value. Any text, even empty HTML comment tags (<!-- -->), before or after a {{plainlist}} will cause the value to be ignored. When this happens, all subsequent {{plainlist}} templates are also ignored. It is not expected that this limitation will be 'fixed' by this tool.
When infobox parameters hold only {{plainlist}} templates, the template markup (the {{plainlist| and }}) is removed along with any white-space between the parameter's equal sign and the first line of the {{plainlist}} content.
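The "only text in the parameter value" rule can be illustrated on a bare parameter value. This sketch is an assumption-laden simplification: PlainlistSketch and Unwrap are hypothetical names, only the canonical template name is recognized (no redirect aliases), and named parameters such as |class= are assumed to have been removed already.

```csharp
using System.Text.RegularExpressions;

public static class PlainlistSketch
{
    // Strip the {{plainlist| ... }} wrapper from a parameter value, but only when
    // the template is the sole content of the value and the list starts with *.
    public static string Unwrap(string value)
    {
        Match m = Regex.Match(
            value,
            @"^\s*\{\{\s*[Pp]lain\s*list\s*\|\s*(\*[^\{\}]*)\}\}\s*$");
        // Anything before or after the template (even a comment) prevents a match,
        // so such values are returned unchanged for a human to attend to.
        return m.Success ? m.Groups[1].Value : value;
    }
}
```

A value that holds only the template is reduced to its * lines; a value such as {{Plain list|*a}} <!-- --> is returned as it was.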
unbulleted list
Because {{unbulleted list}} templates were hidden during setup, task 9 first unhides them by replacing __0P3N_UB_ and __CL0S3_UB_ with {{ and }}. {{unbulleted list}} supports the named parameters |class=, |style=, |indent=, |list_style=, |item_style=, and |itemn_style=. These parameters are not supported by unordered lists in ship infoboxen.
Except for white-space, {{unbulleted list}} templates must be the only text in the parameter value. Any text, even empty HTML comment tags (<!-- -->), before or after an {{unbulleted list}} will cause the value to be ignored. When this happens, all subsequent {{unbulleted list}} templates are also ignored. It is not expected that this limitation will be 'fixed' by this tool.
When infobox parameters hold only {{unbulleted list}} templates, the template markup (the {{unbulleted list| and }}) is removed along with any white-space between the parameter's equal sign and the first parameter of the {{unbulleted list}} template. The individual {{unbulleted list}} parameters are split at the pipes into an array of strings. A new string is constructed from the array by adding * and \n to each array string as it is concatenated to previous strings.
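The split-and-rebuild step might look like the following when isolated. The names UblSketch and ToUnorderedList are hypothetical, and skipping empty parameters is an assumption of this sketch rather than something the description above states.

```csharp
using System.Text;

public static class UblSketch
{
    // 'parameters' is the text between "{{unbulleted list|" and the closing "}}".
    // Pipes inside nested templates and wikilinks are assumed to be hidden already,
    // so every remaining pipe separates two list items.
    public static string ToUnorderedList(string parameters)
    {
        var result = new StringBuilder();
        foreach (string item in parameters.Split('|'))
        {
            string trimmed = item.Trim();
            if (trimmed.Length == 0)
                continue;                       // assumption: drop empty parameters
            result.Append('*').Append(trimmed).Append('\n');
        }
        return result.ToString();
    }
}
```

Because the wikilink pipe in an item such as [[Parsons__P1P3__Parsons turbines]] is still hidden at this point, the link survives the split as a single list item.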
miscellaneous cleanup
Items in lists within ship infoboxen often take the form
*<digit> × <thing>
sometimes with &nbsp; on either side of ×; sometimes an x or &times; is used in place of ×. Non-breaking spaces are not required at the start of a list item.
Some lists in ship infoboxen prefix a list item in the item text with • (Bullet, U+2022, &bull;) or · (Interpunct, U+00B7, &middot;) with or without surrounding spaces. When these are found, they are removed.
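The two cleanups can be sketched for a single list item. This is an illustration only: ItemCleanupSketch and Clean are hypothetical names, the patterns are anchored to one item rather than to a whole infobox, and treating the literal entities &nbsp; and &times; as equivalents is an assumption based on the description above.

```csharp
using System.Text.RegularExpressions;

public static class ItemCleanupSketch
{
    // Tidy one "*..." list item.
    public static string Clean(string item)
    {
        // remove a bullet (U+2022) or interpunct (U+00B7) that follows the * markup
        item = Regex.Replace(item, @"^(\*\s*)[\u2022\u00B7]\s*", "$1");
        // a real multiplication sign or &times;, with any mix of spaces and &nbsp; around it
        item = Regex.Replace(
            item,
            @"^(\*\s*\d+)(?:\s|&nbsp;)*(?:\u00D7|&times;)(?:\s|&nbsp;)*",
            "$1 \u00D7 ");
        // a letter x counts only when followed by white-space or attached to the digit,
        // so that an item like "*2 xebecs" is not mangled
        item = Regex.Replace(item, @"^(\*\s*\d+)(?:\s*x\s+|x(?=\S))", "$1 \u00D7 ");
        return item;
    }
}
```

Each replacement yields text that its own pattern no longer changes on a second pass, which matters if the rules are applied in a repeat-until-no-match loop as the script does.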
restoration
All of the above tasks being completed, task 9 unhides pipes (__P1P3__), equals (__3QU4L__), and template open (__0P3N__) and close (__CL0S3__). It then assembles a summary text to be used as an edit summary. If no lists were converted, it leaves Skip set to true and abandons the edit.
script
// this script converts various list forms to generic unordered list (* markup).
// The AWB list is What transcludes page and the page is Template:Infobox ship begin
// or Category:WPSHIPS:Infobox list errors
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
Skip = true; // start with true; set to false just before the task ends if one or more lists were converted
// Skip = false; // for debugging
Summary = "";
string IS_INFOBOX_SHIP_BEGIN = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Bb]egin)";
string IS_INFOBOX_SHIP_IMAGE = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Ii]mage)";
string IS_INFOBOX_SHIP_CAREER = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Cc]areer)";
string IS_INFOBOX_SHIP_CHARACTERISTICS = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Cc]haracteristics)";
string IS_INFOBOX_SHIP_CLASS_OVERVIEW = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Cc]lass\s+[Oo]verview)";
string IS_INFOBOX_SERVICE_RECORD = @"(\{\{\s*(?:[Ii]nfobox\s+[Ss]ervice\s+[Rr]ecord|[Ss]ervice\s+[Rr]ecord))";
string IS_UNBULLETED_LIST = @"(?:[Uu]nbulleted\s*list|[Uu]bl|[Uu]blist|[Uu]bt|[Uu]nbullet|[Vv]unblist)";
string IS_PLAINLIST = @"(?:[Pp]lain\s*list|[Bb]ulletless list|PL|Startplainlist)";
string IS_INFOBOX_SHIP; // USE THIS AFTER INFOBOX SERVICE RECORD IS UPDATED
if (Regex.Match (ArticleText, IS_INFOBOX_SERVICE_RECORD + @"[^\|\}]*\|\s*is_ship\s*=\s*yes").Success)
IS_INFOBOX_SHIP = @"Infobox\s+(?:ship\s+(?:begin|career|characteristics|class\s+overview)|service\s+record)"; // don't do {{infobox service record}} until it is updated
else
IS_INFOBOX_SHIP = @"Infobox\s+ship\s+(?:begin|career|characteristics|class\s+overview)";
string IS_INFOBOX_SHIP_OR_LISTS = @"(?:" + IS_INFOBOX_SHIP + @"|" + IS_PLAINLIST + @"|" + IS_UNBULLETED_LIST + @")";
string pattern;
bool br_list=false;
bool plainlist=false;
bool ublist=false;
//---------------------------< I N F O B O X T E M P L A T E N A M E S >----------------------------------
// normalize infobox template names since we're mucking about in ship infoboxen, might as well do this
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_BEGIN, "{{Infobox ship begin");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_IMAGE, "{{Infobox ship image");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_CAREER, "{{Infobox ship career");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_CHARACTERISTICS, "{{Infobox ship characteristics");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_CLASS_OVERVIEW, "{{Infobox ship class overview");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SERVICE_RECORD, "{{Infobox service record");
//---------------------------< H I D E >----------------------------------------------------------------------
// HIDE TEMPLATES: find templates that are not {{infobox ship ...}}, {{plainlist}}, or {{unbulleted list}};
// replace the equal signs in templates with __3QU4L__
pattern = @"(\{\{(?!\s*" + IS_INFOBOX_SHIP_OR_LISTS + @")[^\{\}]*)=([^\}]*\}\})";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1__3QU4L__$2");
}
// replace the equal sign in <ref ...=...> tags with __3QU4L__ (making this rule generic is problematic)
pattern = @"(\<\s*ref[^=\>]*)=([^\|\}\>]*\>)";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1__3QU4L__$2");
}
// replace the pipes in templates with __P1P3__
pattern = @"(\{\{(?!\s*" + IS_INFOBOX_SHIP_OR_LISTS + @")[^\{\}]*)\|([^\}]*\}\})";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1__P1P3__$2");
}
// replace the pipes in wikilinks with __P1P3__
pattern = @"(\[\[[^\|\]]*)\|([^\]]*\]\])";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1__P1P3__$2");
}
// replace the opening {{ with __0P3N__ and the closing }} with __CL0S3__
while (Regex.Match (ArticleText, @"\{\{(?!\s*" + IS_INFOBOX_SHIP_OR_LISTS + @")([^\{\}]*)\}\}").Success)
{
ArticleText = Regex.Replace(ArticleText, @"\{\{(?!\s*" + IS_INFOBOX_SHIP_OR_LISTS + @")([^\{\}]*)\}\}", "__0P3N__$1__CL0S3__");
}
// Hide {{plainlist}} replace the opening {{ with __0P3N_PL_ and the closing }} with __CL0S3_PL_
// do this so that {{plainlist}} closing }} doesn't hide stuff that follows
pattern = @"\{\{\s*(" + IS_PLAINLIST + @"[^\}]*)\}\}";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N_PL_$1__CL0S3_PL_");
}
// Hide {{unbulleted list}} replace the opening {{ with __0P3N_UB_ and the closing }} with __CL0S3_UB_
// do this so that {{unbulleted list}} closing }} doesn't hide stuff that follows
pattern = @"\{\{\s*(" + IS_UNBULLETED_LIST + @"[^\}]*)\}\}";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N_UB_$1__CL0S3_UB_");
}
//---------------------------< { { B R } } >------------------------------------------------------------------
// replace {{br}} with <br /> in ship info box templates
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*)__0P3N__\s*[Bb][Rr]\s*__CL0S3__";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is {{br}}
ArticleText = Regex.Replace(ArticleText, pattern, "$1<br />");
//---------------------------< < B R > >----------------------------------------------------------------------
// replace <br> variants with <br /> in ship info box templates
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*)(?:\<\s*[Bb][Rr]\s*\>|\<\s*[Bb][Rr]/\s*\>)";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is non-standard forms of <br /\>
ArticleText = Regex.Replace(ArticleText, pattern, "$1<br />");
// sometimes there are multiple <br /> tags in a row; remove all but one
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\<br /\>)\s*\<br /\>";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is <br /\><br /\>
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
// now replace all remaining <space><br /><space> with __BR34K__; this should remove all newlines
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*?)\s*\<br /\>\s*";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is a <br /> tag
ArticleText = Regex.Replace(ArticleText, pattern, "$1__BR34K__");
//---------------------------< < B R > T O L I S T >------------------------------------------------------
// convert <br /> lists in ship info box templates to * unordered lists
// <br /> lists that contain {{plainlist}} or {{unbulleted list}} templates are converted but the internal
// list templates are not.
// remove a __BR34K__ tag at the beginning of a list (|Ship <parameter> =__BR34K__<value> ... becomes |Ship <parameter> =<value> ...)
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\|\s*[^\|\}]*=)__BR34K__";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is __BR34K__
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
// insert a * at the beginning of a __BR34K__ list (|Ship <parameter> = <value>__BR34K__<value> ... becomes |Ship <parameter> =*<value>__BR34K__<value> ...
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\|\s*[^\|\}]*=)\s*([^\*\|][^\|\}]*__BR34K__)";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is __BR34K__
{
br_list=true;
ArticleText = Regex.Replace(ArticleText, pattern, "$1*$2");
}
// replace __BR34K__ with a newline followed by a splat; if next line starts with * the splat is replaced to prevent duplication
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*)__BR34K__\*?";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is __BR34K__
ArticleText = Regex.Replace(ArticleText, pattern, "$1\n*");
//---------------------------< P L A I N L I S T >------------------------------------------------------------
// remove {{plainlist|}} template markup from the list it contains in ship info box templates
// Does not work if there is text between the parameter = sign and the opening {{. Introductory text or other
// cruft will need to be attended to by a human. When this occurs, any subsequent {{plainlist}} is ignored
// because the script can't see beyond the former's }}
//Does not work when a parameter has multiple {{plainlist}} templates
// UNHIDE plainlist: replace __0P3N_PL_ with {{
ArticleText = Regex.Replace(ArticleText, @"__0P3N_PL_", "{{");
// UNHIDE plainlist: replace __CL0S3_PL_ with }}
ArticleText = Regex.Replace(ArticleText, @"__CL0S3_PL_", "}}");
// remove plainlist named parameters if present |class=, |style=, |indent=
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=\s*\{\{\s*" + IS_PLAINLIST + @"[^\}]*)\|\s*(?:class|style|indent)\s*=[^\|\}]*([\|\}])";
while (Regex.Match (ArticleText, pattern).Success) // repeat til gone
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// remove plainlist empty parameters if present
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=\s*\{\{\s*" + IS_PLAINLIST + @"[^\}]*)\|\s*([\|\}])";
while (Regex.Match (ArticleText, pattern).Success) // repeat til gone
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// remove {{plainlist|}} markup when it directly follows the parameter = sign (spaces excepted) and {{plainlist}} can be followed by nothing but spaces so:
// |Ship param = {{plainlist|...}}
// |Ship param = ...
// but not other text precedes or follows {{plainlist}}:
// |Ship param = <text> {{plainlist|...}} <text>
// |Ship param = ...
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*\{\{\s*" + IS_PLAINLIST + @"\s*\|\s*(\*\s*[^\}]*)\}\}(\s*[\|\}])"; // {{plainlist}} must follow the parameter = sign followed by nothing but spaces; must have a leading asterisk
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is {{plainlist|}}
{
plainlist=true;
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2$3");
}
// remove {{plainlist}} and {{endplainlist}} templates from ship info box templates
/*
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*\{\{\s*[Pp]lainlist\s*\}\}";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is {{plainlist}}
{
Skip = false;
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
}
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*\{\{\s*[Ee]ndplainlist\s*\}\}";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is {{endplainlist}}
{
Skip = false;
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
}
*/
//---------------------------< U N B U L L E T E D L I S T >------------------------------------------------
// remove {{unbulleted list|}} template markup from the list it contains in ship info box templates
// UNHIDE {{unbulleted list}}: replace __0P3N_UB_ with {{
ArticleText = Regex.Replace(ArticleText, @"__0P3N_UB_", "{{");
// UNHIDE {{unbulleted list}}: replace __CL0S3_UB_ with }}
ArticleText = Regex.Replace(ArticleText, @"__CL0S3_UB_", "}}");
// remove {{unbulleted list}} named parameters if present |class=, |style=, |list_style=, |item_style=, |item#_style=
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=\s*\{\{\s*" + IS_UNBULLETED_LIST + @"[^\}]*)\|\s*(?:class|style|list_style|item\d*_style)\s*=[^\|\}]*([\|\}])";
while (Regex.Match (ArticleText, pattern).Success) // repeat til gone
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*\{\{\s*" + IS_UNBULLETED_LIST + @"\s*\|\s*([^\}]*)\}\}";
while (Regex.Match (ArticleText, pattern).Success) // repeat til gone
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string raw_capture = match.Groups[0].Value; // 0 - the whole match
string ret_val = match.Groups[1].Value; // 1 - start of infobox through parameter = sign
string source = match.Groups[2].Value; // 2 - the pipe-separated list items from {{unbulleted list}}
string[] items = source.Split ('|'); // create a string array of list items
foreach (string item in items)
{
ret_val = ret_val + "*" + item.Trim() + "\n"; // reassemble the list as a regular unordered list
}
return ret_val;
});
ublist=true;
}
//---------------------------< M I S C C L E A N U P >------------------------------------------------------
// remove extra blank lines from infoboxen which may be the result of {{plainlist}} removal
// doesn't work properly when there is white space between start of line and pipe
// pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*)\s{1,}(\s[\|\}])";
// while (Regex.Match (ArticleText, pattern).Success)
// ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// If there are any small bullets (•·) following the * markup in an unordered list, remove them
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*)[•·]\s*";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there are small bullets
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
// If there are any small bullets (•·) at the beginning of other parameter values, remove them
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*[•·]\s*";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there are small bullets
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
// clean-up list items in the form * 2x something – should be 2 × something
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)\s*x\s";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × ");
// clean-up list items in the form * 2×something – should be 2 × something; this one at the start only
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)[x×](\S)";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × $2");
// clean-up list items in the form * 2×2 something – should be 2 × 2 something; this one at the start only
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)[x×](\d)";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × $2");
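// clean-up list items in the form * 2 &times; something – should be 2 × something
// the pattern matches the literal &times; entity; matching × itself would re-match the replacement and never terminate
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)\s*&times;\s+";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × ");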
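// clean-up list items in the form * 2&nbsp;×&nbsp; – non breaking space not required at start of list item
// the pattern matches the literal &nbsp; entity; matching plain spaces would re-match the replacement and never terminate
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)&nbsp;[x×]&nbsp;";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × ");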
//---------------------------< U N H I D E >------------------------------------------------------------------
// UNHIDE: replace __P1P3__ with |
ArticleText = Regex.Replace(ArticleText, @"__P1P3__", "|");
// UNHIDE: replace __3QU4L__ with =
ArticleText = Regex.Replace(ArticleText, @"__3QU4L__", "=");
// UNHIDE: replace __0P3N__ with {{
ArticleText = Regex.Replace(ArticleText, @"__0P3N__", "{{");
// UNHIDE: replace __CL0S3__ with }}
ArticleText = Regex.Replace(ArticleText, @"__CL0S3__", "}}");
if (br_list)
Summary = "line-break";
if (plainlist)
{
if ("" != Summary)
Summary = Summary + ", ";
Summary = Summary + "plain";
}
if (ublist)
{
if ("" != Summary)
Summary = Summary + ", ";
Summary = Summary + "unbulleted";
}
if ("" != Summary)
{
Skip = false; // if there is a summary here then we should not skip this page
Summary = "[[User:Monkbot/Task 9: Ship infobox lists|Monkbot task 9]]: convert " + Summary + " list(s) to unordered list(s) in ship infobox templates;";
}
else
Summary = "no list conversions";
return ArticleText;
}