
User:Luna Santin/RfcuParser

From Wikipedia, the free encyclopedia

RfcuParser is a simple program for archiving checkuser requests, written partly as practice in using regular expressions, among other things.

Version history

Ver. Notes
v0.1
  • First release.
v0.1.1
  • Parses and outputs all sections of a case, for current year.
  • Fixed indicator/date loop bug.
r2
  • Rewritten in Perl.
  • Dropping old version system; no need to be so formal.
  • Dropping GUI in favor of console interface.
  • Batches: can now parse multiple cases per run.
  • Parsing: now supports more indicator and sock templates, and can be easily customized to add more.
  • Archival: can append to existing reports (updating rowspan as needed) and insert new ones (in appropriate order).
  • Will parse only the most recent section of any given case.
r3
  • Arguments: now takes a single (currently mandatory) console argument, maxdate, and will not parse cases closing after that date, or cases for which no closing date can be identified.
  • Input: now grabs all wikitext via HTTP; will parse and file all case subpages currently transcluded on the RFCU frontpage (subject to the maxdate and nodate exceptions).
  • Improved multi-line matching for indicator/date pairs (needs further testing).
  • No longer files case reports if an indicator/date pair cannot be found.
  • Case names are now based on page names directly, instead of being parsed from wikitext.
  • Fixed date formatting for single-digit numbers.
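The multi-line indicator/date matching mentioned for r3 can be illustrated with a short sketch. It is written in Python for illustration only (the program itself is described as Perl), and it is not the author's actual pattern. The indicator names are the ones this page mentions; the timestamp layout (`HH:MM, D Month YYYY (UTC)`) and the function name `find_indicator_date` are assumptions. The pattern also tolerates whitespace around the template name and ignores letter case.

```python
import re

# Indicator template names mentioned on this page; the real list is longer.
INDICATORS = ["declined", "IPblock", "pixiedust"]

# Assumed timestamp layout: "03:15, 7 June 2007 (UTC)".
PAIR_RE = re.compile(
    r"\{\{\s*(?P<indicator>" + "|".join(re.escape(n) for n in INDICATORS) + r")\s*\}\}"
    r".*?"  # lazily skip any text, including newlines, up to the first timestamp
    r"(?P<date>\d{1,2}:\d{2}, \d{1,2} [A-Za-z]+ \d{4}) \(UTC\)",
    re.IGNORECASE | re.DOTALL,
)


def find_indicator_date(wikitext):
    """Return (indicator, date) for the first pair found, or None if there is none."""
    match = PAIR_RE.search(wikitext)
    if match is None:
        return None  # no pair found: the caller would not file a report
    return match.group("indicator"), match.group("date")
```

Returning `None` when no pair is found mirrors the r3 note that a case report is not filed in that situation.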
r4
  • Now uses api.php instead of query.php.
  • The maxdate parameter is now optional; if no date is provided, the program will parse all cases for which dates can be found.
  • /o flag added to some regular expressions for minor optimization.
  • Several minor fixes to indicator/date pairing; some cases had produced erroneous dates.
  • Improved section-based text splitting; failure to split correctly could cause a case to be skipped.
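The maxdate rule described for r3 and r4 amounts to a small filter. The sketch below is a Python illustration under assumptions, not the program's own code: the function name `should_parse` is hypothetical, and it assumes that a case closing exactly on maxdate is still parsed, which the notes do not state.

```python
from datetime import date
from typing import Optional


def should_parse(closing: Optional[date], maxdate: Optional[date] = None) -> bool:
    """Decide whether a case should be parsed, given its closing date."""
    if closing is None:
        return False  # no closing date could be identified: skip the case
    if maxdate is None:
        return True   # r4: maxdate is optional, so every dated case is parsed
    # Assumption: a case closing on maxdate itself is included.
    return closing <= maxdate
```

For example, `should_parse(date(2007, 6, 7), date(2007, 6, 1))` is `False` because the case closes after the cutoff.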
r5
  • Will no longer append to or insert a case summary if the exact same socklist and casedate are already listed anywhere on the archive page.
  • Ignores {{checkip|Related IP address, if any}} and {{checkuser|Username of the puppet}} when building socklist.
  • Fixed an extra-newline problem when appending to some cases.
  • Allowed whitespace preceding and following indicator template names.
  • Redirected case subpages will be skipped when parsing (the target, if transcluded, will already be listed).
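As a rough illustration of the r5 socklist rule, the following Python sketch collects the arguments of {{checkuser}} and {{checkip}} templates while dropping the two placeholder values named above. The regular expression and the function name `build_socklist` are assumptions made for this example; the program supports more sock templates than these two.

```python
import re

# Placeholder arguments left over from the request form, as quoted in the r5 notes.
PLACEHOLDERS = {"Related IP address, if any", "Username of the puppet"}

# Matches {{checkuser|Name}} or {{checkip|Address}}, tolerating stray whitespace.
SOCK_RE = re.compile(
    r"\{\{\s*(?:checkuser|checkip)\s*\|\s*([^{}|]+?)\s*\}\}",
    re.IGNORECASE,
)


def build_socklist(wikitext):
    """Return the listed accounts and IPs in order, without placeholder entries."""
    socks = []
    for match in SOCK_RE.finditer(wikitext):
        name = match.group(1)
        if name in PLACEHOLDERS:
            continue  # untouched form text, not a real account or address
        socks.append(name)
    return socks
```

The resulting list, together with the case date, is the kind of value the duplicate check above would compare against the archive page.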
r6
  • Parser will now skip any page which includes the text {{noarchive}} (currently a placeholder template).
  • Parser now recognizes {{IPblock}} as an indicator.
  • Case reports inserted in a given section's last position will now be inserted before the table's closing |}.
  • New output message notifies user if any case is skipped due to noarchive or redirect status.
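Two of the r6 changes are simple enough to sketch. The Python below is an illustration under assumptions, not the program's code: `should_skip` and `insert_last` are hypothetical names, and the table is assumed to end with `|}` on its own line.

```python
def should_skip(wikitext):
    """A page is skipped if it contains the literal text {{noarchive}}."""
    return "{{noarchive}}" in wikitext


def insert_last(table_text, report):
    """Insert a report as the last entry, just before the table's closing |}."""
    idx = table_text.rfind("\n|}")  # newline that precedes the closing marker
    if idx == -1:
        raise ValueError("no closing |} found")
    return table_text[:idx] + "\n" + report + table_text[idx:]
```

For instance, inserting the report `"|-\n| B"` into `"{|\n|-\n| A\n|}"` yields a table whose last row is B, followed by the closing `|}`.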
r7
  • All socklists and casenames are escaped, before being used as terms in regular expressions (how I went this long without having problems with that, I may never know).
  • List of cases skipped due to {{noarchive}} or redirect status (if any) is now summarized at the end of output.
  • Support for new {{pixiedust}} indicator.
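The escaping fix in r7 guards against case names or socklists that contain regex metacharacters. The sketch below shows the idea in Python with `re.escape`; in Perl the equivalent is `quotemeta` or `\Q...\E`. The function name `is_listed` and the sample strings are made up for illustration.

```python
import re


def is_listed(archive_text, casename):
    """Check whether casename appears literally in the archive text."""
    pattern = re.compile(re.escape(casename))  # metacharacters become literals
    return pattern.search(archive_text) is not None


# Without escaping, "." matches any character and "(2)" becomes a group,
# so the pattern "A.B (2)" wrongly matches the unrelated text "AxB 2".
unescaped_hit = re.search("A.B (2)", "AxB 2") is not None
escaped_hit = is_listed("AxB 2", "A.B (2)")
```

Here `unescaped_hit` is `True` while `escaped_hit` is `False`, which is the kind of false match the escaping prevents.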
r8
  • Fixed minor newline issue, at end-of-output summary.
  • Re-organized code for (slightly) better readability.
  • Getting ready to edit via MediaWiki API.
r9

Bugs

  • Resolved: Indicator/date parsing appears to get stuck when the indicator text is {{declined}} or {{Declined}}; the cause is not yet known. Workaround: edit the case text before parsing, since the particular indicator used is irrelevant to the parsing.
  • Resolved: Improve indicator/date pair catching over multiple lines.
  • Resolved: Date formatting tweak for dates <10.
  • Resolved: Check for errors inserting first or last report into a given casefile section.

Features

  • Resolved: Multiple case parsing, including rowspan. Split by sections?
  • Closed: Allow manual entry for case name and date, if parsing fails?
  • Resolved: In addition to parsing cases into reports, parse reports into full archive.
  • Resolved: Regex improvement; template should not be case sensitive in some of these areas.
  • Resolved: Archive only cases which occur before a given date (console argument)?
  • Resolved: Grab text of casefile using web interface?
  • Resolved: Grab text of pages to parse using web interface?
  • Resolved: Grab list of pages to parse using web interface?
  • Resolved: Option to ignore maxdate?
  • Resolved: Check for duplicate entries, when appending?
  • Resolved: Allow for a {{noarchive}} marker?
  • Option for lengthy output, parsing all sections of all cases?
  • Option to specify case(s) to be parsed?
  • Option for alternate output format?
  • Option to ignore nodate?