Measuring site performance: charts, charts and more charts
A chart from the Signpost a fortnight ago generated by new performance charting tool Graphite (left). It demonstrated a worrying drop in the parser cache hit rate, now partially resolved (right).
Few people think of performance charts when asked what they consider the most exciting element of developing and maintaining MediaWiki wikis, but it was the area chosen by Performance Engineer Asher Feldman to be the subject of his latest post on the Wikimedia blog.
"To make targeted improvements and to identify both success and regression, we need data. Lots of data", Feldman wrote. And it certainly seems that the amount of performance-related data being collected is on the rise. Whereas previous systems "tended to mask performance issues that only surface on certain pages, or are periodic", a new system based on real-time graphing system Graphite allows thousands of data points to be tracked over time. Feldman continued, "We know we have major work ahead of us to improve performance pain points experienced by our community of editors, and [this kind of] data will guide the way".
Not all of the data collected is available to the public because of potential security concerns, but a smaller set of public dashboards is now available from gdash.wikimedia.org; certain reports will show an artificial daily drop until several imminent fixes go live. The new site complements the existing high-level status.wikimedia.org and the more detailed ganglia.wikimedia.org; those with appropriate privileges can take advantage of a detailed GUI to manipulate charts and create arbitrary new visualisations from the available data points.
Bugmeister Mark Hershberger leaves Wikimedia Foundation
Hershberger's last day will be at the end of May; as this code review backlog chart of a similar period earlier this year shows, a lot can still change before then.
Mark Hershberger will be leaving his job as Bugmeister at the Wikimedia Foundation at the end of May (wikitech-l mailing list). Hershberger had originally taken on the role as a temporary one (see Signpost coverage), but has now held it for over a year, investigating, commenting on and resolving dozens of bugs in that time. He was also influential in handling the development cycle, notably in dealing with the particularly intractable problem of slow code review.
The role gave Hershberger (and will give his successor) the opportunity to interact with dozens of different developers and Wikimedians in general, a role he appears to have mastered but which could prove the downfall of potential successors. Accordingly, public comments have been full of praise for the soon-to-be-outgoing Bugmeister, a "friendly, approachable, ... enthusiastic and cheerful" member of the Wikimedia staff, according to Director of Platform Engineering Rob Lanphier, who announced the departure. "We will miss you," wrote one developer, whilst another noted how Hershberger had turned his "uninteresting job into something actually motivating. No bug was too stupid to take care of and research". Asked for his own comment, the WMF's first Bugmeister said that of everything he had directed his energies towards during his time in the role, he was "very happy" with his work establishing full pre-deployment testing on a Wikimedia Labs-based imitation wiki—testing which resulted in several bugs in MediaWiki 1.19 being caught far earlier than they might otherwise have been.
The WMF plans to "start recruiting for a new Bugmeister soon". With such a broad area of responsibility, it could well be a tricky post to fill on a permanent basis by the time of Hershberger's actual departure at the end of May. Indeed, the hiring process will be set against an already difficult backdrop of a Git migration and wholesale changes to the Wikimedia deployment process.
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for many weeks.
OAuth: allowing third-party apps to authenticate via Wikipedia: There was a discussion this week about implementing OAuth, "a standard protocol to... provide third-party tools (web or client) with granular access to private resources... [without] revealing usernames or passwords to the third-party tool". This would allow micro-applications to perform actions on Wikipedia, such as editing, in the same way that apps can post on a user's behalf on Twitter and Facebook. Comments were largely supportive, although there were pleas to separate a more generic system from the specific OAuth details, to allow competitor formats to be supported in future. OpenID, a distinct but related system, was also mentioned as a possibly superior alternative in some scenarios.
API etiquette: Pages related to so-called API "etiquette" were updated this week to make them easier to find (wikitech-l mailing list). The pages document what reasonable usage of the API (the "machine readable" version of a MediaWiki wiki's content) looks like: single-threaded and responsive to changes in lag, with fewer requests during peak periods. Lead Platform Architect Tim Starling warned in particular against multiple connections, especially since it is so "easy to write a server-side script which accidentally allows 100 concurrent connections to Wikimedia, when 100 people happen to use it at once, or if someone decides to try a DoS attack using the Toolserver as a proxy".
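As a rough illustration of what "single-threaded and responsive to changes in lag" might look like in a client, the sketch below sends requests strictly one at a time and backs off when the server reports lag. It assumes the MediaWiki API's maxlag request parameter and its "maxlag" error code, neither of which is described in the coverage above; the function name polite_fetch_all and the injected send callable are hypothetical, chosen so the example does not depend on any particular HTTP library.

```python
import time
from typing import Callable, Dict, Iterable, List


def polite_fetch_all(
    requests_params: Iterable[Dict[str, str]],
    send: Callable[[Dict[str, str]], Dict],
    maxlag: int = 5,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Send API requests serially, waiting and retrying when the server is lagged.

    `send` is a hypothetical callable that performs one request and returns
    the decoded JSON reply; it is an assumption made for this sketch.
    """
    results: List[Dict] = []
    for params in requests_params:  # one request at a time, never in parallel
        # Assumption: the API accepts a `maxlag` parameter (in seconds).
        query = dict(params, maxlag=str(maxlag), format="json")
        for attempt in range(max_retries + 1):
            reply = send(query)
            error = reply.get("error", {})
            if error.get("code") != "maxlag":
                results.append(reply)
                break
            # Server reported lag: wait longer on each retry before trying again.
            sleep(5 * (attempt + 1))
        else:
            raise RuntimeError("server still lagged after retries")
    return results
```

A shared tool serving many users would also need a single queue in front of a function like this, so that 100 simultaneous visitors do not turn into 100 simultaneous connections.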
/* Working again */ after bug fix: A whole subset of section titles will once again appear in edit summaries following the resolution of bug #35051. The regression-causing bug (itself introduced as part of a fix for bug #32617), which related to section titles with trailing spaces, prevented their display in edit summaries in the familiar /* Section heading */ summary format, which provides helpful section links from history pages. A fix for trailing HTML comments may also be in the pipeline.
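To make the trailing-space problem concrete, here is a minimal sketch of pulling a section title out of a summary in the /* Section heading */ format while tolerating extra whitespace inside the markers. It is not MediaWiki's actual code (which is written in PHP); the pattern and the function name section_from_summary are hypothetical and serve only to illustrate the kind of handling involved.

```python
import re
from typing import Optional

# Matches a leading "/* ... */" marker and captures the title with any
# surrounding whitespace (including trailing spaces) left out of the group.
SECTION_PREFIX = re.compile(r"^/\*\s*(.*?)\s*\*/")


def section_from_summary(summary: str) -> Optional[str]:
    """Return the section title from an edit summary, or None if there is none."""
    match = SECTION_PREFIX.match(summary)
    return match.group(1) if match else None
```

With this pattern, a summary beginning "/* History  */" (two trailing spaces) and one beginning "/* History */" yield the same title, so both could link to the same section.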
Bugzilla statuses to evolve: The status workflow of bugs filed with Bugzilla has been changed, and may change again shortly (wikitech-l mailing list). The change sees the default state of bugs change from "NEW" to "UNCONFIRMED", which now precedes it; "NEW" may yet be changed to the more descriptive "CONFIRMED", and further tweaks may be made to the "in progress" statuses. Reception to the change was mixed, with consensus seeming to be that it was a positive step, albeit one that did not necessarily address the key bottlenecks in the current process.