Wikipedia:Analyzing performance issues

This essay, "WP:Analyzing performance issues" (or "Don't worry...learn about it"), describes many ways to improve the performance of articles and templates on Wikipedia. Over the past several years, many important performance issues have become clear, along with better ways to restructure articles and templates significantly. Years ago, the older essay "WP:Don't worry about performance" (WP:PERF) was written to deter people from "over-optimizing" articles or templates because of imagined concerns.

When the RMS Titanic left England in April 1912, many people were not worried about the biggest ship on Earth. However, J.P. Morgan's mother-in-law warned him not to go, "Maiden voyages...too many things can go wrong". He sailed on another ship and lived.

However, as real concerns began to cause severe problems, that essay was repeatedly updated to note "don't worry except in this case", and then in that case, and so on, for each severe performance issue. Unfortunately, many people kept citing "WP:PERF" as an excuse not to think about any problems, and some people were even badgered to stop talking about performance. After years of that chill, even many technical admins were puzzled when performance problems occurred, when everyone should have been discussing performance trends and learning from each other what really mattered.

Perhaps the most obvious advice is to avoid showing very large images, except in rare cases: instead, images should use unsized "|thumb" or "|upright" (for auto-sized narrow images). However, another major problem has been the stacking of 4 to 12 bottom navboxes, to the point where some articles exceeded the 2 MB (actually 2,000 KB) limit of formatted text on a page. The use of bottom navboxes typically doubles (2x) the size of an article, causing the text to take twice as long to display, from start to finish. Similarly, templates can be restructured to be 2x-4x smaller or faster, for example by evaluating numeric-formula parameters (with #expr) before invoking a template.

Several technical essays and guidelines now explain performance problems and how to avoid them.

There are many other essays about performance issues as well. The most important point is that editors are now strongly encouraged to talk, learn, and worry about performance issues before an article becomes a nightmare for admins to rescue. Most admins are extremely busy, so performance problems need to be predicted before they occur, leaving fewer last-minute crises.

Analyzing template performance

The major reason that performance is such a critical issue for templates is the severely limited resources allowed for formatting each article. Originally, resource limits were set low to prevent "run-away" formatting of large articles, which might bog down the Wikipedia webservers for other users. However, the limits are still low, in the sense that an article cannot be a medium-sized "pamphlet" with formatted tables and charts. Instead, large articles need to be split into multiple subarticles.

When an article is formatted, the MediaWiki preprocessor (NewPP) embeds resource statistics in the HTML source of the formatted page (viewable in a web browser via <View><Source> or a similar menu option). The following sample shows those resource statistics, which typically appear about 90 lines down in a formatted article's HTML source.

NewPP limit report
Preprocessor node count: 693/1000000
Post-expand include size: 819/2048000 bytes
Template argument size: 1504/2048000 bytes
Highest expansion depth: 13/40
Expensive parser function count: 0/500
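
A report like the sample above can be read mechanically to see which resource is closest to its limit. The following Python sketch is only an illustration: the function names `parse_limit_report` and `closest_to_limit` are hypothetical, and the parsing assumes each line has the "name: used/limit" shape shown in the sample, which may not hold for other MediaWiki versions.

```python
import re

# Sample copied from the report shown in this essay.
SAMPLE_REPORT = """\
NewPP limit report
Preprocessor node count: 693/1000000
Post-expand include size: 819/2048000 bytes
Template argument size: 1504/2048000 bytes
Highest expansion depth: 13/40
Expensive parser function count: 0/500
"""

# Assumed line shape: "<name>: <used>/<limit>", optionally followed by a unit.
REPORT_LINE = re.compile(r"^(?P<name>[^:]+):\s*(?P<used>\d+)/(?P<limit>\d+)")


def parse_limit_report(report):
    """Return {resource name: (used, limit)} for every matching line."""
    usage = {}
    for line in report.splitlines():
        match = REPORT_LINE.match(line.strip())
        if match:  # the "NewPP limit report" title line has no colon and is skipped
            usage[match.group("name")] = (
                int(match.group("used")),
                int(match.group("limit")),
            )
    return usage


def closest_to_limit(usage):
    """Return the resource name with the highest used/limit ratio."""
    return max(usage, key=lambda name: usage[name][0] / usage[name][1])


usage = parse_limit_report(SAMPLE_REPORT)
for name, (used, limit) in usage.items():
    print(f"{name}: {used / limit:.2%} of limit")
print("Closest to its limit:", closest_to_limit(usage))
```

For the sample report, the expansion depth (13 of 40) is by far the largest fraction of any limit, even though the page is tiny in terms of size.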

Of those resources, the most troublesome when using templates is the "post-expand include size" limit, which was 2,000 KB (2,048,000 bytes) during 2009-2012. Every time a template generates text inside an article, the length of that template-formatted text is added to the total post-expand size. The size of other text on the page is not added: only text generated by templates, and parameters passed into them, are counted.
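
The accounting described above can be sketched in a few lines of Python. This is a simplified model, not the preprocessor's actual algorithm: it only sums the byte length of template-generated text, and the real accounting may differ (for example, for templates nested inside other templates). The navbox size of 180,000 bytes and the function names are made up for illustration.

```python
POST_EXPAND_LIMIT_BYTES = 2_048_000  # limit quoted in the text for 2009-2012


def post_expand_size(template_outputs):
    """Sum the UTF-8 byte length of every piece of template-generated text."""
    return sum(len(text.encode("utf-8")) for text in template_outputs)


def exceeds_limit(template_outputs, limit=POST_EXPAND_LIMIT_BYTES):
    """Return True when the summed template output is larger than the limit."""
    return post_expand_size(template_outputs) > limit


# Hypothetical page: plain article text is not counted, navbox output is.
navbox_output = "x" * 180_000          # one made-up navbox of 180,000 bytes
few_navboxes = [navbox_output] * 4     # 720,000 bytes in total
many_navboxes = [navbox_output] * 12   # 2,160,000 bytes in total

print(post_expand_size(few_navboxes), exceeds_limit(few_navboxes))
print(post_expand_size(many_navboxes), exceeds_limit(many_navboxes))
```

Under these assumed sizes, four stacked navboxes stay well inside the limit, while twelve of them exceed it before any other template on the page is counted.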

However, another critical resource (not listed in the NewPP report before 2011) is the MediaWiki "expansion depth" (limit of 40 during 2010-2012), which covers nested if-else logic and nested templates. The limit can be reached quickly when templates are coded with deeply nested if-else-else-else-else logic, a style typical of computer programmers, who may nest logic perhaps over 200 levels deep in computer software. Ironically, an if-else-else-else structure is an efficient way to select one of several options without processing the other cases, and it can make templates run many times faster; however, when a template uses else-else-else and then invokes another template that also uses else-else-else, the expansion depth grows quickly. The severely limited depth of a mere 40 levels (for all templates nested together) often requires extensive planning, or redesign, of large templates. The tiny expansion limit also restricts the use of complex templates: see the essay "WP:Avoiding MediaWiki expansion depth limit" for more details.
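
The way nested if-else chains add up can be illustrated with a rough Python sketch. It counts only how deeply `{{ ... }}` pairs are nested in a piece of markup, which is a proxy and not the way MediaWiki itself measures expansion depth. The helper names are hypothetical, the generated chain is invented for the example, and the counter assumes the markup contains no triple-brace `{{{parameter}}}` references.

```python
EXPANSION_DEPTH_LIMIT = 40  # limit quoted in the text for 2010-2012


def max_brace_depth(wikitext):
    """Return the deepest nesting of {{ ... }} pairs in the given markup."""
    depth = 0
    deepest = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            deepest = max(deepest, depth)
            i += 2
        elif pair == "}}":
            depth = max(depth - 1, 0)  # ignore unbalanced closers
            i += 2
        else:
            i += 1
    return deepest


def nested_if_chain(levels):
    """Build a made-up if-else-else chain with the given number of levels."""
    text = "default"
    for n in range(levels, 0, -1):
        # Each level puts the rest of the chain in its "else" branch.
        text = "{{#if: x | option %d | %s}}" % (n, text)
    return text


outer = nested_if_chain(25)
inner = nested_if_chain(20)
# Model the outer chain's last branch invoking a template with its own chain.
combined = outer.replace("default", inner)

for label, markup in (("outer", outer), ("inner", inner), ("combined", combined)):
    depth = max_brace_depth(markup)
    print(label, depth, "over limit" if depth > EXPANSION_DEPTH_LIMIT else "ok")
```

In this toy model each chain is acceptable on its own (25 and 20 levels), but placing one inside the other gives 45 levels, which is past the limit of 40.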

[ This essay is a quick draft to be vastly expanded later. ]

See also