From Wikipedia, the free encyclopedia

WPStatistics manual of style

This manual of style is a style guide for articles within the scope of WikiProject Statistics. It focuses on the notational conventions used within the project, discusses the methods of typesetting of mathematical formulas, the recommended way of organizing the references, etc. The purpose of this manual is to attempt to unify the notation used in different articles within the project, making the exposition more streamlined and easier to follow.

Mathematical formulas

From the point of view of typesetting, there are two types of formulas which can be used in an article: displayed and inline. The “displayed” formulas are shown as a separate paragraph, indented relative to the main text. For example:

This format may be used when the formula is too large to fit comfortably inline, or when it is important enough to the exposition that placing it in a separate paragraph gives it emphasis.

The “inline” formulas are used when the mathematical expression is a natural part of the text, when the formula is not very complicated, and when there is no need to emphasize it by putting it into a separate paragraph. For example, in the expression above θ usually represents some angle, e is Euler’s number, and i is the imaginary unit (in this sentence three formulas were embedded as inline expressions).

Displayed formulas

The code for displayed formulas is written in ΤΕΧ and placed within the <math>…</math> tags. The WP:MATH guide provides an excellent reference on the ΤΕΧ language. To put a formula on a separate line, precede it with a colon, like this:

: <math>…</math>

Although technically the colon starts a new paragraph, there should be no line breaks before or after the formula unless it logically starts or ends the paragraph (typically a formula, even if displayed, is a part of a sentence). This makes it easier to follow the structure of the article text when editing.

Make sure that the displayed formula is rendered as an image. To force a formula to be rendered as an image, you can append a ΤΕΧ spacing command at the end (for example “\,”).
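As an illustration, a displayed formula that sits in the middle of a sentence might be coded as in the sketch below. The surrounding sentence is invented for this example, the identity is the one used in the examples on this page, and the trailing \, is the image-forcing device described above.

```wikitext
<!-- Illustrative only: the sentence around the formula is invented for this example. -->
The complex exponential satisfies the identity
: <math>e^{-i\theta} = \cos\theta - i\sin\theta,\,</math>
where ''θ'' is an angle, ''e'' is Euler's number, and ''i'' is the imaginary unit.
```

There is no blank line before or after the formula line, because the formula is part of the sentence.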

Special care should be taken with displayed formulas in lists. In such cases you cannot rely on Wikipedia’s simple bullet formatting (where each item begins on a new line with a * or a #), because Wikipedia lists do not support multi-paragraph entries. Instead the list should be written using the HTML tags: <ul><li>…<li>…</ul> (for example, see the Linear regression article).
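A list item that contains a displayed formula might look like the following sketch. The item texts are hypothetical, the formula is the estimator used as an example elsewhere in this guide, and the closing </li> tags are an optional addition to the bare <li> form shown above.

```wikitext
<!-- Illustrative only: the item texts are invented for this example. -->
<ul>
<li>The ordinary least squares estimator is
: <math>\hat\beta = (X'X)^{-1}X'y,\,</math>
where ''X'' is the matrix of regressors and ''y'' is the vector of responses.</li>
<li>A second item, which may again span more than one paragraph.</li>
</ul>
```

With the * or # syntax the first item would end at the line break before the formula; the explicit HTML tags keep the formula and the text after it inside the same item.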

Inline formulas

Producing nice-looking inline formulas is much more difficult. There are two possible approaches here: either use inline ΤΕΧ images, or write formulas using HTML + Wikicode. Generally, the second method is preferred, since it produces results consistent with the font and the size of the surrounding text. Use inline ΤΕΧ only in cases when the formula is too complicated to be rendered using HTML.

Another method, which is sometimes encountered, is to use the inline <math>…</math> environment without forcing it to render as an image. This approach is discouraged, since it produces incorrectly typeset formulas: the font no longer matches the font of the surrounding text, the small Greek variables are rendered in an upright font instead of italics, and spaces are sometimes inserted in the wrong places.

Incorrect: <math>e^{-i\theta}=\cos\theta-i\sin\theta</math>, which produces: [rendered formula image].
Correct, recommended: {{nowrap|1=''e''<sup>−''iθ''</sup> = cos ''θ'' − ''i'' sin ''θ''}}, which produces: e⁻ⁱᶿ = cos θ − i sin θ.
Correct, not recommended: <math style="vertical-align:-.1em">\scriptstyle e^{-i\theta}\,=\,\cos\theta\,-\,i\sin\theta</math>, which produces: [rendered formula image].

Inline formulas using TeX

This method is generally not recommended, meaning that it should only be used in cases when inline HTML cannot be. This sometimes happens if formulas contain unusual characters, hats over symbols, difficult markup, etc. The reason why this inline ΤΕΧ format is not recommended is that it renders the output in a font slightly smaller than the standard font size of surrounding text.

Formulas in the inline ΤΕΧ format are constructed in the same way as the displayed formulas: by placing the ΤΕΧ commands inside the <math>…</math> tags.

To force the correct font size, the formula should start with the \scriptstyle command. Unfortunately this also alters ΤΕΧ’s spacing algorithms, so spaces must be inserted manually wherever necessary. For example:

Incorrect: <math>x^n+y^n=z^n</math>, which renders as: [rendered formula image].
Correct: <math>\scriptstyle x^n\,+\,y^n\,=\,z^n</math>, which renders as: [rendered formula image].

Secondly, the formulas produced by <math>…</math> tags are almost always misaligned with the text baseline. In order to correct this behavior you need to manually adjust the alignment by adding the attribute style="vertical-align:###" to the <math> tag. For example:

Incorrect: <math>\scriptstyle \hat\beta\,=\,(X'X)^{-1}X'y</math>, which renders as: [rendered formula image].
Correct: <math style="vertical-align:-.3em">\scriptstyle \hat\beta\,=\,(X'X)^{-1}X'y</math>, which renders as: [rendered formula image].

Inline formulas using HTML

Inline HTML formulas are the preferred format, since they are rendered in a consistent font and size. Additionally, they load much faster than formulas in image format. The only drawback of this format is that it cannot express very complicated formulas, so occasionally we have to switch to ΤΕΧ.

  • HTML entities should be substituted by their corresponding symbols whenever possible: we write ''α + β − γ'' instead of ''&alpha; + &beta; &minus; &gamma;''. Possible exceptions to this rule are the entities &nbsp; (non-breaking space) and &thinsp; (thin space), since the standard editbox does not distinguish them from the common whitespace and thus they could be altered inadvertently.
  • The variable names should be italicized: A, X, z, α, θ, etc; except for the variables that use capital Greek letters — they are written in the upright font: Σ, Φ, Ω.
  • The numbers must not be italicized: 5, 1.96, −0.33, 1/2, 9.11·10⁻³¹. Also do not omit 0 before the decimal point: not .75 but 0.75.
  • Minus sign − is the symbol with Unicode code U+2212, which can be entered as an HTML entity &minus;, typed in as Alt+2212 in Windows, or inserted from the Math&Logic toolbox below the edit window. The minus sign is different from a hyphen -, an en dash –, or an em dash —. For example, type 2 − 3 = −1 but not 2 - 3 = -1.
  • Prime sign ′ can be entered as Alt+2032, or as &prime; HTML entity. Do not use ’ or ' in place of the prime symbol: y = x′β but not y = x’β.
  • Subscripts and superscripts can be added using the <sub></sub> and <sup></sup> tags. For example, ''x''<sup>2</sup> produces x², and ''y<sub>ij</sub>'' makes yᵢⱼ. The difficulty arises when a symbol has to have both super- and subscripts. In those cases you can either switch to ΤΕΧ format, or use manual kerning: ''z''<sup>2</sup><sub style="position:relative;left:-.8em;top:.2em">''ij''</sub>, which renders as z²ᵢⱼ.
  • Generally one has to be careful about where a formula can break across lines. To avoid a break in the middle of a formula such as b = (X′X)⁻¹X′y, put the entire formula inside the {{nowrap}} template: {{nowrap|1=''b'' = (''X′X'')<sup>−1</sup>''X′y''}}, which renders as b = (X′X)⁻¹X′y. If the formula contains vertical pipe “|” characters, those should be replaced with {{!}}. Alternatively, one can make the entire formula non-breaking by replacing all spaces with &nbsp;’s: |x+y| ≤ |x| + |y| can be coded either as {{nowrap|1={{!}}x+y{{!}} ≤ {{!}}x{{!}} + {{!}}y{{!}}}}, or as |x+y|&nbsp;≤&nbsp;|x|&nbsp;+&nbsp;|y|.
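The rules in this list can be combined as in the following sketch. The sentences and variable names are hypothetical and serve only to show italic variables, upright numbers, the minus and prime signs, subscripts, and the {{nowrap}} and {{!}} templates working together.

```wikitext
<!-- Illustrative only: sentences and variable names are invented for this example. -->
The model for the ''i''-th observation is {{nowrap|1=''y<sub>i</sub>'' = ''x<sub>i</sub>′β'' + ''ε<sub>i</sub>''}}, and the estimator is {{nowrap|1=''b'' = (''X′X'')<sup>−1</sup>''X′y''}}.
The triangle inequality gives {{nowrap|1={{!}}''x'' − ''y''{{!}} ≤ {{!}}''x''{{!}} + {{!}}''y''{{!}}}}.
With {{nowrap|1=''p'' = 0.75}} we get {{nowrap|1=1 − ''p'' = 0.25}}.
```

Each formula is wrapped in its own {{nowrap}} so that the text can still break between formulas but never inside one.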

Notational conventions

There are many different types of notation employed in probability theory and statistics; see the “Notation in statistics” article. On Wikipedia we want to use uniform notation across articles on closely related topics, and this section attempts to list the different conventions used within the articles in WikiProject Statistics.

Probability theory

  • Random variables are denoted using lower case Latin letters[note 1] from the second half of the alphabet: x, y, z.
  • Probability of an event A is denoted as Pr[A] (in <math> mode use command \Pr). Do not use blackboard-bold, bold, or italic P to denote probability: ℙ[A], P[A] (bold), and P[A] (italic) are all incorrect. An exception is made for measure-theoretic articles, which explicitly define and work with the probability space (Ω, ℱ, P). In those articles P is the probability function (other symbols may be used in place of P too, such as G or μ), so that the probability of event A should be denoted as P(A).
  • Expected value of a random variable x is denoted as E[x]. In <math> mode this can be typed as \mathrm{E}[x] or \operatorname{E}[x]. The use of square brackets is recommended, in order to indicate that “E” is not a function but an operator. Do not use italic E as in E(x).
  • Variance, covariance and correlation are denoted as Var[x], Cov[x, y], and corr[x, y] respectively.
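The conventions above might be applied as in the following sketch. The source names only \Pr, \mathrm{E} and \operatorname{E}; the use of \operatorname for Var is an assumption made here by analogy, and the sentences are hypothetical.

```wikitext
<!-- Illustrative only. \operatorname{Var} is an assumption by analogy with \operatorname{E}. -->
The variance can be computed as
: <math>\operatorname{Var}[x] = \operatorname{E}[x^2] - (\operatorname{E}[x])^2,\,</math>
and similarly {{nowrap|1=Cov[''x'', ''y''] = E[''xy''] − E[''x'']·E[''y'']}}.
For a standard normal ''x'', {{nowrap|1=Pr[''x'' > 1.96] ≈ 0.025}}.
```

In the inline HTML versions the operator names Pr, E and Cov stay upright, while the random variables ''x'' and ''y'' are italicized.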



  1. Kolmogorov’s convention of using capital letters for random variables and small letters for their realizations is gradually falling out of use in modern statistics, as all observations are treated as random variables and inference is performed conditionally on the realizations of those random variables.