Scoring rule
In decision theory, a score function, or scoring rule, measures the performance of an entity, be it a person or a machine, that repeatedly makes decisions under uncertainty. For example, every evening a TV weather forecaster may give the probability of rain on the next day, a type of probabilistic forecasting. A viewer could note how often a 25% probability was quoted over a ten-year period and compare this with the actual proportion of those occasions on which rain fell. If the actual percentage was substantially different from the stated probability, we say that the forecaster is poorly calibrated. A poorly calibrated forecaster might be encouraged to do better by a bonus system. A bonus system designed around a proper scoring rule will incentivize the forecaster to report probabilities equal to his personal beliefs.[1]
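The calibration check described above can be sketched in Python. The forecast history is made-up illustrative data, and `calibration_table` is a hypothetical helper name rather than part of any library.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """For each stated probability, return (number of forecasts,
    observed frequency of rain on those days)."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for prob, rained in zip(forecasts, outcomes):
        counts[prob] += 1
        hits[prob] += int(rained)
    return {prob: (counts[prob], hits[prob] / counts[prob])
            for prob in sorted(counts)}

# Made-up history: stated rain probabilities and whether it rained (1) or not (0).
forecasts = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
outcomes = [1, 0, 0, 0, 1, 1, 0, 1]

table = calibration_table(forecasts, outcomes)
for prob, (n, freq) in table.items():
    print(f"stated {prob:.2f}: {n} forecasts, rain observed {freq:.2f} of the time")
```

A well calibrated forecaster shows observed frequencies close to the stated probabilities; in practice many more forecasts per probability level would be needed before drawing conclusions.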
Binary decisions
In the simple case of a binary decision, such as assigning probabilities to 'rain' or 'no rain', scoring rules take a simpler form. For example, suppose we give the forecaster a reward $S(x, q)$ when he forecasts rain with an attached probability $q$ and the outcome is $x$: he receives $S(1, q)$ if it rains and $S(0, q)$ if it does not. Assuming that our weatherman wishes to maximise his expected reward, he will choose the forecast $q$ which maximises
$$p\,S(1, q) + (1 - p)\,S(0, q),$$
where $p$ is his personal probability that rain will fall.
Multi-class scoring rules
Scoring rules can also be used where a forecaster assigns probabilities to multiple classes, such as 'rain', 'snow', or 'clear'. The forecaster returns a probability vector $\mathbf{r}$ with one probability $r_i$ for each outcome $i$. One use of a scoring function could be to pay $S(\mathbf{r}, i)$ if the $i$th event occurs.
All multi-class scoring rules can also be used for binary scoring by setting the number of classes $C = 2$.
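Examples of proper multiclass scoring rules
Logarithmic scoring rule
$$L(\mathbf{r}, i) = \ln(r_i)$$
The logarithmic scoring rule is a local strictly proper scoring rule. Since strictly proper scoring rules remain strictly proper under a positive linear transformation, and changing the base of the logarithm only multiplies the score by a positive constant, $L(\mathbf{r}, i) = \log_b(r_i)$ is strictly proper for all $b > 1$.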
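A minimal Python sketch of the logarithmic rule, assuming the form ln(r_i) with an optional base b > 1. The function names and the belief vector are illustrative assumptions. It compares the expected score of an honest report against two distorted reports.

```python
import math

def log_score(r, i, base=math.e):
    """Logarithm of the probability assigned to the realized outcome i."""
    return math.log(r[i], base)

def expected_score(score, p, r):
    """Expected score of report r when outcomes follow the belief vector p."""
    return sum(p_i * score(r, i) for i, p_i in enumerate(p))

p = [0.5, 0.3, 0.2]  # forecaster's beliefs (illustrative)

honest = expected_score(log_score, p, p)
hedged = expected_score(log_score, p, [0.4, 0.4, 0.2])
overconfident = expected_score(log_score, p, [0.8, 0.1, 0.1])

# The honest report should have the highest expected score.
print(honest > hedged, honest > overconfident)
```

Changing `base` to any value above 1 rescales every score by the same positive factor, so the ranking of reports is unchanged.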
Brier/quadratic scoring rule
The quadratic scoring rule is a strictly proper scoring rule:
$$Q(\mathbf{r}, i) = 2 r_i - \mathbf{r} \cdot \mathbf{r} = 2 r_i - \sum_{j=1}^{C} r_j^2$$
The Brier score, originally proposed by Glenn W. Brier in 1950, can be obtained by a linear transform from the quadratic scoring rule:
$$B(\mathbf{r}, i) = \sum_{j=1}^{C} (y_j - r_j)^2 = 1 - Q(\mathbf{r}, i)$$
where $y_j = 1$ when the $j$th event is correct and $y_j = 0$ otherwise, and $C$ is the number of classes.
An important difference between these two rules is that a forecaster should strive to maximize the quadratic score yet minimize the Brier score. This is due to the negative sign in the linear transformation between them.
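The relation between the two rules can be checked numerically. This sketch assumes the standard forms Q(r, i) = 2 r_i - sum_j r_j^2 and B(r, i) = sum_j (y_j - r_j)^2, with y the indicator vector of the realized class; the function names and the forecast vector are illustrative.

```python
def quadratic_score(r, i):
    """Quadratic score: 2 * r_i minus the squared length of r (higher is better)."""
    return 2 * r[i] - sum(r_j ** 2 for r_j in r)

def brier_score(r, i):
    """Brier score: squared distance between r and the indicator of class i
    (lower is better)."""
    return sum(((1.0 if j == i else 0.0) - r_j) ** 2 for j, r_j in enumerate(r))

r = [0.7, 0.2, 0.1]  # illustrative forecast over C = 3 classes
for i in range(len(r)):
    q, b = quadratic_score(r, i), brier_score(r, i)
    # Under these definitions q + b equals 1 up to rounding, i.e. B = 1 - Q.
    print(i, round(q, 4), round(b, 4), round(q + b, 4))
```

Because B = 1 - Q, a report that maximizes the expected quadratic score also minimizes the expected Brier score.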
Spherical scoring rule
The spherical scoring rule is also a strictly proper scoring rule:
$$S(\mathbf{r}, i) = \frac{r_i}{\lVert \mathbf{r} \rVert} = \frac{r_i}{\sqrt{r_1^2 + \cdots + r_C^2}}$$
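A short Python sketch of the spherical rule, assuming the usual form r_i / ||r|| where ||r|| is the Euclidean norm of the report. Names and numbers are illustrative. By the Cauchy-Schwarz inequality the expected score of any report is at most ||p||, and an honest report attains that bound.

```python
import math

def spherical_score(r, i):
    """Probability of the realized class divided by the Euclidean norm of r."""
    return r[i] / math.sqrt(sum(r_j ** 2 for r_j in r))

def expected_score(score, p, r):
    """Expected score of report r when outcomes follow the belief vector p."""
    return sum(p_i * score(r, i) for i, p_i in enumerate(p))

p = [0.5, 0.3, 0.2]  # forecaster's beliefs (illustrative)

honest = expected_score(spherical_score, p, p)
hedged = expected_score(spherical_score, p, [0.4, 0.4, 0.2])
overconfident = expected_score(spherical_score, p, [0.8, 0.1, 0.1])

norm_p = math.sqrt(sum(p_i ** 2 for p_i in p))
print(honest, norm_p)                          # these two agree up to rounding
print(honest > hedged, honest > overconfident)
```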
Characteristics
Proper scoring rule
A scoring rule is said to be proper if its expected score is optimized when the forecaster reports his true probability assessments. A scoring rule is strictly proper if that report is the unique optimum. Optimization here means maximization for the quadratic, spherical, and logarithmic rules but minimization for the Brier score.
Binary proper scoring rule
A scoring rule $S(x, q)$ is said to be proper if the expected score
$$p\,S(1, q) + (1 - p)\,S(0, q)$$
is (uniquely) maximized when $q = p$, for any value of $p$. The use of a proper scoring rule encourages the forecaster to be honest, as his expected payoff is maximized when he reports his personal rain probability $p$ as the prediction $q$. Two commonly used proper score functions are:
The Brier score,[2] given here in the sign convention where larger is better,
$$S(x, q) = -(x - q)^2$$
and the logarithmic score function
$$S(x, q) = x \log q + (1 - x) \log(1 - q).$$
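The definition can be illustrated with a small grid search in Python. The sketch assumes the binary forms S(x, q) = -(x - q)^2 for the Brier score (negated so that larger is better) and S(x, q) = x log q + (1 - x) log(1 - q) for the logarithmic score. The `linear` rule is added only as a contrasting example of an improper rule, and all names are illustrative.

```python
import math

def brier(x, q):
    """Negated squared error, so that higher is better."""
    return -(x - q) ** 2

def log_score(x, q):
    """Log of the probability assigned to the outcome that occurred."""
    return x * math.log(q) + (1 - x) * math.log(1 - q)

def linear(x, q):
    """Probability assigned to the realized outcome: an improper rule."""
    return x * q + (1 - x) * (1 - q)

def expected(score, p, q):
    """Expected score of forecast q when rain has probability p."""
    return p * score(1, q) + (1 - p) * score(0, q)

def best_report(score, p, grid):
    """Forecast on the grid with the highest expected score."""
    return max(grid, key=lambda q: expected(score, p, q))

grid = [k / 100 for k in range(1, 100)]  # 0.01 .. 0.99, avoiding log(0)
p = 0.25                                 # forecaster's personal probability

print(best_report(brier, p, grid))      # the honest report p
print(best_report(log_score, p, grid))  # the honest report p
print(best_report(linear, p, grid))     # pushed to the edge of the grid
```

Under the linear rule the expected score is linear in q, so a forecaster who believes p < 0.5 does best by reporting the smallest allowed probability rather than his true belief.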
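Multi-class proper scoring rule
A multi-class scoring rule is said to be proper if its expected score under the forecaster's beliefs $\mathbf{p}$, $\sum_i p_i\,S(\mathbf{r}, i)$, is maximized when $\mathbf{r} = \mathbf{p}$. A scoring rule is strictly proper when $\mathbf{r} = \mathbf{p}$ is the only maximizer.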
Positive-affine transformation
A strictly proper scoring rule, whether binary or multiclass, remains strictly proper after a positive-affine transformation.[1] That is, if $S(\mathbf{r}, i)$ is a strictly proper scoring rule, then $a + b\,S(\mathbf{r}, i)$ with $b > 0$ is also a strictly proper scoring rule.
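A small Python check of this property, using the logarithmic rule ln(r_i) as the base rule. The `affine` wrapper, the constants, and the candidate reports are illustrative assumptions. A transformation a + b*S with b > 0 leaves the best report unchanged, while a negative b does not.

```python
import math

def log_score(r, i):
    """Logarithmic score of report r when class i occurs."""
    return math.log(r[i])

def affine(score, a, b):
    """Return the transformed rule a + b * score."""
    def transformed(r, i):
        return a + b * score(r, i)
    return transformed

def expected_score(score, p, r):
    """Expected score of report r when outcomes follow the belief vector p."""
    return sum(p_i * score(r, i) for i, p_i in enumerate(p))

p = [0.6, 0.4]  # forecaster's beliefs (illustrative)
candidates = [[k / 10, 1 - k / 10] for k in range(1, 10)]

def best(score):
    """Candidate report with the highest expected score."""
    return max(candidates, key=lambda r: expected_score(score, p, r))

print(best(log_score))                      # honest report
print(best(affine(log_score, 5.0, 2.0)))    # b > 0: same report
print(best(affine(log_score, 0.0, -1.0)))   # b < 0: no longer the honest report
```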
Locality
A proper scoring rule is said to be local if its value depends only on the probability $r_i$ assigned to the event that occurred. All binary scores are local because the probability assigned to the event that did not occur is directly determined as $1 - r_i$.
The logarithmic scoring rule is an example of a multi-class strictly proper local scoring rule.
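Locality can be illustrated by moving probability mass between classes that did not occur. This sketch assumes the forms ln(r_i) for the logarithmic rule and 2 r_i - sum_j r_j^2 for the quadratic rule; the two reports are illustrative.

```python
import math

def log_score(r, i):
    """Depends only on r[i], the probability of the realized class."""
    return math.log(r[i])

def quadratic_score(r, i):
    """Depends on the whole vector r through the sum of squares."""
    return 2 * r[i] - sum(r_j ** 2 for r_j in r)

# Two reports that agree on class 0 but split the remaining mass differently.
r_a = [0.5, 0.3, 0.2]
r_b = [0.5, 0.1, 0.4]
i = 0  # realized class

print(log_score(r_a, i) == log_score(r_b, i))              # identical scores
print(quadratic_score(r_a, i), quadratic_score(r_b, i))    # different scores
```

The logarithmic score ignores how the remaining probability is distributed, whereas the quadratic score changes with it, so the quadratic rule is not local when there are more than two classes.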
References
1. Bickel, E.J. (2007). "Some Comparisons among Quadratic, Spherical, and Logarithmic Scoring Rules". Decision Analysis 4 (2): 49–65. doi:10.1287/deca.1070.0089. http://faculty.engr.utexas.edu/bickel/Papers/QSL_Comparison.pdf.
2. Brier, G.W. (1950). "Verification of forecasts expressed in terms of probability". Monthly Weather Review 78: 1–3. http://docs.lib.noaa.gov/rescue/mwr/078/mwr-078-01-0001.pdf.

