Definitions are inconsistent?

The article starts by calling it an accuracy measure, then calls it a measure of calibration. These are different things. For instance, if a biased coin lands heads with probability p = 0.8, I think you get the best calibration by guessing heads 80% of the time, but the highest accuracy by guessing heads 100% of the time. I'm not sure how to define "accuracy" when you guess a probability rather than an outcome. Philgoetz (talk) 22:02, 22 March 2015 (UTC)
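A quick calculation makes the coin example concrete. It assumes the usual binary form of the score, (f - o)^2 for a forecast probability f and an outcome o of 0 or 1. On a coin with P(heads) = p, always forecasting f then has an expected score of p(1 - f)^2 + (1 - p)f^2. The function name `expected_brier` is hypothetical and only serves as an illustration. Under that assumption, forecasting 0.8 scores better on average than the hard guess of 1.0.

```python
def expected_brier(f: float, p: float) -> float:
    """Expected binary Brier score of always forecasting probability f
    for an event that really occurs with probability p (illustrative helper)."""
    # Heads (outcome 1) happens with probability p and costs (1 - f)^2;
    # tails (outcome 0) happens with probability 1 - p and costs f^2.
    return p * (1.0 - f) ** 2 + (1.0 - p) * f ** 2


if __name__ == "__main__":
    p = 0.8  # true probability of heads for the biased coin
    for f in (0.8, 1.0):
        print(f"forecast {f:.1f}: expected Brier score {expected_brier(f, p):.3f}")
    # Scan a grid of constant forecasts; the minimum sits at f = p.
    grid = [i / 100 for i in range(101)]
    best = min(grid, key=lambda f: expected_brier(f, p))
    print(f"best constant forecast on the grid: {best:.2f}")
```

The expected score simplifies to p(1 - p) + (f - p)^2, so it is smallest exactly when f = p. In other words, the score rewards the calibrated probability rather than the "always say heads" guess.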

Multiple trials are needed

The section that describes individual scores with terms like "needs work" and "not too shabby" does not emphasise enough that multiple trials are needed for a meaningful assessment. If someone predicts the chance of a coin toss resulting in heads as 50%, then the prediction is a good one. "No courage" would be a poor description if the probability really is 50%. In this case other predictions would lead to worse scores in the long run. (talk) 11:46, 25 February 2009 (UTC)
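A small simulation illustrates why one trial says little. It assumes a fair coin and the binary Brier score averaged over trials. The helper names `brier_score` and `simulate` and the fixed seed are hypothetical choices made for this illustration. On a single heads toss, a bolder forecast of 0.7 beats 0.5, but over many tosses 0.5 wins: its long-run value is 0.25, against 0.5 * 0.09 + 0.5 * 0.49 = 0.29 for 0.7.

```python
import random


def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def simulate(forecast, n_trials, p_heads=0.5, seed=0):
    """Score a constant forecast against n_trials simulated coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    outcomes = [1 if rng.random() < p_heads else 0 for _ in range(n_trials)]
    return brier_score([forecast] * n_trials, outcomes)


if __name__ == "__main__":
    # One heads toss flatters the bolder forecast ...
    print(round(brier_score([0.7], [1]), 4))  # 0.09
    print(brier_score([0.5], [1]))            # 0.25
    # ... but over many tosses of a fair coin, 0.5 scores better.
    for f in (0.5, 0.7):
        print(f, round(simulate(f, 100_000), 3))
```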

I agree. I just added some explanation about the decomposition. I think terms like "No courage" would be better linked to the components, e.g. "No resolution". I think courage should be linked to how much the forecaster deviates from climate. Predicting a 50% chance of rain in the Sahara would be quite courageous, while in Norway it would take little courage. Maybe we should remove the terms from the example and reuse them in an example of the decomposition. Wrm researcher (talk) 17:47, 27 January 2010 (UTC)
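As a sketch of how the decomposition could be demonstrated in such an example, the code below uses the usual three components: BS = reliability - resolution + uncertainty. It groups forecasts by their exact value, which makes the identity hold exactly. With binned continuous forecasts, it holds only approximately. The function name `brier_decomposition` is hypothetical. The base rate plays the role of "climate": always forecasting it gives zero reliability and resolution terms, so the score equals the uncertainty term.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Forecasts are grouped by exact value, so
    BS == reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    if n == 0 or n != len(outcomes):
        raise ValueError("need equal-length, non-empty sequences")
    base_rate = sum(outcomes) / n  # climatological frequency of the event
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    reliability = resolution = 0.0
    for f, obs in groups.items():
        n_k = len(obs)
        freq_k = sum(obs) / n_k  # observed frequency whenever f was forecast
        reliability += n_k * (f - freq_k) ** 2
        resolution += n_k * (freq_k - base_rate) ** 2
    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


if __name__ == "__main__":
    outcomes = [1, 0, 0, 0]
    for forecasts in ([0.8, 0.8, 0.2, 0.2], [0.25] * 4):  # second one = climatology
        rel, res, unc = brier_decomposition(forecasts, outcomes)
        bs = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)
        print(f"BS={bs:.4f} REL={rel:.4f} RES={res:.4f} UNC={unc:.4f}")
```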

Continuous Outcome

Recently there was an addition to the opening paragraph mentioning a continuous outcome. I believe this is incorrect and that the revision from February 26, 2012 should be reverted, but I would like someone to verify this if possible. Mickeyg13 (talk) 20:25, 22 March 2012 (UTC)

I agree that the addition about continuous outcome is incorrect. Wrm researcher (talk) 14:49, 21 June 2012 (UTC)

I second that: the definition clearly states that the outcome is binary or multi-category, so the second part of the sentence is obviously wrong. I can't say whether or not there is a way of modifying the Brier score for a continuous measure, but that doesn't seem likely. — Preceding unsigned comment added by (talk) 09:50, 20 April 2012 (UTC)

Relation to MSE

It would be good to elaborate more on the relation to "mean squared error". The two seem mathematically equivalent; are they just different names used by different communities? Xpelanek (talk) 08:30, 19 March 2014 (UTC)
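For the common binary form, the Brier score is the mean squared error between the forecast probabilities and the outcomes coded as 0 or 1. Brier's original multi-category definition instead sums the squared errors over all categories before averaging, and for two categories that gives exactly twice the binary value. A minimal sketch follows; the function names are hypothetical and the forecasts are made-up illustrative numbers.

```python
def mse(predictions, targets):
    """Plain mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def brier_binary(prob_event, occurred):
    """Binary Brier score: MSE of probabilities against 0/1 outcomes."""
    return mse(prob_event, [1.0 if o else 0.0 for o in occurred])


def brier_multicategory(prob_vectors, observed_classes):
    """Multi-category form: squared errors summed over all categories,
    then averaged over forecasts."""
    total = 0.0
    for probs, k in zip(prob_vectors, observed_classes):
        total += sum((p - (1.0 if j == k else 0.0)) ** 2 for j, p in enumerate(probs))
    return total / len(prob_vectors)


if __name__ == "__main__":
    probs = [0.9, 0.3, 0.6]           # forecast probability that the event occurs
    occurred = [True, False, False]   # what actually happened
    binary = brier_binary(probs, occurred)
    # Same forecasts as two-category vectors [P(event), P(no event)];
    # class 0 means "event occurred".
    vectors = [[p, 1.0 - p] for p in probs]
    classes = [0 if o else 1 for o in occurred]
    multi = brier_multicategory(vectors, classes)
    print(round(binary, 4), round(multi, 4))
```

So the two notions coincide for binary outcomes, but it is worth stating which convention an article uses, because the multi-category version ranges from 0 to 2 rather than 0 to 1 in the two-category case.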

Units Required in Incidental Statement

In the Example portion of the Definition of the Brier Score section there is the following reference: "In weather forecasting, a trace (< 0.01) is considered "0.0"." What are the units used here? Bananas?

Formal Notation of the Brier Score

Although the score is named after Brier, he himself and some of his successors called it the "Probability Score", abbreviated PS rather than BS (see Brier 1950, Sanders 1963, Murphy 1972a, 1972b, 1973). — Preceding unsigned comment added by (talk) 16:09, 11 December 2018 (UTC)

Brier was hardly likely to name the score after himself. Your references above are all over 45 years old. The overwhelming majority of recent sources call it the Brier Score and abbreviate it to BS, including the current article's references 3 (2011), 4 (2012), 5 (2010) and 6 (2010). Wikipedia describes current usage. --Qwfp (talk) 19:58, 11 December 2018 (UTC)
Postscriptum: You're wrong about Brier 1950. That paper repeatedly uses "the score ", and never calls it the "probability score" or denotes it by "PS". --Qwfp (talk) 20:13, 11 December 2018 (UTC)