Empirical probability

In probability theory and statistics, the empirical probability, relative frequency, or experimental probability of an event is the ratio of the number of outcomes in which a specified event occurs to the total number of trials,^[1] that is, a probability obtained from an actual experiment rather than from a theoretical sample space. More generally, empirical probability estimates probabilities from experience and observation.^[2]

Given an event $A$ in a sample space, the relative frequency of $A$ is the ratio $\tfrac{m}{n}$, where $m$ is the number of outcomes in which the event $A$ occurs and $n$ is the total number of outcomes of the experiment.^[3]
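The ratio can be computed directly from a record of outcomes. The following is a minimal sketch; the function name `relative_frequency` and the die-roll data are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def relative_frequency(outcomes: Iterable[T], event: Callable[[T], bool]) -> float:
    """Return m / n: the fraction of observed outcomes for which `event` holds."""
    outcomes = list(outcomes)
    n = len(outcomes)                          # total number of outcomes
    if n == 0:
        raise ValueError("at least one outcome is required")
    m = sum(1 for x in outcomes if event(x))   # outcomes in which A occurs
    return m / n


# Hypothetical record of ten die rolls; the event A is "the roll is even".
rolls = [2, 5, 6, 1, 4, 4, 3, 6, 2, 5]
p_even = relative_frequency(rolls, lambda r: r % 2 == 0)
# 6 of the 10 rolls are even, so m / n = 0.6
```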

In statistical terms, the empirical probability is an estimator or estimate of a probability. In simple cases, where the result of a trial only determines whether or not the specified event has occurred, modelling using a binomial distribution might be appropriate, and the empirical estimate is then the maximum likelihood estimate. It is also the Bayesian estimate for the same case if certain assumptions are made about the prior distribution of the probability. If a trial yields more information, the empirical probability can be improved on by adopting further assumptions in the form of a statistical model: if such a model is fitted, it can be used to derive an estimate of the probability of the specified event.
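As a brief sketch of the binomial case, assume $n$ independent trials with a common probability $p$ of the event (a symbol introduced here for illustration), of which $m$ produce the event. The likelihood and its maximiser are then:

$$L(p) = \binom{n}{m}\, p^{m} (1-p)^{n-m}, \qquad \frac{d}{dp}\log L(p) = \frac{m}{p} - \frac{n-m}{1-p} = 0 \;\Longrightarrow\; \hat{p} = \frac{m}{n}.$$

One set of prior assumptions under which a Bayesian estimate coincides with this value is a uniform prior on $p$: the posterior is then a Beta distribution with parameters $m+1$ and $n-m+1$, whose mode is $\tfrac{m}{n}$. Other priors, or other summaries such as the posterior mean $\tfrac{m+1}{n+2}$, generally give different values.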

Advantages and disadvantages

Advantages

An advantage of estimating probabilities using empirical probabilities is that this procedure is relatively free of assumptions.

For example, consider estimating the probability that men in a given population satisfy two conditions:

that they are over 6 feet in height.
that they prefer strawberry jam to raspberry jam.

A direct estimate could be found by counting the number of men who satisfy both conditions to give the empirical probability of the combined condition. An alternative estimate could be found by multiplying the proportion of men who are over 6 feet in height by the proportion of men who prefer strawberry jam to raspberry jam, but this estimate relies on the assumption that the two conditions are statistically independent.
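The two estimates can be compared on a small data set. The sketch below uses an invented survey of ten men (the counts are hypothetical, not real data) in which height and jam preference are deliberately not independent, so the two estimates disagree.

```python
# Hypothetical survey records: (is_over_six_feet, prefers_strawberry) per man.
survey = (
    [(True, True)] * 3
    + [(True, False)] * 1
    + [(False, True)] * 2
    + [(False, False)] * 4
)
n = len(survey)

# Direct empirical estimate: count men satisfying both conditions.
p_both_direct = sum(1 for tall, straw in survey if tall and straw) / n

# Alternative estimate: product of the two marginal proportions.
# This is only justified if height and jam preference are independent.
p_tall = sum(1 for tall, _ in survey if tall) / n
p_straw = sum(1 for _, straw in survey if straw) / n
p_both_product = p_tall * p_straw

print(f"direct: {p_both_direct:.2f}, product of marginals: {p_both_product:.2f}")
# prints "direct: 0.30, product of marginals: 0.20"
```

The direct estimate needs no independence assumption, which is the advantage described above; the product estimate is only as good as that assumption.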

Disadvantages

A disadvantage of using empirical probabilities arises in estimating probabilities that are very close to zero or very close to one. In these cases very large sample sizes would be needed to estimate such probabilities to a good standard of relative accuracy. Here statistical models can help, depending on the context, and in general one can hope that such models would provide improvements in accuracy compared to empirical probabilities, provided that the assumptions involved actually do hold.
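To give a rough sense of scale, assume independent trials with true probability $p$, so that the standard error of the empirical probability is $\sqrt{p(1-p)/n}$. Requiring a relative standard error $r$ then gives $n \approx (1-p)/(p\,r^{2})$. The sketch below evaluates this; the function name and the chosen values of $p$ and $r$ are illustrative assumptions.

```python
import math


def trials_for_relative_error(p: float, rel_se: float) -> int:
    """Approximate n so that the relative standard error of m/n is rel_se.

    Assumes independent trials with true probability p, so that the standard
    error of m/n is sqrt(p * (1 - p) / n).
    """
    if not 0.0 < p < 1.0 or rel_se <= 0.0:
        raise ValueError("need 0 < p < 1 and rel_se > 0")
    return math.ceil((1.0 - p) / (p * rel_se ** 2))


# Trials needed for a 10% relative standard error at several true probabilities.
for p in (0.5, 0.01, 0.0001):
    print(p, trials_for_relative_error(p, 0.10))
```

Under these assumptions the required number of trials grows from about a hundred at $p = 0.5$ to about a million at $p = 0.0001$. For probabilities close to one, the same calculation applies to the complementary event.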

For example, consider estimating the probability that the lowest of the daily-maximum temperatures at a site in February in any one year is less than zero degrees Celsius. A record of such temperatures in past years could be used to estimate this probability. A model-based alternative would be to select a family of probability distributions and fit it to the dataset containing past years' values. The fitted distribution would provide an alternative estimate of the desired probability. This alternative method can provide an estimate of the probability even if all values in the record are greater than zero.
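A minimal sketch of the two approaches follows. The temperature record is invented, and the normal family is assumed purely for simplicity; in practice a different family (for example one designed for extremes) might be more appropriate.

```python
from statistics import NormalDist

# Hypothetical record: lowest February daily-maximum temperature (deg C), one per year.
february_minima = [1.2, 3.5, 0.8, 2.1, 4.0, 2.7, 1.5, 3.1]

# Empirical probability of a value below zero: no such year was observed.
p_empirical = sum(1 for t in february_minima if t < 0.0) / len(february_minima)

# Model-based estimate: fit a normal distribution (an assumed family) by its
# sample mean and standard deviation, then evaluate its CDF at zero.
fitted = NormalDist.from_samples(february_minima)
p_model = fitted.cdf(0.0)

print(f"empirical: {p_empirical:.3f}, model-based: {p_model:.3f}")
```

Because every recorded value is above zero, the empirical estimate is exactly zero, while the fitted distribution assigns a small but nonzero probability to the event. That value is only as trustworthy as the assumed family.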

Mixed nomenclature

The phrase a-posteriori probability is also used as an alternative to "empirical probability" or "relative frequency".^[1] This use of "a-posteriori" is reminiscent of terms in Bayesian statistics but is not directly related to Bayesian inference, where a-posteriori probability is occasionally used to refer to posterior probability, a different concept despite the similar name.

The term a-posteriori probability, in its meaning suggestive of "empirical probability", may be used in conjunction with a priori probability, which represents an estimate of a probability based not on observations but on deductive reasoning.^[4]

References

[1] Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974). "Section 2.3". Introduction to the Theory of Statistics (3rd ed.). McGraw-Hill. ISBN 0070428646.
[2] "Empirical probabilities at tpub.com". Archived from the original on 2007-05-10. Retrieved 2007-03-31.
[3] Gujarati, Damodar N. (2003). "Appendix A". Basic Econometrics (4th ed.). McGraw-Hill. ISBN 978-0-07-233542-2.
[4] Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974). "Section 2.2". Introduction to the Theory of Statistics (3rd ed.). McGraw-Hill. ISBN 0070428646. (available online; archived 2012-05-15 at the Wayback Machine)
