Discrete uniform distribution


In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution in which a finite number of values are equally likely to be observed: each of the n values has probability 1/n. Informally, it describes "a known, finite number of outcomes equally likely to happen".

A simple example of the discrete uniform distribution is throwing a fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is thrown the probability of a given score is 1/6. If two dice are thrown and their values added, the resulting distribution is no longer uniform, since not all sums have equal probability.
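The contrast between one die and the sum of two dice can be checked by enumeration. The following is a minimal Python sketch; the names `one_die` and `two_dice` are illustrative and not part of the article.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# One fair die: every face has the same probability 1/6.
one_die = {face: Fraction(1, 6) for face in faces}

# Sum of two fair dice: count how many of the 36 equally likely
# ordered pairs produce each total.
counts = Counter(x + y for x, y in product(faces, repeat=2))
two_dice = {total: Fraction(c, 36) for total, c in sorted(counts.items())}

for total, p in two_dice.items():
    print(total, p)
```

The single die assigns the same probability to every face, while the totals 2 and 12 each have probability 1/36 and the total 7 has probability 1/6, so the sum is not uniformly distributed.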

The discrete uniform distribution itself is inherently non-parametric. It is convenient, however, to represent its values by an integer interval [a,b], so that a and b become the main parameters of the distribution (often one simply considers the interval [1,n] with the single parameter n). With these conventions, the cumulative distribution function (CDF) of the discrete uniform distribution can be expressed, for any k ∈ [a,b], as

$$F(k;a,b)={\frac {\lfloor k\rfloor -a+1}{b-a+1}}$$
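As an illustration, the CDF formula translates directly into code. The sketch below is in Python with hypothetical function names `pmf` and `cdf`. Returning 0 below a and 1 above b is the usual extension outside [a,b] and is an addition here, since the formula above is stated only for k in [a,b].

```python
import math
from fractions import Fraction

def pmf(k, a, b):
    """P(X = k) for X uniform on the integers a, a+1, ..., b."""
    if k == math.floor(k) and a <= k <= b:
        return Fraction(1, b - a + 1)
    return Fraction(0)

def cdf(k, a, b):
    """P(X <= k); k may be any real number."""
    if k < a:
        return Fraction(0)   # assumption: usual extension below the support
    if k >= b:
        return Fraction(1)   # assumption: usual extension above the support
    return Fraction(math.floor(k) - a + 1, b - a + 1)

# A fair die is the case a = 1, b = 6.
print(cdf(3, 1, 6))    # probability of rolling at most 3
print(cdf(3.7, 1, 6))  # same value, because of the floor
```

Both calls give 1/2: the floor makes the CDF a step function that only changes at integer values.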

Estimation of maximum

In this problem, a sample of k observations is drawn from a uniform distribution on the integers $1,2,\dotsc ,N$, and the goal is to estimate the unknown maximum N. It is commonly known as the German tank problem, after the application of maximum estimation to estimates of German tank production during World War II.

The uniformly minimum variance unbiased (UMVU) estimator for the maximum is given by

$${\hat {N}}={\frac {k+1}{k}}m-1=m+{\frac {m}{k}}-1$$

where m is the sample maximum and k is the sample size, sampling without replacement.[1][2] This can be seen as a very simple case of maximum spacing estimation.

The formula may be understood intuitively as:

"The sample maximum plus the average gap between observations in the sample",

the gap being added to compensate for the negative bias of the sample maximum as an estimator for the population maximum.[notes 1]
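The unbiasedness of this estimator can be checked exactly for small cases. The Python sketch below relies on a standard fact that is not stated in the text above: when k distinct values are drawn without replacement from 1, ..., N, the sample maximum equals m with probability C(m−1, k−1)/C(N, k). The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def max_pmf(N, k):
    """Exact distribution of the sample maximum for k draws
    without replacement from 1..N (assumed standard result)."""
    total = comb(N, k)
    return {m: Fraction(comb(m - 1, k - 1), total) for m in range(k, N + 1)}

def umvu_estimate(m, k):
    """The estimator from the text: m + m/k - 1."""
    return Fraction(k + 1, k) * m - 1

def expected_estimate(N, k):
    """Expectation of the estimator under the exact distribution of m."""
    return sum(p * umvu_estimate(m, k) for m, p in max_pmf(N, k).items())

# A sample of size 4 whose largest value is 60 gives the estimate 74.
print(umvu_estimate(60, 4))

# Averaged over all possible samples, the estimator recovers N exactly.
print(expected_estimate(100, 5))
```

Exact fractions are used so that the expectation can be compared with N without rounding error.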

This estimator has a variance of [1]

$${\frac {1}{k}}{\frac {(N-k)(N+1)}{(k+2)}}\approx {\frac {N^{2}}{k^{2}}}\quad {\text{for small samples }}k\ll N$$

so a standard deviation of approximately ${\tfrac {N}{k}}$, the (population) average size of a gap between samples; compare ${\tfrac {m}{k}}$ above.
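A quick simulation gives a rough sanity check of the variance formula. This is a sketch only: the values N = 100, k = 5, the number of trials, and the seed are arbitrary choices for illustration, and the function names are hypothetical.

```python
import random
from statistics import fmean, pvariance

def exact_variance(N, k):
    """Variance of the estimator as given in the text."""
    return (N - k) * (N + 1) / (k * (k + 2))

def simulate(N, k, trials, seed=0):
    """Draw `trials` samples of size k without replacement from 1..N and
    return the empirical mean and variance of m + m/k - 1."""
    rng = random.Random(seed)
    population = range(1, N + 1)
    estimates = []
    for _ in range(trials):
        m = max(rng.sample(population, k))
        estimates.append(m + m / k - 1)
    return fmean(estimates), pvariance(estimates)

mean, var = simulate(N=100, k=5, trials=20_000)
print("empirical mean:    ", mean)
print("empirical variance:", var)
print("formula variance:  ", exact_variance(100, 5))
print("N/k approximation of the standard deviation:", 100 / 5)
```

For these values the formula gives a variance of about 274, that is, a standard deviation of about 16.6, which is of the same order as N/k = 20. The approximation becomes tighter as k grows while remaining small relative to N.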

The sample maximum is the maximum likelihood estimator for the population maximum, but, as discussed above, it is biased.
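The size of the bias can be made explicit. The following short derivation assumes the standard distribution of the sample maximum under sampling without replacement, $P(m=j)={\tbinom {j-1}{k-1}}/{\tbinom {N}{k}}$ for $j=k,\dotsc ,N$, which is not stated in the text above. It uses the identities $j{\tbinom {j-1}{k-1}}=k{\tbinom {j}{k}}$ and $\sum _{j=k}^{N}{\tbinom {j}{k}}={\tbinom {N+1}{k+1}}$:

$$\begin{aligned}\operatorname {E} [m]&=\sum _{j=k}^{N}j\,{\frac {\binom {j-1}{k-1}}{\binom {N}{k}}}={\frac {k}{\binom {N}{k}}}\sum _{j=k}^{N}{\binom {j}{k}}={\frac {k{\binom {N+1}{k+1}}}{\binom {N}{k}}}={\frac {k(N+1)}{k+1}},\\\operatorname {E} \left[{\frac {k+1}{k}}m-1\right]&={\frac {k+1}{k}}\cdot {\frac {k(N+1)}{k+1}}-1=N.\end{aligned}$$

Since ${\tfrac {k(N+1)}{k+1}}<N$ whenever $k<N$, the sample maximum underestimates N on average, and rescaling by ${\tfrac {k+1}{k}}$ and subtracting 1 removes the bias exactly.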

If samples are not numbered but are recognizable or markable, one can instead estimate population size via the capture-recapture method.

Random permutation

See rencontres numbers for an account of the probability distribution of the number of fixed points of a uniformly distributed random permutation.
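For small n, the distribution of the number of fixed points can be tabulated by brute force. The Python sketch below enumerates all n! permutations, each of which is equally likely under the uniform distribution; the function name is illustrative, and the approach is only practical for small n.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def fixed_point_distribution(n):
    """Probability that a uniformly random permutation of n items
    has exactly j fixed points, for j = 0..n."""
    counts = Counter(
        sum(1 for position, value in enumerate(perm) if position == value)
        for perm in permutations(range(n))
    )
    total = factorial(n)
    return {j: Fraction(counts[j], total) for j in range(n + 1)}

dist = fixed_point_distribution(4)
for j, p in dist.items():
    print(j, p)

# Expected number of fixed points.
print(sum(j * p for j, p in dist.items()))
```

For n = 4 the counts of permutations with 0, 1, 2, 3 and 4 fixed points are 9, 8, 6, 0 and 1, and the expected number of fixed points is 1.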

Notes

[notes 1] The sample maximum is never more than the population maximum, but can be less, hence it is a biased estimator: it will tend to underestimate the population maximum.

References

[1] Johnson, Roger (1994), "Estimating the Size of a Population", Teaching Statistics, 16 (2 (Summer)), doi:10.1111/j.1467-9639.1994.tb00688.x
[2] Johnson, Roger (2006), "Estimating the Size of a Population" (PDF), Getting the Best from Teaching Statistics


discrete uniform
[Figure: probability mass function, shown for n = 5, where n = b − a + 1]
[Figure: cumulative distribution function]
Notation	${\mathcal {U}}\{a,b\}$ or $\mathrm {unif} \{a,b\}$
Parameters	$a\in \{\dots ,-2,-1,0,1,2,\dots \}$; $b\in \{\dots ,-2,-1,0,1,2,\dots \},\ b\geq a$; $n=b-a+1$
Support	$k\in \{a,a+1,\dots ,b-1,b\}\,$
PMF	${\frac {1}{n}}$
CDF	${\frac {\lfloor k\rfloor -a+1}{n}}$
Mean	${\frac {a+b}{2}}\,$
Median	${\frac {a+b}{2}}\,$
Mode	N/A
Variance	${\frac {(b-a+1)^{2}-1}{12}}$
Skewness	$0\,$
Excess kurtosis	$-{\frac {6(n^{2}+1)}{5(n^{2}-1)}}\,$
Entropy	$\ln(n)\,$
MGF	${\frac {e^{at}-e^{(b+1)t}}{n(1-e^{t})}}\,$
CF	${\frac {e^{iat}-e^{i(b+1)t}}{n(1-e^{it})}}$
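The closed forms in the table can be cross-checked against moments computed directly from the PMF. The Python sketch below does this for one arbitrary choice, a = 3 and b = 7 (so n = 5). The function names are illustrative, and the check needs n ≥ 2 so that the variance is nonzero.

```python
import math
from fractions import Fraction

def moments_from_pmf(a, b):
    """Mean, variance, excess kurtosis and entropy computed from
    the PMF 1/n on a..b (requires b > a)."""
    n = b - a + 1
    p = Fraction(1, n)
    support = range(a, b + 1)
    mean = sum(p * k for k in support)
    var = sum(p * (k - mean) ** 2 for k in support)
    m4 = sum(p * (k - mean) ** 4 for k in support)
    excess_kurtosis = m4 / var**2 - 3
    entropy = -sum(float(p) * math.log(float(p)) for _ in support)
    return mean, var, excess_kurtosis, entropy

def closed_forms(a, b):
    """The same quantities from the formulas in the table."""
    n = b - a + 1
    mean = Fraction(a + b, 2)
    var = Fraction(n**2 - 1, 12)
    excess_kurtosis = -Fraction(6 * (n**2 + 1), 5 * (n**2 - 1))
    entropy = math.log(n)
    return mean, var, excess_kurtosis, entropy

print(moments_from_pmf(3, 7))
print(closed_forms(3, 7))
```

For a = 3 and b = 7 both approaches give mean 5, variance 2 and excess kurtosis −13/10, and the entropy agrees with ln 5 up to floating-point rounding.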
