# Talk:Bernoulli process

WikiProject Statistics (Rated B-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

WikiProject Mathematics (Rated B-class, Mid-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Field: Probability and statistics
This article has been mentioned by a media organisation: Wes Thorne (25 September 2013). "The Spurs of Tottenham & San Antonio should form an alliance". SB Nation. Archived from the original on 25 September 2013. Retrieved 25 September 2013. (details)

## Problems

The formal definition is not very formal; in fact, it doesn't state independence at all. Bad.

Next, Bernoulli processes are usually not even required to be identically distributed, just independent.

— Preceding unsigned comment added by 140.247.149.65 (talk) 04:57, 25 October 2009‎ (UTC)

Is the Bernoulli process a sequence of random variables, or a single one whose domain is the set of sequences in a two-element set, canonically {0,1}? The introduction and informal definition clearly say the former, but the definition of the Bernoulli sequence Zn implies the latter. One element omega in the sample space corresponds to one entire random sequence <X0, X1, ...>, i.e. the Bernoulli process rather than one Bernoulli trial Xi.
Either way, the formal definition seems bad to me. Does "for every omega in Omega" belong here, quantifying a probability statement such as "Xi(omega)=1 with probability p ..."? For a fixed omega, Xi(omega)=1 or Xi(omega)=0 without any probability. Right? P64 (talk) 21:55, 18 February 2010 (UTC)
• "it is a discrete-time Lévy process" seems to be completely false: it would require $X_4-X_3$, $X_3-X_2$, $X_2-X_1$ to be independent; for instance, $(X_3-X_2,X_2-X_1)=(1,1)$ would then have probability 1/16, while clearly its probability is 0, since $X_3-X_1=2$ never happens. I have erased any mention of Lévy processes in the page, until I am proved wrong. Hope it won't happen :-) --Chassain (talk) 18:05, 1 January 2010 (UTC)

## older

I believe an equation associated with this is:

 tCw * (p(s)^w) * (p(f)^(t-w))


Where t is the total number of trials, w is the number of successes wanted, p(s) is the probability of success, and p(f) is the probability of failure. Is this correct, and should it perhaps be added? I am not sure. -- KneeLess 04:09, 20 Sep 2004 (UTC)

That is the probability mass (weight, density, measure) at w for the binomial distribution with t trials and success probability p(s). Yes, it is reasonable to say the binomial distribution is what the Bernoulli process is all about. The article now permits a finite process. If the process must be infinite, then everything binomial may be interpreted in terms of its finite subsequences, such as the first t trials.
How to present this point will follow from answers to the more general winter 2010 discussion. --P64 (talk) 21:26, 22 February 2010 (UTC)
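For what it's worth, the formula quoted above is easy to check numerically; here is a minimal Python sketch (the helper name `binomial_pmf` is mine, just for illustration, not from the article):

```python
from math import comb

def binomial_pmf(w, t, p):
    """Probability of exactly w successes in t Bernoulli trials with
    success probability p: C(t, w) * p^w * (1 - p)^(t - w)."""
    return comb(t, w) * p**w * (1 - p)**(t - w)

# Sanity check: the probabilities over all possible success counts sum to 1.
total = sum(binomial_pmf(w, 10, 0.3) for w in range(11))
```

For example, `binomial_pmf(2, 4, 0.5)` gives 6/16 = 0.375, the chance of exactly two heads in four fair coin tosses.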

## technical

The mathematics articles in Wikipedia seem as if they are all written for somebody with a mathematical or engineering background. I thought Wikipedia was written for general readers? 69.140.164.142 05:20, 27 March 2007 (UTC)

Each article is written at a level appropriate to the subject. There is no point, for example, in writing an article on motives for the general reader. However, in this case, I agree with you: an article on an important process like this should at least start in a way which is accessible to the numerate layperson. Can you improve it? Geometry guy 15:00, 13 May 2007 (UTC)
I, for one, think the level of the article is fine. The intro basically describes what each of the random variables represents, and the memoryless property link seems fine. The main stumbling block for the real amateur is most likely the basic idea behind a "process"; they can click on Stochastic Process and learn more about that.
I agree with Chassain (?, signed below) that the "process" is likely to be the stumbling concept for many readers. It makes sense that the level of articles varies in some ways among the stochastic, Markov, Bernoulli, Dirichlet, and Chinese restaurant processes. At the same time the leads should provide a little more coherence and useful cross-references. --P64 (talk) 21:26, 22 February 2010 (UTC)

## application

On an unrelated note, are there practical uses for this simplistic process? If so, we should probably put them in. Akshayaj 21:21, 18 July 2007 (UTC)

Yes, if you consider mathematical applications practical. A number of important probabilistic models are based on Bernoulli processes, perhaps in disguise: For example, a one-dimensional simple random walk is defined in terms of a Bernoulli process, where each "coin flip" tells you whether to step left or right. Since any countable index set (e.g. the set of edges in a lattice) is equivalent to the integers, Bernoulli percolation is determined by a Bernoulli process in which 1s represent open edges and 0s represent closed edges. I'm sure there are other examples as well. 128.95.224.52 01:14, 19 October 2007 (UTC)
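As a sketch of the random-walk construction mentioned above (the particular mapping 1 → step right, 0 → step left, and the function names, are my own choices for illustration, not from the article):

```python
import random

def bernoulli_process(n, p, seed=0):
    """First n variates of a Bernoulli(p) process: i.i.d. 0/1 draws."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def random_walk(flips):
    """Simple random walk driven by the flips: +1 for each 1, -1 for each 0."""
    pos, path = 0, [0]
    for x in flips:
        pos += 1 if x == 1 else -1
        path.append(pos)
    return path

path = random_walk(bernoulli_process(100, 0.5))
```

The walk starts at 0 and moves by exactly one unit per coin flip, so the Bernoulli process fully determines the walk.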
I would say a huge number of applications, but at the moment I don't find many :-) It is historically important in data-compression models (theoretical computer science), but Markovian sources or ergodic sources are considered more realistic. Besides percolation, there is also the Erdős–Rényi model of random graphs. --Chassain (talk) 16:35, 3 January 2010 (UTC)
The article permits finite sequences of random variables (right?) to be processes. If that is wrong or unwise then it should be rewritten. So two coin tosses, called first and second, make a Bernoulli process. A single coin toss is a degenerate example.
Furthermore, it is ubiquitous to formalize collections of observations as sequences. Given 20 observations, "everyone" indexes them 1 to 20, although the sequence is purely formal; e.g., there is no passage of time. So the Bernoulli process is the foundation of every binomial model (or application or whatever). --P64 (talk) 21:26, 22 February 2010 (UTC)

## Bernoulli sequence

Is this standard usage, that a Bernoulli sequence is not a sequence of random variates zero and one but a sequence of index numbers (a subset of N) at which the random variate is one?

I have partly rewritten the article to fit this usage, which is a challenge, because it would be --and it has been for some other editors-- convenient to call the sequence of zero and one a Bernoulli sequence. This needs attention "urgently" inasmuch as I haven't finished rewriting for consistency.

Probably there is a big matter of pedagogical or encyclopedic tactics to resolve, which needs some group decision not simply one person's expertise in one subdiscipline. --P64 (talk) 23:11, 4 March 2010 (UTC)

I have added a statement in the main article to this effect. 24.1.53.152 (talk) 10:44, 29 January 2011 (UTC)

## Bernoulli map

Regarding the claim that the sequence "may be interpreted as the binary digital representation of a real number between zero and one":

First, that representation isn't unique. The rational numbers whose denominators are powers of 2, a.k.a. the multiples of powers of one-half, all have two binary digital representations. If we gloss over that complication, it should at least be mentioned in a footnote. Many readers do know that 0.9999... = 1. This is a topological stumbling block. Does measure theory handle it without a glitch?

Second, this section needs to be rewritten after we decide the questions of categories, which may be a pedagogical or otherwise tactical rather than matters for deference to standard usage by experts in some subdiscipline.

• When should we use the move that interprets a sequence as a single entity? (e.g., a sequence of r.v.s, a.k.a. a discrete-time stochastic process, as a single random variable whose values are sequences of numbers)
• When should we permit trials, processes, and their ilk (experiments, samples, etc.) to live equally among the variates and also in the abstract space $\Omega$? (rather than restricting them as we restrict random variables to functions from $\Omega$ into R or Rn)

See also stochastic process, Markov chain, memoryless. --P64 (talk) 23:22, 4 March 2010 (UTC)
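The non-uniqueness of binary expansions raised above is easy to check concretely: the sequences 1000... and 0111... both map to 1/2. A quick sketch (the helper `to_real` is mine, just for illustration):

```python
def to_real(bits, tail=0):
    """Value of the binary expansion 0.b1 b2 b3 ..., where the finite list
    `bits` is followed by infinitely many copies of `tail` (0 or 1).
    An infinite tail of 1s after position n contributes exactly 2**-n
    in the limit, by the geometric series."""
    head = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))
    return head + (2.0 ** -len(bits) if tail == 1 else 0.0)

a = to_real([1], tail=0)   # 0.1000... in binary
b = to_real([0], tail=1)   # 0.0111... in binary; both equal 1/2
```

As for the measure-theoretic question: the ambiguous points are exactly the dyadic rationals, a countable set, so (for 0 < p < 1) they have measure zero and the identification causes no glitch.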

I'm not at all clear what this complaint is about. This is standard material treated in a variety of textbooks. The real numbers embed in a Cantor space, and the Bernoulli process is a Cantor space. This is not probability 101, but it's not an advanced topic either. linas (talk) 18:16, 29 July 2012 (UTC)

## Two mistakes

The $\sigma$-algebra defined in the "finite vs infinite" section is not a $\sigma$-algebra, as it is not closed under countable intersections. Fix an infinite sequence $x = (x_1,x_2,\dots)$. The countable intersection of the cylinder sets whose first $n$ digits are identical to the first $n$ digits of $x$ (each of which should also be an element of the $\sigma$-algebra) equals the singleton set $\{x\}$. Therefore, each individual infinite sequence is measurable. One cannot just remove these elements from the $\sigma$-algebra.
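The intersection in the argument above can be written out explicitly; writing $C(x_1,\dots,x_n)$ for the cylinder set of sequences agreeing with $x$ in the first $n$ digits (my notation, chosen for this sketch):

```latex
\{x\} \;=\; \bigcap_{n=1}^{\infty} C(x_1, \dots, x_n)
```

so if every cylinder set belongs to the $\sigma$-algebra, then closure under countable intersections forces every singleton $\{x\}$ to belong to it as well.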

The statement about open balls in the section "As a metric space" is not correct. The open balls are the cylinder sets, but they are also closed sets, so their complements are open and closed too.

Patschy (talk) 10:50, 22 August 2012 (UTC)

## Iterated Von Neumann extractor

I'm not at all sure the simple procedure described in the article matches the referenced paper.

The procedure defined in the article appears to produce wrong answers. Consider all possible 4-bit input strings, and apply the procedure described in the article:

| Initial string | Appended | Output | Probability |
|----------------|----------|--------|-------------|
| 0000 | 000 | | p0 |
| 0001 | 011 | 00 | p1 |
| 0010 | 000 | 1 | p1 |
| 0011 | 011 | 0 | p2 |
| 0100 | 100 | 01 | p1 |
| 0101 | 111 | 00 | p2 |
| 0110 | 100 | 011 | p2 |
| 0111 | 111 | 0 | p3 |
| 1000 | 000 | 1 | p1 |
| 1001 | 011 | 100 | p2 |
| 1010 | 000 | 11 | p2 |
| 1011 | 011 | 10 | p3 |
| 1100 | 100 | 1 | p2 |
| 1101 | 111 | 0 | p3 |
| 1110 | 100 | 11 | p3 |
| 1111 | 111 | | p4 |

Notice how only 2 of the 8 possible 3-bit output strings are returned. If there is a third output bit, it must match the second bit. That seems wrong.

The paper seems to define the iterated procedure as a recursive process:

1. Divide the input into bit pairs. Discard any trailing odd bit. (If there are no pairs, stop.)
2. Compute the exclusive-or of each pair.
3. For each pair where the exclusive or is 1 (the bits differ), output the first bit.
4. Take the string of exclusive-or bits (half the length of the input string), and perform the iterated procedure on that.
5. For each pair of identical bits (the exclusive or is zero), take one of them, and perform the iterated procedure on that.

(An implementation may also apply a bound to the number of recursion levels, at a cost in output length.)
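The recursive steps above can be sketched in Python roughly as follows (function name and the 0/1-list representation are mine, not from the paper):

```python
def iterated_extractor(bits):
    """Iterated Von Neumann (Peres) extractor, following the numbered
    steps above. `bits` is a list of 0/1 values; returns the extracted
    bits: direct output first, then the two recursive passes."""
    if len(bits) < 2:
        return []
    out, xors, matches = [], [], []
    # Pair up the input, implicitly discarding any trailing odd bit.
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        xors.append(a ^ b)
        if a != b:
            out.append(a)        # differing pair: emit the first bit
        else:
            matches.append(a)    # identical pair: keep one copy
    # Recurse on the xor string, then on the kept matching bits.
    return out + iterated_extractor(xors) + iterated_extractor(matches)
```

On the example string 10011011 this returns the bits 1, 0, 1, 1, 0, i.e. five output bits, consistent with the walk-through of that string further down.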

So given the example bit string 10011011, the procedure would indeed generate 5 output bits, but they would be:

1. Output the 3 bits 101
2. Perform the iterated procedure on the xor bits 1110:
1. Output the bit 1
2. Perform the iterated procedure on the xor bits 01:
1. Output bit 0
2. Perform the iterated procedure on the xor bit 1 (produces nothing)
3. Perform the iterated procedure on the empty bit string (produces nothing)
3. Perform the iterated procedure on the matching bit 1 (produces nothing)
3. Perform the iterated procedure on the matching bit 1 (produces nothing)

I've tested the procedure described in the paper on all possible 16-bit strings: it returns a variety of output lengths from 0 to 11, but for each output length, each output string is returned an equal number of times.

| Initial string | XOR | Match | Output | Probability |
|----------------|-----|-------|--------|-------------|
| 0000 | 00 | 00 | | p0 |
| 0001 | 01 | 0 | 00 | p1 |
| 0010 | 01 | 0 | 10 | p1 |
| 0011 | 00 | 01 | 0 | p2 |
| 0100 | 10 | 0 | 01 | p1 |
| 0101 | 11 | | 00 | p2 |
| 0110 | 11 | | 01 | p2 |
| 0111 | 10 | 1 | 01 | p3 |
| 1000 | 10 | 0 | 11 | p1 |
| 1001 | 11 | | 10 | p2 |
| 1010 | 11 | | 11 | p2 |
| 1011 | 10 | 1 | 11 | p3 |
| 1100 | 00 | 10 | 1 | p2 |
| 1101 | 01 | 1 | 00 | p3 |
| 1110 | 01 | 1 | 10 | p3 |
| 1111 | 00 | 11 | | p4 |

Notice how every possible 1-bit output has probability p2, and every possible 2-bit output has probability p1+p2+p3.

I've done a machine check of longer strings (up to 26 bits), and they also work correctly.

Can someone else confirm the translation from source to Wikipedia is incorrect? 71.41.210.146 (talk) 18:47, 15 January 2014 (UTC)

The interpretation in the article is correct.

Note that one may arbitrarily choose the mapping of each possible pair of input bits to any combination of output and appended bits, subject to the conditions that a) (11) and (00) must be mapped to complementary append bits (and no output) and b) (10) and (01) must be mapped to complementary output and append bits. Thus, you have 2 possible mappings for (11) (or (00) respectively) and 4 possible mappings for (10) (or (01) respectively), which means there are 2x4 = 8 different mapping tables, all of which are valid and will (on average) yield output of the same entropy, even though the numerical result for any specific input may differ. The mapping table in the article and its results are valid. Maybe it should be stated more clearly that there is more than one possible mapping. 212.7.177.129 (talk) 09:26, 16 April 2014 (UTC)

I've found an article (http://www.drdobbs.com/parallel/algorithm-alley-unbiasing-random-bits/184405028) in which a different procedure is described that is backed by the math from the same reference on this Wikipedia page (4, Peres 1992). It seems pretty genuine (I'm not an expert). I cite the final algorithm (you can translate 'Heads' into 0 and 'Tails' into 1).

    Function ExtractBits ( Flips[0,NumFlips-1] ) {
        NumNewFlipsA = 0;
        NumNewFlipsB = 0;
        for (j = 0; j < (NumFlips-1)/2; j++) {
            if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Tails ) {
                print 0;
            }
            if ( Flips[2*j] == Tails ) and ( Flips[2*j+1] == Heads ) {
                print 1;
            }
            if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Heads ) {