Talk:Bernoulli process

Statistics High‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-importance on the importance scale.

Mathematics Mid‑priority

	This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mid	This article has been rated as Mid-priority on the project's priority scale.

This article has been mentioned by a media organization:

Wes Thorne (25 September 2013). "The Spurs of Tottenham & San Antonio should form an alliance". SB Nation. Archived from the original on 25 September 2013. Retrieved 25 September 2013.

Wiki Education Foundation-supported course assignment

This article was the subject of a Wiki Education Foundation-supported course assignment, between 27 August 2021 and 19 December 2021. Further details are available on the course page. Student editor(s): GeorgePan1012. Peer reviewers: Mumtaziah, Ajp256, Leolsz.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 15:38, 16 January 2022 (UTC)

Problems

The formal definition is not very formal. In fact, it doesn't state independence at all, which is bad.

Next, Bernoulli processes are usually not even required to be identically distributed, just independent.

— Preceding unsigned comment added by 140.247.149.65 (talk) 04:57, 25 October 2009 (UTC)

Is the Bernoulli process a sequence of random variables or a single one whose domain is the sequences in a two-element set, canonically {0,1}? The introduction and informal definition clearly say the former, but the definition of Bernoulli sequence Zn implies the latter. One element omega in the sample space corresponds to the entire random sequence <X0, X1, ...>, i.e. the Bernoulli process rather than one Bernoulli trial Xi.

Either way, the formal definition seems bad to me. Does "for every omega in Omega" belong here, quantifying a probability statement, "the probability of Xi(omega)=1 with probability p ..."? For every omega, Xi=1 or Xi=0 without any probability. Right? P64 (talk) 21:55, 18 February 2010 (UTC)

The statement "it is a discrete-time Lévy process" seems to be completely false: it would require $X_{4}-X_{3}$, $X_{3}-X_{2}$, $X_{2}-X_{1}$ to be independent; for instance, $(X_{3}-X_{2},X_{2}-X_{1})=(1,1)$ should have probability 1/16, while clearly the probability is 0, since $X_{3}-X_{1}=2$ never happens. I am erasing any mention of Lévy processes from the page until I am proved wrong. Hope it won't happen :-) --Chassain (talk) 18:05, 1 January 2010 (UTC)
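The arithmetic in this objection can be checked by brute force. The following is only a sketch, assuming a fair process (p = 1/2, which is what the 1/16 figure implies); the helper name `prob` is made up for the illustration. It enumerates the eight outcomes of three trials and compares the joint probability of the two increments with the product of their marginals.

```python
from fractions import Fraction
from itertools import product

# Assumption: a fair Bernoulli process (p = 1/2), matching the 1/16 figure.
p = Fraction(1, 2)

def prob(bits):
    """Probability of one outcome (x1, x2, x3) of three independent trials."""
    result = Fraction(1)
    for b in bits:
        result *= p if b == 1 else 1 - p
    return result

joint = Fraction(0)   # P(X3 - X2 = 1 and X2 - X1 = 1)
first = Fraction(0)   # P(X2 - X1 = 1)
second = Fraction(0)  # P(X3 - X2 = 1)
for x1, x2, x3 in product((0, 1), repeat=3):
    weight = prob((x1, x2, x3))
    if x2 - x1 == 1:
        first += weight
    if x3 - x2 == 1:
        second += weight
    if x2 - x1 == 1 and x3 - x2 == 1:
        joint += weight

print("product of marginals:", first * second)  # what independence would need
print("joint probability:   ", joint)           # what the enumeration gives
```

The two printed values differ, so the increments of the 0/1 sequence itself are not independent.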

older

I believe an equation associated with this is:

 tCw * (p(s)^w) * (p(f)^(t-w))

Where t is the total number of trials, w is the number of successes wanted, p(s) is the probability of success, and p(f) is the probability of failure. Is this correct, and should it perhaps be added? I am not sure. -- KneeLess 04:09, 20 Sep 2004 (UTC)

That is the probability mass (weight, density, measure) at w for the binomial distribution with t trials and success probability p(s). Yes, it is reasonable to say that this is all about the Bernoulli process. The article now permits a finite process. If the process must be infinite, then everything binomial may be interpreted in terms of its finite subsequences, such as the first t trials.

How to present this point will follow from answers to the more general winter 2010 discussion. --P64 (talk) 21:26, 22 February 2010 (UTC)
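For reference, the formula quoted at the top of this thread can be evaluated directly. This is a minimal sketch; the function name `binomial_pmf` and the sample values (10 trials, 3 successes, p(s) = 0.5) are chosen only for illustration.

```python
from math import comb

def binomial_pmf(t: int, w: int, p_success: float) -> float:
    """tCw * p(s)^w * p(f)^(t-w), with p(f) = 1 - p(s)."""
    p_failure = 1.0 - p_success
    return comb(t, w) * p_success**w * p_failure**(t - w)

# Probability of exactly 3 successes in the first 10 trials of a fair process.
print(binomial_pmf(10, 3, 0.5))

# The masses over w = 0..t should add up to 1 (up to rounding).
print(sum(binomial_pmf(10, w, 0.5) for w in range(11)))
```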

technical

The mathematics articles in Wikipedia seem as if they are all written for somebody with a mathematical or engineering background. I thought the wikipedia was written for general readers? 69.140.164.142 05:20, 27 March 2007 (UTC)

Each article is written at a level appropriate to the subject. There is no point, for example, in writing an article on motives for the general reader. However, in this case, I agree with you: an article on an important process like this should at least start in a way which is accessible to the numerate layperson. Can you improve it? Geometry guy 15:00, 13 May 2007 (UTC)

I, for one, think the level of the article is fine. The intro parts basically describe what each of the random variables represents, and the Memoryless property link seems to be fine. The main stumbling block for the real amateur is most likely the basic idea behind a "process". They can click on Stochastic Process and learn more about that.

I agree with Chassain(?, signed below) that the "process" is likely to be the stumbling concept for many readers. It makes sense that the level of articles varies in some ways among the stochastic, Markov, Bernoulli, Dirichlet, and Chinese restaurant processes. At the same time the leads should provide a little more coherence and useful cross-reference. --P64 (talk) 21:26, 22 February 2010 (UTC)

application

On an unrelated note, are there practical uses for this simplistic process? If so, we should probably put that in. Akshayaj 21:21, 18 July 2007 (UTC)

Yes, if you consider mathematical applications practical. A number of important probabilistic models are based on Bernoulli processes, perhaps in disguise: For example, a one-dimensional simple random walk is defined in terms of a Bernoulli process, where each "coin flip" tells you whether to step left or right. Since any countable index set (e.g. the set of edges in a lattice) is equivalent to the integers, Bernoulli percolation is determined by a Bernoulli process in which 1s represent open edges and 0s represent closed edges. I'm sure there are other examples as well. 128.95.224.52 01:14, 19 October 2007 (UTC)
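The random-walk example can be sketched in a few lines. This is an illustration only; the function name `simple_random_walk`, the convention that 1 means "step right", and the use of Python's `random` module are assumptions made for the sketch.

```python
import random

def simple_random_walk(steps, p=0.5, seed=None):
    """Drive a walk on the integers with Bernoulli(p) coin flips.

    Assumed convention: a flip of 1 steps right (+1), a flip of 0 steps left (-1).
    Returns the list of visited positions, starting at 0.
    """
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        flip = 1 if rng.random() < p else 0  # one Bernoulli trial
        position += 1 if flip == 1 else -1
        path.append(position)
    return path

print(simple_random_walk(20, seed=1))
```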

I would say, a huge number of applications, but, at the moment, I don't find many :-) It is historically important in data compression models (theoretical computer science), but Markovian sources or ergodic sources are considered more realistic. Besides percolation, there is also the Erdős–Rényi model of random graphs. --Chassain (talk) 16:35, 3 January 2010 (UTC)

The article permits finite sequences of random variables (right?) to be processes. If that is wrong or unwise, then it should be rewritten. So two coin tosses called first and second make a Bernoulli process. A single coin toss is a degenerate example.

Furthermore, it is ubiquitous to formalize collections of observations as sequences. Given 20 observations, "everyone" indexes them 1 to 20 although the sequence is purely formal; e.g., there is no passage of time. So the Bernoulli process is the foundation of every binomial model (or application or whatever). --P64 (talk) 21:26, 22 February 2010 (UTC)

Bernoulli sequence

Is this standard usage, that the Bernoulli sequence is not a sequence of random variates zero and one but a sequence of index numbers (subset of N) where the random variate is one?

I have partly rewritten the article to fit this usage, which is a challenge, because it would be --and it has been for some other editors-- convenient to call the sequence of zero and one a Bernoulli sequence. This needs attention "urgently" inasmuch as I haven't finished rewriting for consistency.

Probably there is a big matter of pedagogical or encyclopedic tactics to resolve, which needs some group decision not simply one person's expertise in one subdiscipline. --P64 (talk) 23:11, 4 March 2010 (UTC)

I have added a statement in the main article to this effect. 24.1.53.152 (talk) 10:44, 29 January 2011 (UTC)

Bernoulli map

may be interpreted as the binary digital representation of a real number between zero and one.

First, that isn't unique. The rational numbers whose denominators are powers of 2, aka the multiples of powers of one-half, all have two binary digital representations. If we will gloss over that complication, it should at least be mentioned in a footnote. Many readers do know that 0.9999... = 1. This is a topological stumbling block. Does measure theory handle it without a glitch?
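A concrete instance of the non-uniqueness, written out as a small worked equation (the subscript 2 marks binary expansions):

```latex
\frac{1}{2} = 0.1000\ldots_2 = 0.0111\ldots_2,
\qquad\text{since}\qquad
\sum_{k=2}^{\infty} 2^{-k} = \frac{1}{2}.
```

The sequences affected are exactly those that end in all 0s or all 1s. There are only countably many of them, so, at least for 0 < p < 1, where every single sequence has probability zero, they form a set of measure zero.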

Second, this section needs to be rewritten after we decide the questions of categories, which may be a pedagogical or otherwise tactical rather than matters for deference to standard usage by experts in some subdiscipline.

When should we use the move that interprets a sequence as a single entity? (e.g., a sequence of r.v., aka a discrete-time stochastic process, as a single random variable whose values are sequences of numbers)
When should we permit trials, and processes and their ilk (experiments, samples, etc.) to live equally among the variates and also in the abstract space $\Omega$? (rather than restrict them as we restrict random variables to functions of Omega into R or Rn)

Two mistakes

The $\sigma$-algebra defined in the "finite vs infinite" section is not a $\sigma$-algebra, as it does not contain countable intersections. Fix an infinite sequence $x=(x_{1},x_{2},\dots)$. The countable intersection of the cylinder sets (which should also be an element of the $\sigma$-algebra) with the first $n$ digits identical to the first $n$ digits of $x$ equals the set $\{x\}$. Therefore, each individual infinite sequence is measurable. One cannot just remove these elements from the $\sigma$-algebra.
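In symbols, the intersection argument reads as follows; the name $C_n(x)$ for the cylinder set is introduced here only for the illustration and is not taken from the article.

```latex
C_n(x) = \{\, y \in \{0,1\}^{\mathbb{N}} : y_1 = x_1, \dots, y_n = x_n \,\},
\qquad
\bigcap_{n=1}^{\infty} C_n(x) = \{x\}.
```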

The statement about the open ball in the section "As a metric space" is not correct. The open balls are the cylinder sets. But they are also closed sets, so their complements are open and closed, too.

Patschy (talk) 10:50, 22 August 2012 (UTC)

Iterated Von Neumann extractor

I'm not at all sure the simple procedure described in the article matches the referenced paper.

The procedure defined in the article appears to produce wrong answers. Consider all possible 4-bit input strings, and apply the procedure described in the article:

Initial string	appended	output	probability
0000	000	—	p₀
0001	011	00	p₁
0010	000	1	p₁
0011	011	0	p₂
0100	100	01	p₁
0101	111	00	p₂
0110	100	011	p₂
0111	111	0	p₃
1000	000	1	p₁
1001	011	100	p₂
1010	000	11	p₂
1011	011	10	p₃
1100	100	1	p₂
1101	111	0	p₃
1110	100	11	p₃
1111	111	—	p₄

Notice how only 2 of the possible 8 3-bit output strings are returned. If there's a third output bit, it must match the second bit. That seems wrong.
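The table can be reproduced mechanically. The rule below is inferred from the rows of the table, not taken from the article text or the paper: read the string two bits at a time, append the second bit of each pair to the end of the string, output the first bit whenever the two bits differ, and keep reading into the appended bits until fewer than two unread bits remain. The function name `appending_procedure` is made up for this sketch.

```python
from itertools import product

def appending_procedure(bits: str) -> tuple[str, str]:
    """Return (appended, output) under the rule inferred from the table."""
    queue = list(bits)
    appended, output = [], []
    pos = 0
    while pos + 1 < len(queue):
        first, second = queue[pos], queue[pos + 1]
        pos += 2
        if first != second:
            output.append(first)   # differing pair: emit its first bit
        queue.append(second)       # every pair appends its second bit
        appended.append(second)
    return "".join(appended), "".join(output)

three_bit_outputs = set()
for combo in product("01", repeat=4):
    text = "".join(combo)
    appended, output = appending_procedure(text)
    print(text, appended, output or "-")
    if len(output) == 3:
        three_bit_outputs.add(output)

print("3-bit outputs seen:", sorted(three_bit_outputs))
```

Under this reading, the only 3-bit outputs that occur are 011 and 100, which is the imbalance pointed out above.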

The paper seems to define the iterated procedure as a recursive process:

Divide the input into bit pairs. Discard any trailing odd bit. (If there are no pairs, stop.)
Compute the exclusive-or of each pair.
For each pair where the exclusive or is 1 (the bits differ), output the first bit.
Take the string of exclusive-or bits (half the length of the input string), and perform the iterated procedure on that.
For each pair of identical bits (the exclusive or is zero), take one of them, and perform the iterated procedure on that.

(An implementation may also apply a bound to the number of recursion levels, at a cost in output length.)

So given the example bit string 10011011, the procedure would indeed generate 5 output bits, but they would be:

Output the 3 bits 101
Perform the iterated procedure on the xor bits 1110:
1. Output the bit 1
2. Perform the iterated procedure on the xor bits 01:
  1. Output bit 0
  2. Perform the iterated procedure on the xor bit 1 (produces nothing)
  3. Perform the iterated procedure on the empty bit string (produces nothing)
3. Perform the iterated procedure on the matching bit 1 (produces nothing)
Perform the iterated procedure on the matching bit 1 (produces nothing)
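The recursive steps listed above can be written down almost literally. This is a sketch of that description only (not a verified transcription of the paper); the function name `iterated_extract` is made up, and the output order (direct bits, then the xor recursion, then the matching-bit recursion) follows the trace above.

```python
def iterated_extract(bits: str) -> str:
    """Recursive extractor following the five steps described above."""
    # Split into pairs; a trailing odd bit is dropped by the range bound.
    pairs = [(bits[i], bits[i + 1]) for i in range(0, len(bits) - 1, 2)]
    if not pairs:
        return ""
    # Differing pairs emit their first bit.
    direct = "".join(a for a, b in pairs if a != b)
    # One xor bit per pair.
    xor_bits = "".join("1" if a != b else "0" for a, b in pairs)
    # One representative bit from each pair of identical bits.
    match_bits = "".join(a for a, b in pairs if a == b)
    return direct + iterated_extract(xor_bits) + iterated_extract(match_bits)

print(iterated_extract("10011011"))
```

For the example string this gives the five bits 101, 1, 0 in that order, matching the trace.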

The procedure described in the paper, which I tested on all possible 16-bit strings, returns outputs of various lengths from 0 to 11 bits, but for each output length, each output string is returned an equal number of times.

Initial string	xor	match	output	probability
0000	00	00	—	p₀
0001	01	0	00	p₁
0010	01	0	10	p₁
0011	00	01	0	p₂
0100	10	0	01	p₁
0101	11	—	00	p₂
0110	11	—	01	p₂
0111	10	1	01	p₃
1000	10	0	11	p₁
1001	11	—	10	p₂
1010	11	—	11	p₂
1011	10	1	11	p₃
1100	00	10	1	p₂
1101	01	1	00	p₃
1110	01	1	10	p₃
1111	00	11	—	p₄

Notice how every possible 1-bit output has probability p₂, and every possible 2-bit output has probability p₁+p₂+p₃.

I've done a machine check of longer strings (up to 26 bits), and they also work correctly.

Can someone else confirm the translation from source to Wikipedia is incorrect? 71.41.210.146 (talk) 18:47, 15 January 2014 (UTC)

The interpretation in the article is correct.

Note that one may arbitrarily choose the mappings of each possible pair of input bits to any combination of output and appended bit, subject to the conditions that a) (11) and (00) must be mapped to complementary append bits (and no output) and b) (10) and (01) must be mapped to complementary output and append bits. Thus, you have 2 possible mappings for (11) (or (00) respectively) and 4 possible mappings for (10) (or (01) respectively), which means there are 2x4 = 8 different mapping tables which are all valid and will (on average) yield output of the same entropy even though the numerical result for any specific input may be different. The mapping table in the article and the results are valid ones. Maybe it should be stated more clearly that there is more than one possible mapping. 212.7.177.129 (talk) 09:26, 16 April 2014 (UTC)

I've found an article (http://www.drdobbs.com/parallel/algorithm-alley-unbiasing-random-bits/184405028) in which a different procedure is described that is backed by the math from the same reference on this Wikipedia page (4, Peres 1992). It seems pretty genuine (I'm not an expert). I quote the final algorithm (you can translate 'Heads' into 0 and 'Tails' into 1).

   Function ExtractBits ( Flips[0,NumFlips-1] ) {
       NumNewFlipsA = 0;
       NumNewFlipsB = 0;
       for (j = 0; j < (NumFlips-1)/2; j++) {
           if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Tails ) {
               print 0;
               NewFlipsA[NumNewFlipsA++] = Heads;
           } 
           if ( Flips[2*j] == Tails ) and ( Flips[2*j+1] == Heads ) {
               print 1;
               NewFlipsA[NumNewFlipsA++] = Heads;
           }
           if ( Flips[2*j] == Heads ) and ( Flips[2*j+1] == Heads ) {
               NewFlipsB[NumNewFlipsB++] = Heads;
               NewFlipsA[NumNewFlipsA++] = Tails;
           }
           if ( Flips[2*j] == Tails ) and ( Flips[2*j+1] == Tails ) {
               NewFlipsB[NumNewFlipsB++] = Tails;
               NewFlipsA[NumNewFlipsA++] = Tails;
           }
       }
       if (NumNewFlipsA >= 2) ExtractBits (NewFlipsA[0,NumNewFlipsA-1]);
       if (NumNewFlipsB >= 2) ExtractBits (NewFlipsB[0,NumNewFlipsB-1]);
   }

This is the same procedure as the one introduced above with "The paper seems to define the iterated procedure as a recursive process." So the "appending" procedure in the Wikipedia article is quite probably not backed by the reference (4, Peres 1992). — Preceding unsigned comment added by 2001:980:2b19:1:83b:3c72:a865:a32c (talk) 19:48, 19 November 2014 (UTC)
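To make the quoted pseudocode testable, here is a rough Python translation together with an exhaustive check on short inputs. Several things are assumptions of this sketch rather than part of the quoted source: the loop is read as "for every complete pair" (`len(flips) // 2` iterations), because `(NumFlips-1)/2` with integer division would skip the last pair of an even-length input; bits are returned as a list instead of printed; and the helper `outputs_are_balanced` is made up here. Note also that the quoted code feeds Heads for a differing pair into `NewFlipsA`, which is the complement of the xor bits used in the description further up, so individual outputs can differ between the two versions.

```python
from collections import Counter
from itertools import product

HEADS, TAILS = 0, 1  # 'Heads' -> 0 and 'Tails' -> 1, as suggested above

def extract_bits(flips):
    """Translation of the quoted ExtractBits; returns the printed bits as a list."""
    out, new_a, new_b = [], [], []
    for j in range(len(flips) // 2):  # assumption: every complete pair
        first, second = flips[2 * j], flips[2 * j + 1]
        if first != second:
            out.append(0 if first == HEADS else 1)  # HT prints 0, TH prints 1
            new_a.append(HEADS)
        else:
            new_b.append(first)
            new_a.append(TAILS)
    if len(new_a) >= 2:
        out += extract_bits(new_a)
    if len(new_b) >= 2:
        out += extract_bits(new_b)
    return out

def outputs_are_balanced(n):
    """For each (number of Tails, output length), is every output equally frequent?"""
    groups = {}
    for flips in product((HEADS, TAILS), repeat=n):
        out = tuple(extract_bits(list(flips)))
        key = (sum(flips), len(out))
        groups.setdefault(key, Counter())[out] += 1
    return all(
        len(counts) == 2 ** length and len(set(counts.values())) == 1
        for (_, length), counts in groups.items()
    )

print(all(outputs_are_balanced(n) for n in range(1, 11)))
```

The check groups inputs by their number of Tails because, for a biased coin, all inputs with the same count are equally likely; if every output of a given length is equally frequent within each group, the extracted bits are unbiased whatever the bias is.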

I have rewritten this section. This is about my first major edit on Wikipedia, so it probably needs some further revision. In particular, the current example is not a very good one, since no output bit is extracted from sequence 2 (or rather, sequence 2 is never longer than 1 bit). --Bbbbbbbbba (talk) 13:47, 16 March 2016 (UTC)