Talk:Ziggurat algorithm

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Statistics (Rated Start-class, Low-importance)
WikiProject icon

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

Start-Class article Start  This article has been rated as Start-Class on the quality scale.
 Low  This article has been rated as Low-importance on the importance scale.
 
WikiProject Mathematics (Rated Start-class, Low-importance)
WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
Start Class
Low Importance
 Field: Probability and statistics

Diagram needed[edit]

I added an animated diagram.Cuddlyable3 17:58, 14 September 2007 (UTC)

Basically, something like Fig. 1 on p.2 of Marsaglia's paper, or Fig. 5 from the Thomas et al. paper (linked from article References). An improvement would be to add labels "y0", "y1", etc. along the y-axis and "x0", "x1", etc. above the points.

Without that picture, the best wording is awfully hard to follow.

For a normal distribution, the function to plot is y = e-x²/2, and the 8-layer Ziggurat points on it are

x[−1]= 2.7120530222393384079012   y[−1]= 0.0000000000000000000000  —Preceding unsigned comment added by 71.41.210.146 (talk) 18:48, 1 October 2007 (UTC) 
x[0] = 2.3383716982472527044692   y[0] = 0.0649595117330380202104
x[1] = 1.9819049364005106693290   y[1] = 0.1402998180884519367696
x[2] = 1.7165081257767795707347   y[2] = 0.2291908828832851165061
x[3] = 1.4853586756432938545359   y[3] = 0.3318257830466926345376
x[4] = 1.2629701985308330330774   y[4] = 0.4504325835506035839847
x[5] = 1.0273863717802284625238   y[5] = 0.5899241094185249870813
x[6] = 0.7383689179764494421219   y[6] = 0.7614016031422429793942
x[7] = 0.0000000000000000000000   y[7] = 1.0000000000000000003253

Here's a sample gnuplot input file:

#set terminal svg
#set output "zig.svg"
unset label
set label "x-1" at 2.7120530222393384079012,0.0649595117330380202104 cent offset 0,0.5
set label "x0" at 2.3383716982472527044692,0.1402998180884519367696 cent offset 0,0.5
set label "x1" at 1.9819049364005106693290,0.2291908828832851165061 cent offset 0,0.5
set label "x2" at 1.7165081257767795707347,0.3318257830466926345376 cent offset 0,0.5
set label "x3" at 1.4853586756432938545359,0.4504325835506035839847 cent offset 0,0.5
set label "x4" at 1.2629701985308330330774,0.5899241094185249870813 cent offset 0,0.5
set label "x5" at 1.0273863717802284625238,0.7614016031422429793942 cent offset 0,0.5 
set label "x6" at 0.7383689179764494421219,1.0000000000000000000000 cent offset 0,0.5

set label "y0" at 0,0.0649595117330380202104 right offset -0.5,0
set label "y1" at 0,0.1402998180884519367696 right offset -0.5,0
set label "y2" at 0,0.2291908828832851165061 right offset -0.5,0.2
set label "y3" at 0,0.3318257830466926345376 right offset -0.5,0
set label "y4" at 0,0.4504325835506035839847 right offset -0.5,0
set label "y5" at 0,0.5899241094185249870813 right offset -0.5,-0.7
set label "y6" at 0,0.7614016031422429793942 right offset -0.5,0
set label "y7" at 0,1.0000000000000000000000 right offset -2.5,0

set xrange [0:3.5]
set yrange [0:1.05]
plot exp(-x**2/2) title "exp(-x^2/2)" with lines linewidth 2, "zig.dat" ti "" with lines

And the corresponding input file zig.dat:

0.0000000000000000000000        1.0000000000000000000000
0.7383689179764494421219        1.0000000000000000000000
0.7383689179764494421219        0.5899241094185249870813

0.0000000000000000000000        0.7614016031422429793942
1.0273863717802284625238        0.7614016031422429793942
1.0273863717802284625238        0.4504325835506035839847

0.0000000000000000000000        0.5899241094185249870813
1.2629701985308330330774        0.5899241094185249870813
1.2629701985308330330774        0.3318257830466926345376

0.0000000000000000000000        0.4504325835506035839847
1.4853586756432938545359        0.4504325835506035839847
1.4853586756432938545359        0.2291908828832851165061

0.0000000000000000000000        0.3318257830466926345376
1.7165081257767795707347        0.3318257830466926345376
1.7165081257767795707347        0.1402998180884519367696

0.0000000000000000000000        0.2291908828832851165061
1.9819049364005106693290        0.2291908828832851165061
1.9819049364005106693290        0.0649595117330380202104

0.0000000000000000000000        0.1402998180884519367696
2.3383716982472527044692        0.1402998180884519367696
2.3383716982472527044692        0.0000000000000000000000

0.0000000000000000000000        0.0649595117330380202104
2.7120530222393384079012        0.0649595117330380202104
2.7120530222393384079012        0.0000000000000000000000

71.41.210.146 16:33, 19 June 2007 (UTC)

Thanks for the image, but the caption and animation seem to imply that:
  1. The areas A under the curve are equal, and
  2. The right-hand (solid white) part is eliminated by rejection.
Neither of those are true. Each layer's black + vertical hatched regions total a constant area A (except for the base layer, which is special), and the right-hand region is eliminated by multiplying a [0,1)-distributed random point by the width of the slice xi.
I tried to edit the caption to clarify the second point, but the first is pretty hard to fix.
Also, the fact that the distribution tail is, in fact infinite, is not clear from the graphics. It's asymptotic to, but never quite reaches, the X axis.
Sorry to complain, but to illustrate it accurately, you have to demonstrate:
  1. Choose a point in a vertical interval divided evenly into 8 regions. This gives the slice number i.
  2. Map that region number, via a loojup table, onto a slice of non-uniform height and width.
  3. Choose a point x uniformly between 0 and xi−1
  4. Test if the point is less than xi, and accept x immediately if so.
  5. Otherwise, generate a random point y between yi−1 and yi and test if y < f(x). If so, accept the point. If not, restart from the beginning.
  6. (Step 5 is different in the i=0 case, but let's not try to illustrate that.)
71.41.210.146 02:06, 25 September 2007 (UTC) (originally posted on cuddlyable3's talk page)
I agree that the animation shows
1. The areas A under the curve are equal
That appears to be the case using the 8 coordinates posted here (with high precision!) by anonymous user 71.41.210.146 on 15:26 20 June 2007. The same user now says "Each layer's black and vertical hatched regions total a constant area A" implying xn(yn-yn+1) = constant. That, if true, means his coordinates are unusable. It is impossible to define the base layer where x-->infinity this way, whereas the tail area is finite.
Some other issues in the graphic:
a) The asymptotic tail of the function, as noted. I can remove a white pixel to leave a gap.
b) The left PRNG must select layers with equal probabilities. A non-linear grating suggests what would be done by a look-up table.
c)71.41.210.146 states that each output of the upper PRNG must be multiplied by the xn of the current layer (so only small wastage) while the graphic shows no conditional multiplication (hence a larger wastage). That seems to trade time to multiply every sample with time to fetch about twice as many PRNs. Cuddlyable3 16:12, 25 September 2007 (UTC)
  • It is indeed true that xn−1(ynyn+1) = constant. The base layer and the tail are handled specially with the i=0 test in step 4, as described in the text leading up to and including the "fallback algorithms for the tail" section. See particularly the description of the "fictitious x−1"; the "box" for the base layer is special: shorter than the tail, and of the same area as the base layer under the tail.
  • On modern processors, a floating-point multiply is faster than an unpredictable branch, much less a branch plus an additional random number generation (which usually involves a few multiplies itself). Also, when using more than 8 layers, the top layers are very narrow, and rejection would be much more than 50%.
If the special handling of the base layer is not clear from the text, some help improving it would be appreciated! It seems clear enough to me, but if you can point out a place where you get confused, I can try to reword that area.
71.41.210.146 08:41, 29 September 2007 (UTC)

Approximations[edit]

Normal gen approximations.gif

Any digital generator of Gaussian pseudo-random samples must make the following approximations.

    • Truncate the tails that extend to infinite sample values. Two ways to do this are:
      • see Fig. a). The result is part of a Gaussian distribution plus a uniform distribution with the same range
      • see Fig. b). The result is a distorted (when normalised) part of a Gaussian distribution
    • see Fig. c). Sample values must be quantised to fit the finite length of binary numbers in use.

The sizes of these approximations must be chosen with regard to the intended use of the samples. Thus there is no single "right" choice.

Ziggurat algorithm[edit]

The motivation behind the ziggurat algorithm is that the majority of the samples of a non-uniform distribution such as the Gaussian that would be generated by calculating a transcendental (costly) can be provided instead by combining uniform distributions from PRNGs (cheap).

The PRNGs are assigned to layers of the area under the intended distribution curve. After above approximations, the layer areas on the left of the curve should be equal. They will be selected randomly by index and over long time each layer is selected the same number of times. (One may conveniently use a maximal-length n-bit PRNG shift register to select (2n-1) layers.)

A version of the algorithm described by Marsaglia, Tsang, Doornik et al. establishes a set of points on the curve using the approximation

xn(yn+1-yn) = A constant

These are equal-area rectangles that are larger, by varying degrees, than the curve-bounded areas that they include. The disproportionality means that coresponding ranges of sample values are relatively under-represented in the output, whose distribution is distorted as shown in Fig. d). That distortion may be reduced (but not eliminated) by using more layers. The diagram shows the case of the coordinates provided here by anonymous user 71.41.210.146 .


"The dis-proportionality means that coresponding ranges of sample values are relatively under-represented in the output". I considered this issue on learning the algorithm, however I believe that the sampling is in fact not biased in this respect. The relative proportion of each rectangle that is within the distribution does not have to be the same for all rectangles to avoid bias. Each rectangle has equal area and thus equal probability of being selected, now consider rectangle A which is 100% within the distribution and rectangle B with only 50% of its area within the distribution. Clearly random points in A will be selected 100% of the time whereas random points in B will be selected 50% of the time, for points within B outside of the distribution we do not re-sample within B until we find a point (which would over-represent the distribution area within B by 2x), instead we continue to the outer loop and again randomly select a rectangle to sample from. The locster (talk) 17:04, 29 August 2011 (UTC)

Distortionless Ziggurat Algorithm[edit]

Two conditions together would allow the distortion in Fig. d) to be avoided:

    • Select layer coordinates that satisfy

erf(xn)-erf(xn+1)-yn(xn-xn+1)+xn+1(yn+1-yn)

    • Whenever a sample is rejected at a layer, try again using a new PR sample to the same layer. Due to the rarity of such cases they do not slow the algorithm significantly. Cuddlyable3 14:59, 6 October 2007 (UTC)

Image[edit]

The image of of too poor quality. It is better to remove it than keep it in the way it is. Oleg Alexandrov (talk) 01:25, 26 September 2007 (UTC)

Let's talk about how we can make it better. What are your ideas Oleg? Cuddlyable3 10:49, 26 September 2007 (UTC)
  • The asymptote of the distribution to the X axis isn't shown well. The distribution extends to infinity.
  • The areas marked "A" aren't equal area. You have to include all of the partial box (vertically hatched region) on the non-base layers, specifically including the area above the curve, to get the per-layer area A.
  • The way that layer selection is illustrated implies that the probability is proportional to the height. It's not; it's the same for each layer.
  • The way the X coordinate is chosen implies that the area to the right of the curve (solid white) is rejected. It's not; the random number is chosen to be uniform from 0 to the width of the layer.
The correct way to draw it in black and white, IMHO, is with solid white boxes under the curve and hollow boxes extending past the curve. Although I don't wish to slight the effort put into the illustration (pictographic illustration of random number generation is extremely challenging!), honestly, a non-animated illustration would do just as well.
But if you like animation, then you have to show that a layer is selected uniformly, and I'd do it like this. Rather than showing a single generation, it shows many random number generations in parallel:
  1. Start as shown, with the two-sided distribution cut in half. Excellent, just show the asymptote of the tail.
  2. Then show the boxes on top of the curve, and delete the curve everywhere except for the lowest layer.
  3. Show that the layers have equal area A.
  4. Replace the tail on the lowest layer with a box, preserving the area A.
  5. Show a square box on the left, divided horizontally into uniform layers, also each of area A.
  6. Fill the square box with random points.
  7. Morph the points onto the equal-area layers under the curve. It's just a linear scaling.
  8. Extend the right-hand margin of each box down into the box below, dividing it into two pieces (except for the top layer.) Mark the points in each box that are underneath the layer above in green (immediately acceptable). Mark the points in the right-hand portions of each box in yellow (questionable).
  9. Make the curve re-appear. On all layers except the bottom, turn yellow points under the curve green. Turn yellow points over the curve red (rejected), then fade them out.
  10. Have the yellow points on the bottom layer disappear and reappear as green points in the tail. Don't try to show "morphing" between them.
  11. Disappear the boxes and just show the curve and the randomly chosen points under it.
That actually depicts the algorithm quite faithfully.
71.41.210.146 08:26, 29 September 2007 (UTC)
Thank you for your script for a new animation. The .gif animation format is poorly equipped for morphing, and with a bigger palette for colours and fading the file size will swell.Cuddlyable3 16:15, 1 October 2007 (UTC)

Okay, then let's think. We can do it with fewer colours and make the morphing easier. You're using four colours already (black, white, pink, and transparent). Can your software handle local colour tables? You're allowed a colour table per frame if you like.

  • Rather than sliding the points, have the layers sequentially disappear from the source rectangle and appear in the ziggurat layer destination rectangle. An arrow (or better yet, lines connecting the corners of the boxes) is optional.
  • How about this colour scheme: Draw the original curve in white. Then the rectangles in colour (pink). Then the points in white. Turn the points pink when accepted, or just vanish them when rejected. Then disappear the pink rectangles and have the white curve reappear.
  • You can fade out lines by dithering, removing every fourth pixel for four frames, say.

Does that work better? 71.41.210.146 19:13, 1 October 2007 (UTC)

I released the animation to public domain so you are free to take any part of it and show your ideas here.Cuddlyable3 19:12, 2 October 2007 (UTC)
    • While improvements are made on the diagram, I'm moving it down so it doesn't dominate the article so heavily. The image should in any case come after a written introduction. jugander (t) 13:17, 7 October 2007 (UTC)
      • Why do you think the image needs to be after the text? Look at issued patents, they start with an image because that gives instant subjective enlightenment that would otherwise only come slowly from studying and visualising the text content. Cuddlyable3 (talk) 12:22, 12 February 2009 (UTC)


Flaws in Ziggurat implementation[edit]

I would like to point to the "Design Flaws in the Implementation of Ziggurat and Monty Python methods" by Boaz Nadler, there are few very important remarks about problems in the original Ziggurat works. Witek —Preceding unsigned comment added by 149.156.67.102 (talk) 22:05, 11 December 2009 (UTC)

Added to the article. (You could have done it yourself.) 71.41.210.146 (talk) 06:35, 19 January 2010 (UTC)

There is a little known paper by Wallace about using the Walsh Hadamard transform to generate the Normal distribution, which is probably the faster method. — Preceding unsigned comment added by 113.190.174.133 (talk) 00:35, 21 August 2012 (UTC)