Talk:Mersenne Twister

From Wikipedia, the free encyclopedia
  (Redirected from Talk:Mersenne twister)
Jump to: navigation, search
WikiProject Cryptography / Computer science  (Rated C-class, High-importance)
WikiProject icon This article is within the scope of WikiProject Cryptography, a collaborative effort to improve the coverage of Cryptography on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C-Class article C  This article has been rated as C-Class on the quality scale.
 High  This article has been rated as High-importance on the importance scale.
Taskforce icon
This article is supported by WikiProject Computer science (marked as High-importance).
WikiProject Computing (Rated C-class, High-importance)
WikiProject icon This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C-Class article C  This article has been rated as C-Class on the project's quality scale.
 High  This article has been rated as High-importance on the project's importance scale.
edit·history·watch·refresh Stock post message.svg To-do list for Mersenne Twister:
  • Discuss comparison with other notable generators - specifically Linear Congruentials
  • Discuss *why* it isn't suitable for cryptography
  • Explain what all the pseudocode actually means, and some of the thinking behind the steps.
  • Graphics - an illustration showing the effects of the tempering steps would be nice
  • Start working on the Generalised feedback shift register article.
  • Add an article about the new variant of Mersenne Twister.
  • Resolve the conflict in the published literature as to MT19937's ability (or inability) to pass ALL BigCrush TestU01 tests (e.g. linear complexity)

Reference needed[edit]

For the technically interested somebody should give a reference to the original publication.

—Preceding unsigned comment added by (talk) 18:42, 22 March 2002

—Preceding unsigned comment added by (talk) 01:40, 23 March 2002


What does "equidistributed in 623 dimensions" mean? AxelBoldt, Saturday, March 30, 2002

Well... It means that if you take the outputs and use them to pick points in 623-dimensional space, you get a statistically uniform distribution in the unit hypercube (assuming you pick between 0 and 1). I would have thought that was obvious? If not, perhaps some parenthetical clarification is indicated?
—Preceding unsigned comment added by The Ostrich (talkcontribs) 10:02, 29 May 2002
Perhaps better: that all sequences of bits of length up to 623 digits appear equally often in the output. Acanon 23:30, 1 Sep 2004 (UTC)

No. It is *not* merely "all sequences of up to 623 output *bits*". It's the much stronger "all sequences of up to 623 output *values*, where each value is 32 bits".

(In contrast, a 65 bit LFSR has equidistribed all sequences of only 2 output 32 bit values. The third 32 bit output value of that LFSR is heavily biased. The fourth value can always be predicted from the previous 3 with 100% accuracy.)

--DavidCary 8 July 2005 23:46 (UTC)

"twist" is for long period, "temper" is for equidistribution.
but I can't say it well in English. 12:04, 5 February 2006 (UTC) M.Saito


Perhaps some person should include a description of the algorithm?

—Preceding unsigned comment added by (talk) 16:34, 27 September 2004

Mistake in code?[edit]

Within the 'k' indexed 'for' loop the pseudocode uses 'i' instead of 'k'

—Preceding unsigned comment added by (talk) 08:58, 13 March 2006

Also 'index' / 'i' in the final function.
I've corrected both of these.
—Preceding unsigned comment added by El10t (talkcontribs) 17:20, 13 March 2006

cryptography and the MT[edit]

i added a sentence about why the MT can't be used for cryptography — observing the sequence of iterates for long enough allows one to predict or independently calulate all future iterates. dunno if this is sufficient. Lunch 01:23, 16 March 2006 (UTC)

I do not understand what does that mean exactly. Does this prediction require tto know that the MT is used (and no other PRNG)? How is the observation acutually done, how must the numbers been analysed? And why doesn't this contradict the equidistribution?--SiriusB 14:56, 14 May 2007 (UTC)
There's nothing odd about predictable but still equidistributed. As a very simple example, consider the sequence 1, 2, ..., 500, 1, 2, ..., which repeats indefinitely. This is an equidistributed sequence (each value from 1-500 appears the same number of times) but if I were to tell you the current number in the sequence, you would immediately be able to give the next number in the sequence. —Lowellian (reply) 20:58, 22 May 2007 (UTC)
OK, but this far from being a random-like distribution. Let's say it in other words: "Random-like" and "predicrable" should contradict each other. But the random-like distribution of the MT is said to be among the best of all, so how can a random-like distribution still be predictable? Or is it only predictable if one knows that the MT is used (and how it works, of course) and the predictability is reduced to finding the correct seed?--SiriusB 10:57, 23 May 2007 (UTC)
First of all, "random-like" and "predictable" are not contradictory. A good pseudorandom number generator seems "random-like," but by definition of its pseudorandomness, is also predictable if you understand the generating algorithm and have the seed. If you observe about 600-something values of MT, it is possible to determine where in the MT sequence you are, and from there determine future values. —Lowellian (reply) 03:10, 27 May 2007 (UTC)

Changed seeding algorithm![edit]

Was: MT[i] := last_32bits_of(69069 * MT[i-1])

Is now: MT[i] := last_32bits_of((69069 * MT[i-1]) + 1)

There are 2^19937 possible states for the MT19937 generator and only one of them is unacceptable - all zeros. The old LCG will itself produce an endless run of zeros if the seed is 0. Moreover, I believe it is a mangled version of a VAX generator. See [1]

—Preceding unsigned comment added by (talk) 17:23, 27 June 2006


The C implementation of MT takes a 32-bit value as a seed. Does anyone know how to choose seeds that will generate sequences that don't overlap (for, say, 2^64 values) in the 2^19937-1 long full sequence? Or one can be "reasonably sure" that all those 2^32 seeds generate disjuct sequences? Any results published on this? Andras 09:54, 12 April 2006 (UTC)

Yes, there are positive results from Matsumoto & Nishimura themselves (see web site):
Makoto Matsumoto and Takuji Nishimura, "Dynamic Creation of Pseudorandom Number Generators", Monte Carlo and Quasi-Monte Carlo Methods 1998, Springer, 2000, pp 56--69
—Preceding unsigned comment added by (talk) 08:17, 27 March 2007

the previous comment is incorrect. The paper which is cited above introduces a different generator. For MT19997 there are no known methods to guarantee non overlap, but the probability for them to overlap is very small. Kotika98 (talk) 06:42, 18 December 2013 (UTC)

The "twist" and equidistribution[edit]

The article says: "The "twist" is a transformation which assures equidistribution of the generated numbers in 623 dimensions (linear congruential generators can at best manage reasonable distribution in 5 dimensions)" I think this is probably wrong. The equidistribution in 623 dimensions is also attained by a linear congruential generator provided it is as big as Mersenne Twister, and that it is based on a primitive polynomial. In fact all the widely discussed good properties of Mersenne Twister are shared by a such a construction. There are a few good things the "twist" achieves but they are very subtle (perhaps too much so for an encyclopedia article). 02:49, 28 April 2006 (UTC)


The article says: "It is faster than all but the most statistically unsound generators." Not really. It is quite fast. It is difficult to make a generator this good that runs faster. But a lagged fibonacci generator with sufficiently large lags can run quite a lot faster, and has good enough properties for many purposes. A lagged fibonacci generator with 4 taps (instead of the usual 2) is likely at least as fast as MT and behaves quite well. The construction used in SPRNG with two lagged fibonnaci generators combined with right shift and XOR is also very fast and very good. None of these have the proveable nice properties of the twister, but in practice i would not call them "most statistically unsound".

BTW the free source code distributed by the original inventors of MT is kind of acceptibly quick, but could be a lot faster if optimised, especially for a particular machine. But even when optimised it will be a lot slower than lagged fibonacci. 02:49, 28 April 2006 (UTC)

My thoughts: 1) Several variations are faster, and this is even noted in the article.

2) "most statistically unsound generators" is extremely ambiguous. Does this refer to the most statistically unsound generators in common use? Are we comparing it to the 17 most unsound generators in common use, or the 2 most unsound? Are we including conceivable generators that have never been used?

3) Even if true and unambiguous, this is [peacock language] and a violation of Wikipedia standards. A better statement would be "it runs in O log N time" (or whatever time it runs in). You could even compare this to the time of other commonly used generators. —Preceding unsigned comment added by Bvbellomo (talkcontribs) 21:21, 13 November 2007 (UTC)


The proveable equidistribution in 623 dimensions may be a red herring. It only applies totalled over the whole period of the generator. The period is so huge that it is unlikely to be used in any real application. It is actually possible to achieve the same guarantee for some really bad generators.

Here's one: take a unsigned binary number with 623*32 = 19936 bits. Initialise it to some value as the seed. Return successive 32 bit sections from the number as pseudo-random return values. When the end of the number is reached, increment the number and start back at the first 32 bit section. Over the whole period, this has equidistribution in 623 dimensions. Over any data set you would use in practice, it sucks.

Mersenne Twister seems to be very well behaved in practice, but this has very little to do with the theoretical guarantee. 02:49, 28 April 2006 (UTC)

What you described is equidistributed in exactly 623 dimensions; from what I understand this algorithm has proven equidistribution in up to 623 dimensions, a much stronger properly which isn't accomplished by your algorithm. -- 15:27, 13 May 2007 (UTC)

Revision Update[edit]

There is a new revision (2002/1/26) by the original authors of MT. We need to update this article to reflect the changes:

—Preceding unsigned comment added by (talk) 03:15, 5 October 2006

(Yes, please. The previous initialization, the one that the article mentions, had severe problems; see the explanation (Google's cache) and the new code (Google's cache). The function to revise is init_genrand().) --pgimeno 13:23, 2 January 2007 (UTC)

Also, we need more explanations about the code. I'm a little unclear of a particular for-loop used in this new revision:

k = (N>key_length ? N : key_length);
for (; k; k--) { ... }

What does this mean? k is initialised to either N or key_length, and used as the default for the empty parameter in the for-loop, but the second parameter is also k. From the code, it looks like as if he's decrementing k, but with the same initial and end conditions. The for-loop is the same as

for(k;k;k--) {...}

I don't make any sense of this. Can anyone explain? Thanks.

—Preceding unsigned comment added by (talk) 03:15, 5 October 2006

It's equivalent to:
k = max(N, key_length);
while(k != 0) {
-- 19:16, 16 October 2006 (UTC)
That's correct. The second parameter in "for(k;k;k--)" is treated as a Boolean and will return true as long as it is not zero. —Lowellian (reply) 21:02, 22 May 2007 (UTC)

I don't undestand...[edit]

I'm somewhat puzzled by the "applications" paragraph. What does it mean that 624 observations allows one to predict the future iterates? Are there any known issues with some seeds? (Please try to be simple, I'm not really in those things)
MaxDZ8 talk 14:32, 26 October 2006 (UTC)

Tempering function is bijective. So inverse function exists. So we can rebuild internal vectors from 624 outputs. Then we can run MT algorism to get next sequence. 01:11, 14 March 2007 (UTC)

I believe I got it. Thank you.
MaxDZ8 talk 06:59, 14 March 2007 (UTC)

Pseudo code[edit]

MT generates 624 numbers at once, but it's for speed. Pseudo code should show algorithm clearly without considering speed. I don't know the grammer of pseudo code, so I show it in C:

   unsigned long genrand_int32(void)
       unsigned long y;
       static unsigned long mag01[2]={0x0UL, MATRIX_A};
       /* mag01[x] = x * MATRIX_A  for x=0,1 */
       mti = mti % N;
       y = (mt[mti] & UPPER_MASK) | (mt[(mti + 1) % N] & LOWER_MASK);
       mt[mti] = mt[(mti + M) % N] ^ (y >> 1) ^ mag01[y & 0x1UL];
       y = mt[mti];
       /* Tempering */
       y ^= (y >> 11);
       y ^= (y << 7) & 0x9d2c5680UL;
       y ^= (y << 15) & 0xefc60000UL;
       y ^= (y >> 18);
       return y;

—The preceding unsigned comment was added by (talk) 04:32, 27 March 2007 (UTC).

There is no grammar for pseudo-code, it's just supposed to be readable by all human programmers but not by a computer program. It should look like Pascal but you're free to use things like shift 5 bits to the left() while this is clearly a syntax error in Pascal or C. (talk) 11:44, 12 August 2008 (UTC)


When the article talks about the period, it states that

"in practice, there is little reason to use larger ones, as most applications do not require 2^19937 unique combinations".

This seems like misinformation. On the Pseudorandom Number Generator page, it says that 2^19937 combinations is

"likely far more than the number of computations which can be performed within the entire future existence of the universe)".

I think this statement should be rephrased. 19:09, 10 April 2007 (UTC)

Which statement do you think should be rephrased, the former or the latter? I see no problem with the former statement. It is perfectly true that most applications do not require 2^19937 unique combinations. If it is the latter statement you see a problem with, you should discuss it on Talk:Pseudorandom number generator, not here, since that is the page on which the statement appears. —Lowellian (reply) 20:47, 22 May 2007 (UTC)
I suggest saying:

"This period exceeds the need of any practical application." Cuddlyable3 10:39, 27 June 2007 (UTC)

Not at all. The period of MT makes many practical applications unsuitable. For example the python function random.shuffle is limited due to the period of MT. [2] , [3] Shabda 09:26, 9 August 2007 (UTC)
That's not a practical limitation. It's easier to see that with randomly generated text than permutations: most texts can never come out of a PRNG, because the set of texts is too big (even for fairly short texts). But you'll never notice unless you're patient enough to wait for monkeys to write a Shakespeare play, which might take you a few universe lifetimes, and then cry foul when the allegedly random monkeys start repeating themselves from the beginning (at which point you notice that they were actually just robots using a PRNG). All a bigger period would do is put off that point, and with infinite (i.e no) period the point would never come, but if there's nothing else wrong with the PRNG then this is not much of a limitation. It's similar to secure hashes, which are used to e.g. give "unique" 256-bit tags to arbitrary bit strings, even though it's plain as day to everyone that the tags can't possibly be unique (by the pigeonhole principle). None of it matters in practice, as if this is the only problem with a secure hash, then the day you produce a counterexample we'll be busy dealing with the flying pig pest problem. -- Coffee2theorems 00:01, 10 August 2007 (UTC)
Excuse me, what the referenced post is saying is that lists of more than 2080 elements cannot be shuffled with MT19937 such that every permutation occurs at least once (probably less because some will occur more than once). While this is technically true, how does this have anything to do with a practical application? You might as well say that 5000 digits of Pi are not enough for approximating a circle. Sounds more like a thought experiment to me than a practical application. Aragorn2 (talk) 11:04, 10 February 2016 (UTC)

Unlike Blum Blum Shub, the algorithm in its native form is not suitable for cryptography. Observing a sufficient number of iterates (624 in the case of MT19937) allows one to predict all future iterates. Combining the Mersenne twister's outputs with a hash function solves this problem, but slows down generation.

I think the last sentence in the quote above is overly broad and vague. Will combining the Mersenne twister's with any hash function make it suitable for cryptography? For what types of cryptographic applications will it be suitable? Are there certain crptographic applications for which the combination (Mersenne twister + hash function) would be easy to attack? How sould the Mersenne twister be combined with the hash funciton?

Dubious advantages[edit]

The article claims as an advantage that 3. It is faster than all but the most statistically unsound generators. Well, the GSL timing data at show at least two faster generators, neither of which have shown any statistical defect (or I haven't unearthed a reference to that effect). So can someone please document the claim? If not, it'll be removed in a week.

According to Jackel (2002) Monte Carlo Methods in Finance, the Mersenne Twister, "is in fact faster than most other reliable pseudo-random number generators" and "no slower than any of the other pseudo-random generators". (p. 75) (perhaps a small exaggeration in a casual discussion) Measure for Measure 03:39, 11 November 2007 (UTC)

I'll also note that the advantage given as "4. It passes numerous tests for statistical randomness, including the stringent Diehard tests." is somewhat tautological. Most currently used generators pass "numerous" tests as well, even the Diehard ones. With that in mind, (4) is like saying that the advantage of a particular kind of car is that it "demonstrably moves you from one place to another". When they all share the same advantage, what is the advantage?

The field of PRNG historically has some statistically poor generators, e.g. Linear congruential generator, so it is worth noting good generators' good statistical properties as an advantage. Cmcqueen1975 (talk) 02:08, 16 September 2014 (UTC)

There is also a lack of disadvantages here. MT19937 has a very large amount of state compared to many other generators, and there is no efficient "stepping" ability (issues noted by l'Ecuyer). mdf 19:22, 14 September 2007 (UTC)

The extensively used program Matlab has a number of built in random number generators. The default one has a period of 2^1492, and as far as I know it is reliable and would have been tested extensively because of the extensive use of Matlab in simulation work requiring random numbers. Matlab (v7) also has the option to use MT as the random number generator. In testing it, I found that the MT was about 30% slower than the default random number generator. Since the Matlab built in has a period long enough for most (well ... probably ANY) purposes, there doesn't seem to be any real advantage (apart from curiosity value) in using MT option. Indeed, Matlab only offers the MT option for the rand() function that gives uniform deviates - the MT option is not available for the randn() function, which generates Gaussian deviates. Alan1507 12:33, 15 September 2007 (UTC)

My timing tests agree with yours in R2007a (7.4) but the MathWorks people think there's enough to MT that it's the MATLAB default for RAND as of this version. Still not available in RANDN. Patrick O'Leary 17:38, 4 October 2007 (UTC)
Before you start removing stuff, let's discuss this a bit more. I gather from your timing table that the fastest three are, in order, taus, gfsr4 and mt19937. I take it mt19937 is this one. My question: do taus and gfsr4 pass all the same tests that mt19937 passes? If so, which tests have you applied? Did you apply any tests that some algorithms pass, but others fail? Can you provide a reliable source to back up your claims? 04:13, 14 October 2007 (UTC) reports tests for a number of other generators of this ilk. See table I, starting at about page 28. A few of these are faster than MT19937, and all report comparable results. Take care to note that the cut-off for failure is extremely small in these tests. (To paraphrase Linus Torvalds: anyone who thinks a generator that fails a statistical test at the 1.0e-10 level is 'statistically unsound' may well be smoking crack ;-/). Which is to say that the comments from Jackel, quoted by Measure for Measure above, are a better reflection of reality. mdf (talk) 19:05, 19 November 2007 (UTC)
To mdf: What exactly do you mean with "no efficient 'stepping' ability"?--SiriusB 09:56, 14 November 2007 (UTC)
To jump ahead so-and-so many states, without actually having to compute every intermediate state. For F2 linear generators (like MT19937), this amounts to a simple matrix multiply, and the matrix can be computed quickly in a small amount of space, even for extremely large steps. L'Ecuyer mentioned, in a few of his many papers, the problem with MT19937 having such a huge state makes the stepping issue hopeless (consider 19937-sized matrices, and O(n^3) algorithms). However, when trying to relocate them to answer your question, I immediately came across Efficient Jump Ahead for F2-linear Random Number Generators, in which Haramoto, Matsumoto, Nishimura and L'Ecuyer effectively solve the MT19937 stepping problem. mdf (talk) 19:05, 19 November 2007 (UTC)

I should have read the discussion before making changes, I had no idea this was such a live issue. Please feel free to revert if I have misunderstood. However, as a nonspecialist, I'd suggest "fast" might be a sufficient comment for this article? Servalo (talk) 00:32, 29 November 2007 (UTC)

Examples of users[edit]

Here is a start on a list of examples of users of the Mersenne twister. When it gets to be a decent size it can go in the article. Kaldosh 09:32, 25 September 2007 (UTC)

  • GT Concept HD for PS3.
  • Wikipedia's "Random article" feature. [4] --KyleWild 18:55, 30 October 2007 (UTC)
  • Matlab uses the Mersenne Twister algorithm as the default random number generator [5] Reddevyl (talk) 18:38, 13 April 2011 (UTC)


This is the current pseudocode in the article:

 // Create a length 624 array to store the state of the generator
 var int[0..623] MT
 var int y
 // Initialise the generator from a seed
 function initialiseGenerator ( 32-bit int seed ) {
     MT[0] := seed
     for i from 1 to 623 { // loop over each other element
         MT[i] := last_32bits_of(1812433253 * (MT[i-1] bitwise_xor right_shift_by_30_bits(MT[i-1])) + i)

 // Generate an array of 624 untempered numbers
 function generateNumbers() {
     for i from 0 to 623 {
         y := 32nd_bit_of(MT[i]) + last_31bits_of(MT[(i+1)%624])
         if y even {
             MT[i] := MT[(i + 397) % 624] bitwise xor (right_shift_by_1_bit(y))
         } else if y odd {
             MT[i] := MT[(i + 397) % 624] bitwise_xor (right_shift_by_1_bit(y)) bitwise_xor (2567483615) // 0x9908b0df
 // Extract a tempered pseudorandom number based on the i-th value
 // generateNumbers() will have to be called again once the array of 624 numbers is exhausted
 function extractNumber(int i) {
     y := MT[i]
     y := y bitwise_xor (right_shift_by_11_bits(y))
     y := y bitwise_xor (left_shift_by_7_bits(y) bitwise_and (2636928640)) // 0x9d2c5680
     y := y bitwise_xor (left_shift_by_15_bits(y) bitwise_and (4022730752)) // 0xefc60000
     y := y bitwise_xor (right_shift_by_18_bits(y))
     return y

The definition of pseudocode is: a compact and informal high-level description of a computer programming algorithm that uses the structural conventions of programming languages, but omits detailed subroutines, variable declarations or language-specific syntax. The programming language is augmented with natural language descriptions of the details, where convenient, or with compact mathematical notation. Well, this code is indeed not language-specific, but it is not compact and not informal. It is much bigger and less readible than the equivalents in most languages ("right_shift_by_11_bits" and then assuming % for modulo without explanation???), Since the majority of users of such code are probably going to be C/C++ or FORTRAN users, I propose that it is converted into working C code:

// Create a length 624 array to store the state of the generator
// Most numbers are unsigned 32-bit integers
unsigned long int MT[624];
unsigned long int y;
int current_idx; // points to the current MT output number

// Generate an array of 624 [0..623] untempered numbers
void MTgenerateNumbers() {
  int i;
  for (i = 0; i <= 623; ++i) {
    y = ((MT[i] & 0x8000000) >> 31) // take 32nd bit
      + ( MT[(i+1) % 624] & 0x8fffffff); // AND with last 31 bits
    if (y & 1)
      // y even
      MT[i] = MT[(i + 397) % 624] ^ (y >> 1);
      // y odd
      MT[i] = MT[(i + 397) % 624] ^ (y >> 1) ^ 2567483615ul; // 0x9908b0df; ul indicates unsigned long in C
  current_idx = 0;

// Procedure to initialise the generator from a 32-bit seed
void MTinitialise(long unsigned int seed) {
  int i;
  MT[0] = seed;
  for (i = 1; i <= 623; ++i) // loop over remaining elements
    // & is a binary AND, ^ is a binary XOR, and >> a bit shift
    MT[i] = 0xffffffff & (1812433253 * (MT[i-1] ^ (MT[i-1] >> 30)) + i);

// Extract the next tempered pseudorandom number
unsigned long int MTrandom() {
  // call generateNumbers() again if the array of 624 numbers is exhausted
  if (current_idx >= 624)
  y = MT[current_idx];
  y = y ^ (y >> 11);
  y = y ^ ((y << 7) & 2636928640ul); // 0x9d2c5680
  y = y ^ ((y << 15) & 4022730752ul); // 0xefc60000
  y = y ^ (y >> 18);
  ++current_idx; // increment index
  return y;

Testing it with seed=1 gives this for the first few numbers:

  0 2773488264
  1 3990704388
  2 3757302560
  3 1411641254
  4 2593950992
  5  307370599
  6 2147185445

If that is not correct, then the pseudocode might be ambiguous. For example it is not clear to me whether 32nd_bit_of(MT[i]) means (MT[i] >> 31) or (MT[i] & 0x8000000). Han-Kwang (t) 01:07, 21 November 2007 (UTC)

Ambiguity is bad, and should be fixed. I have no problem with working C/C++ code, though this may encourage optimization, and complaints about "Wikipedia sucks because it doesn't compile with BlazerSnort++ 6.1!". Test-vectors is indeed a good idea, but this brings in "original research" and "howto" issues. If that is moot, then it would be good to give the above, plus for states 624 - 630. mdf (talk) 02:20, 21 November 2007 (UTC)
It seems that there are variations on the initialization routine; e.g. this reference implementation produces different numbers when seeded with 1. I don't want to put in a piece of compilable C code unless I'm really sure that it is correct. A few comments above, there is a C snippet which might be better as an example. Unfortunately, it is incomplete: what are MATRIX_A, UPPER_MASK, LOWER_MASK, N? How is the mt array initialized? Han-Kwang (t) 19:20, 25 November 2007 (UTC)

Implementation Links[edit]

All right. While I feel, that a link per programming language is ok, I also think that we should get some sort of order or hierarchy into that list. I suggest an alphabetical order. Opinions? --Gulliveig (talk) 14:22, 25 November 2007 (UTC) In May of 2012, I somewhat alphabetized the list. It bugged me. -- Muricula Mus — Preceding unsigned comment added by (talk) 00:31, 1 June 2012 (UTC)

What is CR?[edit]

"The Mersenne twister is a pseudorandom number generator linked to CR developed in 1997 by Makoto Matsumoto" What is "CR"?

What is WELL?[edit]

The section about SFMT compares it to WELL in terms of speed... I found some information on - in the PDF it says that WELL stands for a family of "Well Equidistributed Long-period Linear” random number generators. Also, it is not clear to me why it is relevant in the discussion of SFMT, and not mentioned earlier. Are WELL generators significant enough to be mentioned at all? --Cynebeald (talk) 08:23, 17 January 2008 (UTC)

Moreover, what does "v-bit accuracy" mean? It cannot be expected by the reader to guess (or even know) what a v-bit is and how it is related to random number generators.--SiriusB (talk) 14:16, 31 March 2008 (UTC)
WELL is developed in 2006(5?). It has beter equidistribution property than MT and has longer period than MT. WELL is a competitor of SFMT as a successor to MT. —Preceding unsigned comment added by (talk) 03:14, 29 September 2008 (UTC)


I'm nominating this page for the greatest understatement in all of Wikipedia award, more specifically for this statement:

"most applications do not require 219937 unique combinations (219937 is approximately 4.315425 × 106001)."

Suppose I turn every elementary particle in the visible universe into a computer that produces 109 combinations a second, and I let all those computers run for the time that has elapsed since the Big Bang. Call the number of combinations all of them have now produced N1. Next, take N1 computers producing N1 combinations per second, and let those run for the same time. Call their total output in number of combinations N2. Under the same conditions let N2 computers turn out N2 combinations per second to generate N3 combinations during the age of the universe upto now, and let N3 computers do N3 combinations per second to get to N4, on the same way go from N4 to N5 and from N5 to N6. Then N6 is still much smaller than 106001... - Andre Engels (talk) 15:43, 3 October 2008 (UTC)

I agree completely, that statement seemed really silly and I went into the talk section to say basically the same thing you've said. I don't know why nobody ever wants to make definitive statements. It is true to say "no application will ever require 219937 unique combinations", so why not just say it? —Preceding unsigned comment added by (talk) 21:03, 1 June 2009 (UTC)
People make statements like this about random number generators all the time. In my opinion, such statements are worse than useless. The number of particles in the universe has no relation to the quality of a random number generator, at least, not in the sense implied by these statements. Only with relatively old or weak generators is exhausting the sequence an issue. With modern generators, the main issue with period is the implications that the period has on output quality. Statements in this context discussing the number of particles in the universe or similar quantities mislead people into thinking that this is a pertinent topic. ATBS 11:53, 9 December 2009 (UTC)ATBS —Preceding unsigned comment added by ATBS (talkcontribs)
I moved the comparison to a footnote and removed the silliness.--agr (talk) 15:05, 9 December 2009 (UTC)

Doesn't it pass all test in TestU01?[edit]

The article says that MT doesn't pass every test in TestU01 (just: most of them). However, [6] suggests that MT passes every test is TestU01, see Table 2 on Page 12. Or am I just interpreting something wrong in the linked paper...? —Preceding unsigned comment added by (talk) 11:55, 11 January 2010 (UTC)

While there are conflicting reports of MT19937's ability to pass all BigCrush tests in TestU01, I believe the evidence points to it NOT passing all tests in TestU01. For example, MT19937 has indeed been shown to PASS ALL TESTU01 BigCrush tests by two independent highly respected researchers. (1) ●See B. McCullough. "A Review of TESTU01". Journal of Applied Econometrics. Vol. 21 No. 5, pp 677-682. 2006. [7]. (2) ●See also B. Wichmann & I. Hill, "Generating Good Pseudo-Random Numbers". Computational Statistics & Data Analysis. Vol.51 No. 3, pp 1614-1622. December 2006. [8].

However, MT19937 was reported in 2007 by L'Ecuyer & Simard to FAIL two tests in the extensive BigCrush suite using Version 1.0 of TESTU01 (3) ●See P. L'Ecuyer, R. Simard, "TestU01: A C Library for Empirical Testing of Random Number Generators". ACM Transactions on Mathematical Software. Vol. 33 No. 4, article 22, pp 22:1 – 22:40. August 2007. [9].

Subsequently, Melard reported that a personal communication with Simard in 2011 revealed that the two tests which FAILED (with Ver 1.0 of TESTU01) were Linear Complexity (r=1) and Linear Complexity (r=29), with a p-value <10E-15. (4) ●See G. Melard, "On the Accuracy of Statistical Procedures in Microsoft Excel 2010". Computational Statistics. Vol.29 No. 5, pp 1095-1128. October 2014. [10].

DETAILS: Using Ver 1.1 of TESTU01, McCullough in 2006 reported that his testing of MT19937 (as implemented in 'R' ver 1.6.1) PASSES ALL BigCrush tests. Wichmann & Hill in 2006 also reported that MT19937 PASSES ALL BigCrush tests of TestU01. Wichmann & Hill cite "Version 6.0, dated Jan 15, 2005" of TestU01 for their testing (which Wichmann has confirmed). Although the current version is only 1.2.3 (dated Aug 18, 2009), the version 6.0 of TestU01 that Wichmann & Hill used was a "pre-official-release" version in 2005. When TestU01 was officially released (published) in 2007, the numbering system was reset to 1.0. At any rate, Melard in 2011, citing the 2006 McCullough review (which used Ver 1.1 of TestU01), also reports MT19937 to PASS all BigCrush Tests. IMPORTANT: L'Ecuyer has posititively confirmed that MT19937 will FAIL TestU01 tests for linear comlexity (as provided in ANY & ALL versions of TestU01; he states the software changes were only minor). L'Ecuyer indicates that MT19937, like all F2-linear PRNG's will indeed FAIL these linear complexity tests in TestU01. Regarding MT19937, L'Ecuyer et al have stated it “...successfully passed all the statistical tests included in the batteries SmallCrush, Crush, and BigCrush of TestU01, except those that look for linear dependencies in a long sequence of bits, such as… the linear complexity tests… This is in fact a limitation of all F2-linear generators, including the Mersenne Twister… Because of their linear nature, the sequences produced by these generators just cannot have the linear complexity of a truly random sequence. This is definitely unacceptable in cryptology… but is quite acceptable for the vast majority of simulation applications if the linear dependencies are of long range and high order.” ●See F. Panneton, P. L’Ecuyer, M. Matsumoto. "Improved Long-Period Generators Based on Linear Recurrences Modulo 2". ACM Transactions on Mathematical Software. Vol. 32 No. 1. pp 1-16. March 2006.[11] SUMMARY: Both McCullough and Wichmann & Hill are in direct conflict with the published results from L'Ecuyer & Simard, regarding MT19937's ability to pass all BigCrush tests in TestU01. McCullough and Wichmann & Hill clearly published that their testing of MT19937 passed all BigCrush tests in TestU01. This conflict is NOT attributable to the version of TestU01 employoed. This issue needs to be resolved. If MT19937 does pass ALL Big Crush tests, this fact should be listed under "Advantages", rather than "Disadvantages" in this Wikipedia article. But this does NOT appear to be the case, as L'Ecuyer and Simard have confirmed the failures of MT19937 and they are the authors of the TestU01 software and some of (if not the) the foremost experts in the world on testing PRNG's. Nevertheless, this conflict in the published literature is troubling, especially given the ubiquitous nature of MT19937. Nitrorat (talk) 17:43, 30 March 2015 (UTC)

Pseudo code clarification[edit]

One of the comments says, "loop over each other element" which to me means skip over every other element, but that's not what the code appears to do (It appears to iterate over every element). Qutorial (talk) 19:35, 26 December 2010 (UTC)

Done. — Preceding unsigned comment added by (talk) 20:35, 16 December 2014 (UTC)

Pseudocode (again)[edit]

One issue with the current pseudo code is that it doesn't adequately explain that right shifts should shift in zeroes rather than performing the more common sign-extension. In C/C++ this is accomplished by making the various int variables unsigned. However, if someone naïvely implements based on the current pseudo-code, they will (probably) get answers inconsistent with the reference specification ( (talk) 20:48, 22 August 2011 (UTC)

algorthmic detail[edit]

This section is bloody useless unless you have a degree in math.

I remember this page from a while back - I was able to read it and implement the twister. Now I cannot. (talk) 18:11, 7 January 2012 (UTC)

I have a degree in math and still consider it close to useless. The terms are defined so tersely that they're often incomprehensible: I don't know what "middle word, or the number of parallel sequences" means, for instance.
There should at least be a "birds-eye view" of the algorithm somewhere in the article. From the current section, I'm able to glean two phases. First, a linear recurrence relation is used with some bit shifting and other operations thrown in (why is a mystery). The linear operator is a companion matrix, so is in rational normal form. Why this is important is unclear; speed of computation is hinted at. The operations are taken over the field on two elements, so addition is bitwise XOR. There is a second pass where the numbers generated in the first pass are fiddled with in order to get nicer randomness properties. There are a bunch of constants involved that have been chosen presumably for their nice properties. (talk) 18:55, 9 February 2012 (UTC)


I changed:

         int y := 32nd bit of(MT[i]) + last 31 bits of(MT[(i+1) mod 624])


         int y := (MT[i] & 0x80000000) + (MT[(i+1) mod 624] & 0x7fffffff)

because the description, although word-oriented, is confusing, and basically wrong. The 32nd bit of a number is ambiguous because the bit numbering scheme isn't stated, and what you want is that bit in its shifted state. Otherwise, the "32nd bit" is 0 or 1, which is not what's wanted. The change was reverted, and I'm not going to change it back, at least not yet, but I doubt anyone is going to be able to code it properly from the current wording. I couldn't. My wording might be C-oriented, but at least it's accurate. (talk) 21:50, 31 October 2012 (UTC) — Preceding unsigned comment added by (talk) 21:46, 31 October 2012 (UTC)

Hmmmm.... could we do both? I'm thinking of your change being the pseudocode and the old pseudocode being a comment. (I don't disagree with anything you said, except that I found the wordy version to be clear enough given the context and my background. I also found your version to be equally clear, which probably means that I'm a poor judge of pseudocode that's intended to be read by less experienced coders.) GaramondLethe 22:33, 31 October 2012 (UTC)
Both would be fine, in my opinion. (talk) 04:23, 2 November 2012 (UTC)

Take a look and see what you think. I'm not sure there's a good way of guiding a reader who doesn't understand bitmasks through this algorithm. Feel free to improve on this. GaramondLethe 06:22, 2 November 2012 (UTC)


The Lua implementation listed in the article is written in C. You can use the function in Lua though. SharkD  Talk  22:53, 8 September 2013 (UTC)

Appalling algorithm description[edit]

On the wikipedia, algorithm descriptions are NEVER given in English - e.g. an actual explanation - they are given as highly mathmatical proofs, which are unreadable to all except qualified mathematicians, i.e. almost nobody. I'd change it, but invariably such changes are reverted. There is something broken about wikipedia culture in this regard. (talk) 08:06, 26 July 2014 (UTC)

Statement about periodicity[edit]

The statement "(a period above 2512 is enough for any application)" in Mersenne twister#Disadvanages looks like a dubious claim to me. Monte Carlo methods often require RNG's with a very long period; also, the Fisher-Yates shuffle requires very long periods for shuffling all but the smallest of sets - see this for reasoning. -- 2001:470:BB6F:0:DDB4:997F:D54D:AE09 (talk) 02:53, 27 August 2014 (UTC)

The statement is valid, see e.g. Numerical Recipes, §7.1: "Avoid using generators with period > 10100". The discussion that you cited, from the Fisher-Yates Shuffle, just argues that 262 ≐ 1017 is too small, which is indeed true. That is a long way, though, from 10100 ≐ 2333. (talk) 15:08, 27 August 2014 (UTC)
2 ^ 512 is a huge number and there is no way any application will ever generate or need that many numbers. My source is Applied Cryptography, which talks about a 512 bit key being large enough for any imaginable situation. Samboy (talk) 12:01, 28 August 2014 (UTC)
I agree. I restored the article, and included the reference to Numerical Recipes. If you have the full reference to Applied Cryptography (e.g. section or page number), perhaps it is worth adding? (talk) 15:58, 28 August 2014 (UTC)
I lost my copy of Applied Cryptography years ago when I loaned it to an ex-roomate, so I can’t be more specific. However, Schneier goes down a similar path in this 1999 essay, where he states “we cannot even imagine a world where 256-bit brute force searches are possible” Samboy (talk) 17:42, 28 August 2014 (UTC)
My reading of that link is that it is referring to cryptographically-secure keys for public-key cryptography. That is relevant for a CSPRNG, but not for a general PRNG, which is what the Mersenne Twister aims to be. (talk) 18:41, 28 August 2014 (UTC)


The claim looks atrociously dubious, to me, too.

2512 is indeed quite a big number, but it's not combinatorially big. What I mean by that is the following simple argument: how many permutations of 150 distinct elements are there? A little more than 2872. Will I obtain one of those permutations uniformly at random by plugging a PRNG with period 2512 into a Fisher-Yates-like shuffle? Nope: quite a lot of permutations will never happen; only a subset of those fitting into the period will; that is not a uniform shuffle. By the same pigeonhole principle, we can't get uniform numbers in [1..100] from a generator with period 10, no matter how hard we try. There's just not enough entropy available.

And what if I wanted to shuffle 2000 items, rather than just 150? By the way, a Mahjong deck consists of 136 tiles; with certain tiles considered indistinguishable, that gives around 2763 shuffles possible. So for an internet server of that game, a 2512-period PRNG is practically not good enough: literally most shuffles, while having non-zero probability of happening in an IRL game, would be never dealt on the server.

Obviously, this happens because the factorial function (giving the number of permutations) grows even faster than exponentially (which is already damn fast). In many computations, one will need to generate samples from exponentially big spaces (random trees of height n would be an example). In such computations, period of mere 2512 might not give enough "resolution" to pick samples densely enough. -- (talk) 01:19, 2 February 2016 (UTC)

Can't verify §7.1 citation[edit]

I also have an objection regarding the quality of "Numerical Recipes" citation. It seems I can't verify it. The whole §7.1 contains neither the "Avoid using generators with period > 10100", neither the article quote, nor any other recommendation to avoid big periods, as I read it. The closest passage I find is "If you are generating more than 100,000,000 random numbers in a single calculation... [use ran2]" — but its nowhere near the claim in the article allegedly supported by the citation. I presume that it's a misunderstanding, and not an intentional citation abuse.

The citation, as well as the claim itself, is going to be removed. Anyway, it brings not much value into the article besides fuelling talkpage debates like this one and several others above it. -- (talk) 01:19, 2 February 2016 (UTC)

This article is very good[edit]

Good job. A really good article. (talk) 17:15, 6 September 2014 (UTC)

algorithmic detail vs. psuedocode[edit]

in line 10 of the psuedocode, the value 1812433253 (0x6c078965) appears. There's nothing in the algorithmic detail explaining where this value comes from (talk) 16:13, 22 October 2014 (UTC))

It's there because whoever "contributed" that pseudocode were plagiarising a paper that in turn didn't explain that the number came from Matsumoto et al's Cryptographic Mersenne Twister and Fubuki Stream/Block Cipher, 2005 (Link); the latter chose it for no special reason except that it should be a good multiplier for a Linear Congruential Generator. Cryptarch (talk) 04:36, 20 March 2015 (UTC)

By far, the most widely used PRNG?[edit]

There is this claim in the lead. While it has good uptake I would suspect the simple linear congruential generator would have higher use. There are the standard one used in Java[12] and older C implementations, .NET[13]. --Salix alba (talk): 07:58, 12 November 2014 (UTC)

Requested move 6 March 2015[edit]

The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review. No further edits should be made to this section.

The result of the move request was: moved per request. Unopposed for two weeks. Favonian (talk) 13:06, 21 March 2015 (UTC) Favonian (talk) 13:06, 21 March 2015 (UTC)

Mersenne twisterMersenne Twister – Most academic sources (see and ) capitalise its second word. --Relisted. EdJohnston (talk) 00:09, 14 March 2015 (UTC) cmɢʟeeτaʟκ 18:44, 6 March 2015 (UTC)

  • Info: All instances of "Twister" in the article's body are capitalized and appear to have been so since sometime in 2013, with the exception of the bolded instance in the lede which was capitalized in this 15 September 2014 edit by Llightex. -- ToE 23:49, 6 March 2015 (UTC)

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.


The sections on Algorithmic Detail and Pseudocode are very similar to Section II in Tian, Xiang, and Khaled Benkrid, Mersenne Twister random number generation on FPGA, CPU and GPU, Adaptive Hardware and Systems, 2009. (Link to paper) I don't think this is an example of fair use. Cryptarch (talk) 04:26, 20 March 2015 (UTC)

For anyone trying to investigate/resolve this complaint, this is the new link to the paper. I will attempt to digest the explanation in the paper and rewrite it. At the very least, I have begun rewriting the pseudocode based on extracting the commonalities in multiple different real implementations. Fizbin7 (talk) 11:39, 19 July 2015 (UTC)
I believe my latest edit fixes these plagiarism issues. Fizbin7 (talk) 21:32, 19 July 2015 (UTC)
Though looking through the history, I see now that this is more likely a case of the paper author pulling from wikipedia than the other way around - the pseudocode used in the paper appeared on wikipedia first. I still think my new pseudocode is an improvement - covering more cases - but there was never a need to rewrite it on plagiarism grounds. (talk) 12:47, 20 July 2015 (UTC)

Can't edit page because of Wikipeda server bogus "old page" edit conflict[edit]

I put my edit at: Mersenne_Twister/Sandbox User:Comp.arch/Mersenne Twister.

I hope whoever is editing concurrently sees my last edit and didn't overwrite that one.. comp.arch (talk) 14:00, 10 June 2016 (UTC)

dash, Apache Commons, minus sign[edit]

According to dash, the preferred use seems to be an em dash. Even if the two uses were equally preferred, there is no reason to change from em dash, unless there is a consensus of editors. Additionally, the title of the web page for Julia has an em dash, and that cannot be changed under any circumstances. As for Apache Commons, MT is part of that; specifying a subpart breaks consistency with the other citations and also might add confusion (e.g. it is unlinked). As for the minus sign, this should be consistent throughout the article. (talk) 18:20, 17 June 2016 (UTC)

Python Sample Code accepts any size seeds, despite using hardcoded 32-bit parameters[edit]

I note that the python 32-bit implementation as given does not limit the seed to a 32bit int despite using hardcoded 32 bit MT values.

In python this allows the seed to accept an int of virtually unlimited size. The remainder appears correct and all subsequent values are trimmed to 32 bits with the masking in twist() and indeed in the remainder of __init__, thus:[i] = _int32(...

Perhaps it is useful. But, I think it is an oversight maybe.

Another minor comment is that the seed function comment says 'Initialize the index to 0' then immediately sets it to a hard coded constant very unlike 0. Updated to correct and explain.

Existing code:

   class MT19937:
       def __init__(self, seed):
           # Initialize the index to 0
           self.index = 624

Suggested replacement:

   class MT19937:
       def __init__(self, seed):
           # Make it clear the seed is going to be a 32-bit int
           seed = _int32(seed)
           # Initialize the index to 'n' as used by the 32-bit values (change it to 312 for 64-bit implementations).
           # This ensures an immediate twist() on the first extract() call.
           self.index = 624

edits: many

Epoch-1 (talk) 07:22, 31 May 2017 (UTC) epoch-1