# Talk:Mixing (mathematics)

WikiProject Mathematics (Rated Start-class, Mid-importance)
Field: Probability and statistics
WikiProject Statistics (Rated Start-class, Mid-importance)

WikiProject Systems (Rated Start-class, Mid-importance)

## Requested move

This article now covers mixing in a general enough fashion that it should get the general title. linas 13:43, 8 October 2005 (UTC)

• Support, of course. linas 13:43, 8 October 2005 (UTC)

### Decision

Page moved. Ryan Norton T | @ | C 02:12, 15 October 2005 (UTC)

Thanks linas 00:01, 21 October 2005 (UTC)

Where is the intuition in this article? This is an austere technical piece which does not fulfill the Wiki objective of informing the larger public. Come on guys! Tell us why mixing is important and what it applies to!

I agree. There is no hint of a definition before the technical material begins. And what is the utility of pointing out that there are at least three kinds of mixing before a general description of the concept is given? This leads one seeking a definition to seek three definitions. How frustrating. At least say what the three definitions have in common. Dewa (talk) 18:35, 24 April 2008 (UTC)
Can you please be more specific? I don't understand what you don't understand. The definitions seem pretty straightforward and clear to me; this is not supposed to be challenging, it's supposed to be introductory material. linas (talk) 18:37, 5 September 2008 (UTC)

## strong mixing

Shouldn't it be strongly mixing?--MarSch 13:01, 20 October 2005 (UTC)

Well, the books and articles usually seem to say "strong" not "strongly". linas 00:01, 21 October 2005 (UTC)
Okay then it is my reader which is non-standard. --MarSch 10:29, 21 October 2005 (UTC)

## technical template

I added the technical template because few (if any) of the formulae are explained. Pdbailey (talk) 23:16, 16 August 2008 (UTC)

Can you be more specific? Which formulas are not explained? From where I sit, most of the formulas seem pretty basic and self-evident; each one has a wikilink to the article that explains it in greater detail. It's hard to guess which part was unclear to you. (Not that this article is perfect, it has many deficiencies ...) linas (talk) 18:32, 5 September 2008 (UTC)
Starting from the beginning (this is just a start): the down arrow is not defined, the right arrow is not defined, the colon-equals is not defined, and the alpha equation should be written out in text as well as an equation. Hope this helps! Pdbailey (talk) 23:27, 5 September 2008 (UTC)
The down-arrow is a standard symbol for a limit; this notation is commonly taught in high-school pre-calculus classes. The colon-equals symbol means "definition", again, commonly covered in high-school math. Unfortunately, the rest of the article does require at least college-level classes in statistics, general topology, and measure theory. Without these under your belt, yes, the topic will be overwhelming. However, this lone article cannot provide you with such background if you don't already have it. linas (talk) 19:12, 2 October 2008 (UTC)
I am just looking at WP:MTAA, and thinking more about this. I'm not trying to argue that this article should be legible to those without need to understand the concept (i.e. if you have no statistics or mathematics background, why would the mathematical definition of mixing interest you?). I think I was overzealous before in suggesting that a high schooler should be able to read it. But I disagree with the idea that this article should only be able to teach people with college classes in measure theory and topology.
Let me make a new set of confusing points to see if they can be clarified.
(1) is there a "in words" description of alpha, or the objective function in alpha?
(2) is there intuition for the last paragraph in the section titled, "Mixing in dynamical systems"?
(3) is there intuition for why mixing is a stronger result than ergodicity?
thanks. Pdbailey (talk) 03:11, 3 October 2008 (UTC)

### Informal description

I was in the library today, and thought I'd poke around on this topic. I noticed that "most" books on dynamical systems seem to contain definitions of mixing; however, none that I looked at stood out (there were several floor-to-ceiling shelves on the topic, I just picked a few at random). I also note that all of these books seemed to introduce the topic on page 200 or so, which means they cover many other general properties before coming round to mixing.

(1) Say I have a container of fluid. Fix a moment in time. Mentally color the top half of the fluid with red dye, call the red dye 'A'. Since it's half, we have $P(A)=1/2$. Some time in the future, look at the top half again; this time, call the top half B -- again, $P(B)=1/2$. Now, at zero time separation, all the red fluid is in the top, so we have $P(A\cap B)=1/2$. But, as time goes on, the red dye A mixes down into the bottom. The overlap of dye A and the top half B is not perfect any more. The intersection of the red dye "A" and the top half "B" is diminishing. After a very long time, half the red dye will be in the top, half the red dye will be in the bottom, and so we have $P(A\cap B)=P(A)P(B)=1/4$. If this mixing is very slow, then one might expect, after one time step, that, say, $P(A\cap B)=0.49$, since for slow mixing most of the red dye will still be in "B". So, for slow mixing, alpha(s) will stay large for a long time. For rapid mixing, alpha(s) will drop to a small value quickly -- the intersection of the red dye A with the top half B will quickly go to 1/4. The intersection of sets in the current time vs. sets in the past is just a way of measuring "where stuff came from" -- thus the dye example.
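The numbers in the dye example can be reproduced with a toy model. The sketch below is only an illustration under a stated assumption: each fluid particle independently switches between the top and bottom half with probability `eps` per time step. That two-compartment model, and the names `overlap`, `dependence` and `eps`, are hypothetical and do not come from the article. The quantity `dependence` is the single term $|P(A\cap B)-P(A)P(B)|$ for this one pair of sets, not the supremum that defines alpha.

```python
def overlap(eps, n):
    """P(A and B) after n steps in a toy two-compartment model.

    A = "particle was in the top half at time 0" (the red dye),
    B = "particle is in the top half at time n".
    Assumption (not from the article): a particle switches halves
    with probability eps at every step.
    """
    p_top = 1.0  # P(in top at step k | started in top)
    for _ in range(n):
        p_top = p_top * (1 - eps) + (1 - p_top) * eps
    return 0.5 * p_top  # P(A) = 1/2


def dependence(eps, n):
    """|P(A and B) - P(A) P(B)| with P(A) = P(B) = 1/2."""
    return abs(overlap(eps, n) - 0.25)


if __name__ == "__main__":
    for eps in (0.02, 0.4):  # slow mixing vs. rapid mixing
        decay = [round(dependence(eps, n), 5) for n in (0, 1, 5, 50)]
        print(f"eps={eps}: {decay}")
```

With `eps = 0.02` the overlap after one step is 0.49, as in the example, and the dependence shrinks slowly; with `eps = 0.4` it collapses toward zero within a few steps.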

Anyway, the definition of the mixing coefficient should be moved to *after* the definition of mixing, clearly.

(2) This is easy. To say that a mixing system has no non-constant square-integrable eigenfunctions has a very simple analogy in chemistry: If I pour rum into coke, there will initially be pockets of un-mixed rum, but these will eventually get stirred away. These pockets correspond to decaying eigenstates: they are states that don't last, they go away. The only non-decaying eigenstate is one where the rum is evenly distributed throughout the glass: not only is it a non-decaying state, it is also a constant state: the distribution is uniform: there are no pockets of varying density, no pockets of unmixed rum. The mixed state is the steady-state; it's unvarying in time, and no other states are steady. Sometimes, there may also be non-square-integrable solutions, but these are always "crazy" in some way, e.g. corresponding to backwards-time (decreasing entropy), or being "non-physical" in other ways.

This physical intuition for decaying and non-decaying eigenstates of dynamical systems is usually established before topics like "mixing" are introduced; it's sort of a pre-requisite concept.

(3) *Every* mixing system is ergodic; "most" ergodic systems are not mixing. This is provable as a theorem: mixing implies ergodicity. The use of the word "stronger" is just a common mathematical way of saying "less wiggle room" or "more constraining" or "more narrow". To say "I'm really really sure" is "stronger" than "maybe I dunno". If you are studying a new unknown system, and merely know that it's ergodic, well, you don't know much. If you know that it's mixing, you certainly know more: you know that it is ergodic, and that it also obeys this much "stronger" constraint on its behaviour -- it strongly narrows down what the system can do. Compare and contrast to strong convergence and weak convergence in Hilbert spaces -- the resemblance is not entirely accidental.

linas (talk) 05:49, 3 October 2008 (UTC)

I would like to add these as examples to the article, but I have a few questions before I understand them. In (1) What is the probability measure? And what is the meaning of $A\cap B$? For (2), what you describe appears to imply that T has only eigenfunctions with associated eigenvalues less than unity, not that they do not exist. Also, why are eigenfunctions with singularities allowed? For three, I understand what stronger means, I was looking for intuition. See what you think of this: ergodicity implies that every state can be visited, mixing implies that they are in finite time. Pdbailey (talk) 23:26, 3 October 2008 (UTC)
Can anyone answer these questions? Do they make sense? Pdbailey (talk) 18:42, 10 October 2008 (UTC)
Sorry for the late response. (1) All stochastic systems have a probability measure; that is what the word "stochastic" means. Most mathematical descriptions of physical systems have a probability measure associated with them, either implicitly or explicitly. Informally, the measure corresponds to "where the system spends most of its time". So, for example: for planetary motion, the measure is high on the planet's orbit, and more or less zero everywhere else. For quantum mechanics, it's the square of the wave function. For chemistry, it's the Boltzmann distribution.
Formally, the mathematical definition of a measure (mathematics) is just an abstraction of all of these concepts. It is just a way of defining integrals in a general setting, even when functions might not be differentiable (so, for example, measures resolve the formal difficulties of the Dirac delta function in an almost trivial way). (Historically, the path went through the Riemann-Stieltjes integral to the Lebesgue measure and Lebesgue integration, ending in the definition of the Borel algebra.) When the measure of the total space is one, then a measure is exactly the same thing as a probability; in this sense, measure theory and probability theory are more or less the same thing, differing only in focus. Books that present one invariably review the other.
Anyway, a firm grasp of measure theory and/or probability theory is a pre-requisite for understanding this article.
The notation $A\cap B$ is set intersection.
Here, T is the time evolution operator; it *always* has an eigenstate with eigenvalue 1; this is the ground state or steady state. The existence of the maximal eigenvalue is guaranteed by the Frobenius-Perron theorem. T is a bounded operator, and is thus a continuous function on the Banach space on which it acts. T can also be understood as the pushforward of the time evolution of the system; it is sometimes called the Ruelle-Frobenius-Perron operator or transfer operator (by analogy to the transfer interaction in the Ising model). Students of dynamical systems will typically have at least some background in each of these topics; again, these are sort-of pre-requisites for this article.
If your background is limited to quantum mechanics, then you can kind-of think of T as the unitary time evolution operator U. The eigenvalues of U always have unit norm. There are many similarities between T and U (and many differences).
David Ruelle has done much to put the topics commonly discussed in statistical mechanics on a well-founded mathematical footing; this formalization has led to new understanding of old systems: e.g. that the QM wave functions of gas particles in a box are fractal and space filling, and that they are space-filling *even if you initially confine* them to one side of the box -- they are destructively interfering, until that time when you remove the confining barrier. Really cute stuff, a nice picture for old, boring undergrad textbook stuff.
You ask: why are eigenfunctions with singularities allowed? Why wouldn't they be? You might try to argue that they are not 'physical', but take care: light hitting a piece of glass does something 'singular' at the boundary; any first-order phase transition e.g. melting, is 'singular' at the transition. So singularities are common in physics.
"See what you think of this: ergodicity implies that every state can be visited, mixing implies that they are in finite time." Interesting, but no. First, even in 1890, Poincaré knew that ergodic systems visit their states in finite time; this is the Poincaré recurrence theorem. Also, be careful: a single state typically has measure zero, so you have to be careful in defining what you mean by 'state'. So, rather, roughly speaking, ergodicity implies that every state is visited in finite time (the Poincaré recurrence time for that state), although these times, while finite, are presumably unbounded. Mixing implies that all states get arbitrarily close to each other (and so get tangled together in such a way that, essentially, they can't get untangled: you cannot draw a line and say everything on one side is this, and everything on the other side is that, because that line is not only a hopelessly contorted fractal, but it's space-filling in a nasty way).
So, informally, one might say: ergodicity is about what happens in time, mixing is about what happens in space; all mixing systems are ergodic, but there are ergodic systems that don't mix.
As to visiting points in a phase space: also of interest is the wandering set; this is the mathematical model of systems that are away from equilibrium, and have a portion of phase space that 'wanders away' and is never visited again, i.e. these correspond to dissipative systems. linas (talk) 18:38, 20 October 2008 (UTC)
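The claim that there are ergodic systems that don't mix can be checked numerically on two standard textbook maps of the unit interval, neither of which is discussed in this thread: an irrational rotation (ergodic but not mixing) and the doubling map $x\mapsto 2x \bmod 1$ (mixing). The sketch below is a rough grid estimate, not a proof. The set $A=[0,1/2)$, the grid size and the function names are choices made here for illustration. It estimates $\mu(A\cap T^{-n}A)$, which for a mixing map should approach $\mu(A)^2=1/4$.

```python
import math

GRID = 4096  # number of sample points; a power of two keeps the doubling map exact


def rotation(x, n, theta=(math.sqrt(5) - 1) / 2):
    """n steps of the rotation x -> x + theta (mod 1), theta irrational."""
    return (x + n * theta) % 1.0


def doubling(x, n):
    """n steps of the doubling map x -> 2x (mod 1)."""
    return (2**n * x) % 1.0


def overlap(evolve, n):
    """Grid estimate of mu(A ∩ T^{-n} A) for A = [0, 1/2), Lebesgue measure."""
    hits = 0
    for i in range(GRID):
        x = (i + 0.5) / GRID
        if x < 0.5 and evolve(x, n) < 0.5:
            hits += 1
    return hits / GRID


rot = [overlap(rotation, n) for n in range(1, 301)]
dbl = [overlap(doubling, n) for n in range(1, 9)]

if __name__ == "__main__":
    print("doubling map overlaps:", dbl)
    print("rotation overlaps, min and max:", min(rot), max(rot))
    print("rotation overlaps, time average:", sum(rot) / len(rot))
```

For the rotation the individual overlaps keep oscillating and never settle, yet their time average comes out near 1/4, which is the ergodic behaviour; for the doubling map every overlap is already 1/4.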

I'm somewhat stumped by your questions. Let's try (1) first. The example was that of a jar of fluid. Let us say that the volume of the fluid is exactly one liter. The probability of finding an atom of fluid somewhere inside that jar is exactly one. Consider that half the fluid has been colored red (and it's not yet been mixed). That means that 0.5 liters is red. If one was to sample "randomly", say, with a pipette, some of the fluid, one has a 50-50 chance of finding red fluid in the pipette. The probability of finding red is 0.5. In fact, since probabilities are measures, and, in this case, the measure is a 3D volume measure, normalized to a single liter, the probability is identical to the volume. Thus the probability of one milliliter is 0.001, since one milliliter occupies exactly 0.001 of a liter. So, for example, let us say that "Alice" replaced one milliliter of fluid with blue dye (and did not mix). Then, if Bob tried to sample fluid, he would have a chance of exactly 0.001 of finding the blue dye. Does that make sense? The probability measure, in this example, is the same as the volume measure. This is not an accident; the idea that a "volume" is a "measure" is a basic concept in probability theory.

I'm stumped, because you seem to talk as if you know what eigenvectors are, but I guess you are not familiar with probability and statistics?

I am not sure what you are fishing for in (3). You do understand what set intersection is, right? Is the problem that you don't understand what set intersection has to do with fluid? This is, again, just a probability measure. Say that Alice mixes exactly 0.5 liters of alcohol with exactly 0.5 liters of water. Let's call the alcohol "set A" and the water "set C". Notice that set A intersect set C is always empty, even after mixing: alcohol and water do not transmute into one another, they remain distinct sets, even though mixed together. Now, if Bob samples a single atom out of Alice's mixture, he has a 50-50 chance of finding that atom to be water, and a 50-50 chance of finding it to be alcohol. Right?

Now, let's couple the notion of sets and set intersection with the notion of volume. Let Set B be exactly one milliliter of Alice's mixture. What is Set B intersect set A? Well, clearly, half of B will be water, and half of B will be alcohol. Thus, the intersection of Set B with set A, or with set C, is non-empty. The volume of $B\cap A$ is exactly one-half milliliter. So we write $P(B\cap A)=\tfrac{1}{2}\cdot\frac{1\ \text{mL}}{1\ \text{L}} = 0.0005$, right? And likewise, $P(B\cap C)= 0.0005$.

So let's review these basic results from probability theory. The probability of the whole volume is 1; we write $P(\Omega)=1$, where Omega denotes the whole entire volume (one liter). We write $P(A)=P(C)=0.5$ since the whole volume is composed of half water and half alcohol. We have $A\cap C=\varnothing$, since alcohol is not water. We have that the volume of the empty set is zero: $P(A\cap C)=0$. We also write $\Omega=A\cup C$ so that $P(A\cup C)=1$; here "cup" is set union. Finally, we saw how to use set intersection in the context of probability theory: we have $P(B\cap A)= P(B\cap C)=0.0005$.
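The arithmetic in this review can be checked with exact fractions. The snippet below is a small sketch that treats probability as volume divided by the one-liter total, as in the example; the helper `prob` and the variable names are made up here for illustration.

```python
from fractions import Fraction

TOTAL_ML = 1000  # the whole jar, Omega, holds one liter


def prob(volume_ml):
    """Probability of a region = its volume as a fraction of the jar."""
    return Fraction(volume_ml) / TOTAL_ML


p_omega = prob(1000)               # P(Omega)
p_A = prob(500)                    # alcohol
p_C = prob(500)                    # water
p_A_and_C = prob(0)                # alcohol is not water: empty intersection
p_A_or_C = p_A + p_C - p_A_and_C   # inclusion-exclusion
p_B = prob(1)                      # Bob's one-milliliter sample
p_B_and_A = prob(Fraction(1, 2))   # half of the sample is alcohol
p_B_and_C = prob(Fraction(1, 2))   # the other half is water

if __name__ == "__main__":
    print(p_omega, p_A, p_C, p_A_or_C, p_B, p_B_and_A, p_B_and_C)
    # In the fully mixed state the sample is independent of the species:
    print(p_B_and_A == p_B * p_A)
```

The last line ties this back to mixing: once the fluid is fully mixed, $P(B\cap A)=P(B)P(A)$, which is the factorization that the mixing coefficient measures departure from.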

(In the original example, I have taken Set B to be half-a-liter, but perhaps it is easier to think of Set B as being small, being a small sample of the mixture).

The finite graph that you refer to is called a Markov chain; it is also a topic that is commonly studied in classes on probability theory. There are many subtleties in studying Markov chains, which the WP article does not cover; it's a fairly big topic. Some of the subtler points are, however, discussed in the articles on Bernoulli schemes and shift of finite type. However, since you do not have a background in probability, I fear that the last two articles will also feel like they're from outer space.

Do go study Markov chains; your efforts will be rewarded; they are used very, very widely in many different branches of science, from bio and genetics, to radio and electronics engineering, satellite communications, data mining, linguistics, AI.

And yes, I will try to re-work this article someday. The mixing-of-dyes example above does provide the basic intuitive notion. Perhaps it would be better to think of mixing red sand and blue sand together: the individual grains of sand retain their color, and do not mix. Unfortunately, I cannot think of any quick, intuitive way to bridge from that to notions of eigenvalues; this requires another sort of quantum-leap of understanding, as it were. linas (talk) 04:26, 24 October 2008 (UTC)

Okay, I am surprised by how unclear I can be with my questions. However, you did answer (1&3). The article reads that the objective function in the sup of alpha is $|P(A \cap B) - P(A)P(B)|$. In your example, I cannot work out what the events A and B are. Here is my problem. Let's say that A is the event that a particular molecule (water or ethanol) is in the one mL sample, and that B is the event that a particular molecule is ethanol. Is that right? If so, when you start, $P(A)=0.001$, $P(B)=0.5$ and $P(A \cap B)= 0.001$. So the alpha function starts positive. Then at time infinity, $P(A)=0.001$, $P(B)=0.5$ and $P(A \cap B)= 0.0005$, and alpha is 0. This is the answer to one and three, and I can now include it as an example of the definition.
For two (2), you gave nothing approaching an answer; you just gave me a bunch of information I already knew. If you want, I could give a more concrete example. Say an MP3 player shuffles songs randomly, and can have increased or decreased probability of the next song being by the same artist. There are two methods to implement this (both are represented as MCs because of the Markovity of the requested system). In both, all transitions have positive probability. In the first, the transition to a song that is by the same artist has probability $1/n + a$, where a is an adjustable factor and n is the number of songs in the playlist. Then the transitions to songs by other artists would be uniform at $1/n + b$, where b is a fixed function of a and n. Notice however that for every non-zero value of a, the eigenfunction that has unit eigenvalue does not have uniform probability. Instead (when a > 0) artists with fewer songs are simply less likely to have their songs played. There are also many decaying eigenfunctions (those with eigenvalue less than one) that mix only within the artist. Instead of using this simple method, one could also find the transition probability matrix by choosing the eigenfunctions and eigenvalues. In this case, you might want the uniform eigenfunction to have unit eigenvalue, and eigenfunctions that are within artist to all have eigenvalue -1 < c < 1. Then by adjusting c, the persistence within an artist can be adjusted without adjusting the long run play probability. Do these systems meet the mixing requirement? How does this square with the claim that, "for the system to be (at least) weak mixing, none of the eigenfunctions can be square integrable." PDBailey (talk) 15:39, 24 October 2008 (UTC)
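The first shuffle scheme described above can be tried out on a small playlist. This is a minimal sketch under assumptions that go beyond the description: "same artist" includes replaying the current song, and b is chosen per row so that each row sums to one, which makes b depend on how many songs the current artist has. The function names and the six-song playlist are hypothetical. The stationary distribution (the eigenfunction with unit eigenvalue) is found by repeatedly applying the transition matrix to a starting distribution.

```python
def shuffle_matrix(artists, a):
    """Transition matrix: 1/n + a to same-artist songs, 1/n + b to the rest.

    Assumption: b is fixed row by row so that the row sums to 1, i.e.
    m*a + (n - m)*b = 0, where m is the number of songs by the current artist.
    Requires at least two artists.
    """
    n = len(artists)
    P = []
    for i in range(n):
        m = sum(1 for x in artists if x == artists[i])
        b = -a * m / (n - m)
        P.append([1 / n + (a if artists[j] == artists[i] else b)
                  for j in range(n)])
    return P


def stationary(P, steps=2000):
    """Long-run play probabilities via power iteration on a row vector."""
    n = len(P)
    pi = [1 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


artists = ["X", "X", "Y", "Y", "Y", "Y"]  # artist X has fewer songs

if __name__ == "__main__":
    for a in (0.0, 0.05):
        pi = stationary(shuffle_matrix(artists, a))
        print(f"a={a}:", [round(p, 4) for p in pi])
```

With `a = 0` the long-run play probabilities are uniform; with `a = 0.05` each song by the artist with fewer songs is played less often than each song by the other artist, in line with the observation above.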
This conversation is degenerating; I have a hard time understanding your level of knowledge, and so am having a hard time answering your questions. You seem to understand some advanced things, but are missing the more basic things that the advanced things are built upon. For example, I actually over-simplified my discussion of volumes and probability measures. The correct probabilistic setting for any random variable is the product topology, aka the exponential object. Thus, for a coin flip, it is actually the infinite sequence of all possible outcomes of a coinflip; it has the structure of a binary tree -- each coinflip takes you onto the left or right branch of the binary tree -- the structure of this topology can be envisioned as the set of all infinite subtrees of a binary tree. The measure is given by the size of the subtree, relative to the size of the whole tree. For the case of mixing fluids, it is the infinite product of all possible arrangements of the fluid at time 1, time 2, time 3, etc., and technically speaking, the probability measures are defined on this product space, the product of all possible outcomes of an infinite number of measurements, extending forwards and backwards in time. If you have studied quantum mechanics, this can be recognized as the many-worlds interpretation of QM; it is simply a statement that the space of probabilities needs to be understood as a product topology, which can be thought of as infinitely branching.
I didn't bother answering question 2 because I didn't believe that you have the basic mathematical background to understand any answer that I would give you. It appears that you know more than you let on, so let me ponder for a moment and see how to answer it. However, if you think you are laying some sort of trap, by playing dumb when you are not, please don't play that game. linas (talk) 16:10, 24 October 2008 (UTC)
Linas, this answer to what the probability measure is makes much more sense to me than the above, thanks! Now, why don't you lay your answer to (2) on me. PDBailey (talk) 16:21, 24 October 2008 (UTC)

The answer is that the part of the article talking about eigenfunctions is poorly worded, and should be properly re-written using a proper definition of the operator, and a proper statement of the theorem. For the case of Markov chains, the goal was not to talk about the eigenfunctions of the transition matrix of the Markov chain, but rather about the pushforward of the transition matrix, which is an infinite dimensional operator. Let X be a finite set, the set of all of the nodes of the Markov chain. We then consider the space $\Omega = X^\omega=X\times X\times\cdots$ which is an infinite product -- this is the proper "probability space" on which the probabilities are defined. We then consider the Banach space $\mathcal{F}(\Omega)=\{f:\Omega\to\mathbb{R}\}$, the set of all real-valued functions on Omega. I made a slight abuse of notation here; the correct way to think of a function $f\in \mathcal{F}(\Omega)$ is as a function that takes any given Borel set from Omega, and assigns a real number to it. Let $\mathcal{B}(\Omega)$ be the Borel algebra on Omega; then given $A\in \mathcal{B}(\Omega)$, i.e. $A\subset \Omega$, the function f maps A to f(A).

Now, as to the pushforward: If T is the transition matrix from the Markov chain, consider now the corresponding operator $T^{-1}: \mathcal{B}(\Omega) \to \mathcal{B}(\Omega)$. This is the shift operator. We have to be careful defining the shift operator; it gets subtle here; in most cases we are interested in the case where it is measure-preserving, in that for any $A\in \mathcal{B}(\Omega)$, we have $\mu(A)=\mu (T^{-1}(A))$. Here, mu is the measure; for the Markov chain, it's the Markov measure. The transition matrix induces a linear operator, let's call it $\mathcal{L}$, on the space $\mathcal{F}(\Omega)$. That is, $\mathcal{L}: \mathcal{F}(\Omega)\to \mathcal{F}(\Omega)$. It is given by

$[\mathcal{L} f](A) = (f\circ T^{-1})(A)$

where $A\in \mathcal{B}(\Omega)$ and $f\in \mathcal{F}(\Omega)$. It is straight-forward to verify that $\mathcal{L}$ is linear. Less obvious is the fact that it is a bounded operator. That it is, is essentially the Ruelle-Frobenius-Perron theorem. In fact, its largest eigenvalue is 1. And here's the circular bit: the eigenvector corresponding to this eigenvalue is exactly the Markov measure; the shift-invariance of the Markov measure is what made it be so. There are various cases where this measure is and is not unique, depending on how things can be factored or not.

The statement about square-integrability is essentially that, for mixing systems, there are no other eigenvectors. However, even as I am writing this, I am now getting confused by a few points; clearly Markov chains have decaying eigenvectors, so I must have forgotten some key element. It sure would be nice to have a book or paper in front of me that stated that theorem clearly; but I do not have such a book or paper. Let's see now, Markov chains are mixing, right? I think so, they seem to obey the definitions given in this article, right? So I am not sure of what the matter is. Let's see, the decaying eigenfunctions are measurable; I can integrate with respect to the measure, right? I think so. To be "square-integrable", I think the proper definition here is that they belong to $L^2$, the Lp space, and particularly a Hilbert space. Thus perhaps the theorem is that the decaying eigenfunctions, although they belong to a Banach space (they're measurable), do not belong to a Hilbert space, i.e. for a Banach space to be a Hilbert space, it has to be endowed with an inner product, the square-root of which is compatible with the norm on the Banach space (which is given by the Markov measure, in this particular case). But, as I say, I am now confused over this and several other matters; I will ponder; but this will take a while, maybe a long while, as I have other things I should be doing, and some of the things confusing me are also non-trivial. linas (talk) 18:11, 24 October 2

I had not heard of mixing until I saw this article, nor is it in any of the computational statistics / applied math books I own. The exception is that statisticians often require, "good mixing" in their MCMC simulations, I assumed the concepts were related. The paragraph in question now reads

For a system that is weak mixing, the shift operator T will have no (non-constant) square-integrable eigenfunctions. In general, a shift operator will have a continuous spectrum, and thus will always have eigenfunctions that are generalized functions. However, for the system to be (at least) weak mixing, none of the eigenfunctions can be square integrable.

Based on your example, I am guessing that something to the effect of "with/associated with eigenvalue one" needs to be added to the first and last sentence, and the last sentence also needs, "non-constant eigenfunctions" in place of "eigenfunctions." An example that would fail then is a box with xenon and helium in it would not be mixing because the xenon would fractionate to the bottom relative to the helium. The eigenfunction favors the bottom for the xenon (is non-constant), is square-integrable, and is "stable" or has eigenvalue one. PDBailey (talk) 19:03, 24 October 2008 (UTC)
Careful, don't get tripped up by real-world analogs. The Markov measure should be understood as being "constant" or "uniform", in the sense of a Haar measure -- or perhaps "invariant" would be a better term than "constant". The xenon-helium example is a bad example. Oil-n-water clearly fractionates, so that would do as an example of not mixing. But, for the xenon-helium case, some of the helium would be down at the bottom, and some of the xenon would be up at the top; I think that if you properly computed the measure, you would observe that it is a function of gravity, of temperature, and of pressure (and of the percentage of xenon-helium, i.e. the grand canonical ensemble). You would have a distribution of the xenon that would be a Haar measure, i.e. that was invariant, once you correctly adjusted for gravity (since the gravity adds a non-uniformity to the volume). To be precise, I believe that the correct Haar measure is the Boltzmann distribution for this case. The volume you need to think about is not a volume of space, but rather, volumes in the phase space. The measure on these volumes *is* uniform ("constant", by abuse of language) -- it's the Boltzmann distribution. To be clear, the measure is
$Pr(A)= \frac{1}{Z(\Omega)} e^{-\beta H(A)}$
where Pr(A) is the probability of a volume of phase space A, while Z(Omega) is the partition function, and the energy H contains a term mgh, for m the mass of a helium (or xenon) atom, g the gravitational acceleration, and h the height of the ... err...ummm.... Let's take A to be really really small. Then we can take h to be the height of the projection of the phase space element A into real, 3D coordinate space. Right? The point is that Pr(A) is invariant (constant) as I move around in phase space, although its size, as I project it down into the box containing the gases, will vary (according to the height, *and* according to which of the two gases we are talking about). linas (talk) 19:39, 24 October 2008 (UTC)
I mean, this is in no way a mathematical proof that gases, even in the absence of gravity, are mixing; as far as I know, such mathematical proofs do not exist, and are not likely to be found anytime soon -- there are almost no rigorous mathematical proofs for mixing for almost anything at all that would be considered to be vaguely "physical" -- only a few toy models are exactly solvable. But *if* you *assume* that you could prove that some given physical system is mixing, *then* you can make statements, like those above, as to what that system would do. It all fits together into a reasonable framework, even though the tools to accomplish much are more or less absent. linas (talk) 19:56, 24 October 2008 (UTC)

As for the other bits ... Hmm, well, of course, that's it, isn't it? For $\mathcal{F}$ to be a Hilbert space, it has to be self-dual, that is, the norm dual of $\mathcal{F}$ has to be isomorphic to $\mathcal{F}$ itself. But it is "well known" that this is not the case. Clearly, this is not the case for measure spaces in general: the Dirac delta function is measurable (we can integrate just fine over it), but it is not square integrable (there is no way to define its "square", and integrate it). In particular, the Dirac delta function is a valid element of $\mathcal{F}$.

The duals to the decaying eigenfunctions are "well-known" to be (derivatives of) delta functions (they act as if they were taking derivatives). I have some references that explain this, but none that do so clearly, certainly not in terms of measure theory. At any rate, the duals are not ordinary functions, and thus, the space, as a whole, cannot be given a basis that would turn it into a Hilbert space. I think that, with some effort, I could prove that this is the case for all Markov chains. Perhaps I should; this is a topic of some interest to me. How to prove a more general theorem for mixing, that I do not currently know; perhaps it would become obvious from proving the special case.

If you want to do this yourself: pick the simplest possible Markov chain you can, say the coin-flip. Find the eigenvectors (these will be the Bernoulli polynomials). Find their duals. You'll see that the duals are delta functions. You won't be able to find duals that are square-integrable, and you won't be able to find a basis for the function space that is square-integrable. linas (talk) 19:14, 24 October 2008 (UTC)
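
The eigenvector claim can be checked by hand or by machine. The following is a minimal sketch, assuming that "the coin-flip" means the Bernoulli shift realized as the map x -> 2x mod 1 on the unit interval, whose transfer operator is (Lf)(x) = (f(x/2) + f((x+1)/2))/2. Under that assumption, the Bernoulli polynomial B_n should be an eigenfunction with the decaying eigenvalue 2^(-n). The helper names (`bernoulli_poly`, `transfer`, and so on) are made up for this illustration and do not come from the article.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_n_max] with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        # Recurrence: sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1.
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B


def bernoulli_poly(n):
    """Coefficients c[k] of x^k in B_n(x) = sum_k C(n, k) B_{n-k} x^k."""
    B = bernoulli_numbers(n)
    return [comb(n, k) * B[n - k] for k in range(n + 1)]


def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest degree first) by Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def transfer(coeffs, x):
    """Assumed transfer operator of x -> 2x mod 1, applied to a polynomial at x."""
    return (evaluate(coeffs, x / 2) + evaluate(coeffs, (x + 1) / 2)) / 2


# Check L B_n = 2^(-n) B_n exactly at a few rational sample points.
for n in range(6):
    p = bernoulli_poly(n)
    for x in (Fraction(0), Fraction(1, 3), Fraction(2, 5), Fraction(1)):
        assert transfer(p, x) == Fraction(1, 2**n) * evaluate(p, x)
```

Exact rational arithmetic is used so that the eigenvalue relation holds as an identity rather than up to rounding. This only checks the eigenfunctions; it says nothing about the duals, which is where the delta functions come in.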

Heh. Some of this is coming back to me. Of course, the duals to the Bernoulli polynomials are given by the Euler-Maclaurin formula. Using quantum-mechanical bra-ket notation, where $B_n(x)= \langle n|x\rangle$ is a Bernoulli polynomial, the dual is $B^*_n(y)=\langle y|n \rangle$, and the sum $\delta(x-y) = \sum_{n=0}^\infty \langle y|n \rangle \langle n|x \rangle$ is just the Euler-Maclaurin formula in disguise. If you work out what $\langle y|n \rangle$ is, you find something like $B^*_n(y)=\langle y|n \rangle \simeq \frac{1}{n!}\frac{d^n}{dy^n} (\delta(y)-\delta(1-y))$ (up to an overall factor, I forget the details). This formula is generalized in many ways; there's an entire branch of math devoted to it, with names like Hardy, Sheffer, Appell, Hildebrandt, and Mittag-Leffler associated with it. Almost none of the adjoint spaces ever form a Hilbert space; the grand exceptions are of course those given by the orthogonal polynomials, which are Hilbert spaces. See, for example, Sheffer sequence, an article that explicitly explores the duals and discusses how they're not in general square-integrable. If you look carefully, you'll see that the shift operator, the T from this article, is just the ordinary derivative d/dx for the Bernoulli polynomials, and is a generalized derivative for the Sheffer sequences (err, no, that's wrong). So you almost get the complete worked example there; if I'm not wrong, I think the Markov chain transition matrix simply gets rewritten into one of these generalized derivatives (that's wrong, you've got to do it p-adically, that's the whole point of the subshift of finite type. Sheesh!), and you get the Sheffer sequence popping out at the end. I really don't know the details, I have only the vaguest memory of having read this once, and my memory plays tricks on me. My memory is like a sieve; if only I could remember even one-hundredth of what I've ever studied ... linas (talk) 20:27, 24 October 2008 (UTC)
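
For anyone who wants to pin down the details left open above, here is a sketch of the polynomial case, offered as a reconstruction rather than as a quotation from any reference. For a polynomial $f$ of degree at most $N$, the expansion in Bernoulli polynomials appears to be

$$f(x) = \int_0^1 f(y)\,dy + \sum_{n=1}^{N} \frac{B_n(x)}{n!}\left[f^{(n-1)}(1) - f^{(n-1)}(0)\right]$$

which can be checked directly for $f(x)=x$ and $f(x)=x^2$. Reading off the coefficient of $B_n(x)$ as the pairing of $f$ with the dual suggests, for $n\ge 1$,

$$B^*_n(y) = \frac{(-1)^{n-1}}{n!}\,\frac{d^{\,n-1}}{dy^{\,n-1}}\left[\delta(y-1) - \delta(y)\right]$$

with the delta functions understood as acting at the endpoints of $[0,1]$, and $B^*_0(y)=1$. If this is right, the order of the derivative is $n-1$ rather than $n$, which is compatible with the "something like" caveat above. Either way, the duals are derivatives of delta functions and not square-integrable functions.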

### One more try

Okay, so I have thought about this and I think a plainer language description of $\alpha$ would be

for any two possible realizations of the random variable, if given a sufficient amount of time between them, the events are independent.

Does that sound right? If so, I would like to add it. PDBailey (talk) 03:24, 23 January 2009 (UTC)
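
One way to test the wording is on a small example. The sketch below assumes a stationary two-state Markov chain with hypothetical transition probabilities `a` and `b` (these numbers are not from the article) and measures how far events at time 0 and time n are from being independent. It only looks at events determined by the single variables X_0 and X_n, so it illustrates the idea behind $\alpha$ without computing the full coefficient over all past and future events.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dependence(a, b, n):
    """max over i, j of |P(X_0=i, X_n=j) - P(X_0=i) P(X_n=j)| in stationarity."""
    P = [[1 - a, a], [b, 1 - b]]          # transition matrix (illustrative)
    pi = [b / (a + b), a / (a + b)]       # stationary distribution
    Pn = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(n):
        Pn = mat_mul(Pn, P)               # n-step transition probabilities
    return max(abs(pi[i] * Pn[i][j] - pi[i] * pi[j])
               for i in range(2) for j in range(2))


a, b = Fraction(1, 4), Fraction(1, 3)
gaps = [dependence(a, b, n) for n in range(1, 8)]

# The gap shrinks geometrically, by the factor |1 - a - b| per step,
# but it is never exactly zero at any finite lag.
assert all(later < earlier for earlier, later in zip(gaps, gaps[1:]))
assert all(g > 0 for g in gaps)
```

In this example the events are never exactly independent at any finite separation; the dependence only tends to zero as the gap grows. That suggests the plain-language version might read "become independent in the limit" instead of "are independent".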

## Clarity and quality of writing

I corrected the errors that I trusted my knowledge of the subject enough to change. Several other sentences are unclear or contain errors, but the possible corrections were too ambiguous given my limited knowledge of the topic; I don’t know how to clarify some sections without (incorrectly) changing the meaning. I therefore added the clarify and expert templates: the article needs one or more people with sufficient subject knowledge and clarity of writing. ―Dmyersturnbull (talk) 20:49, 1 October 2008 (UTC)

It would be best if you would state here what aspects of this article you find confusing. I did remove one statement you objected to; I could not understand that sentence myself. However, as part of your corrections, you removed some statements about topological mixing; I restored these, as the removal seems unjustified.
Last I looked at this article, it did not contain errors, nor did it seem ambiguous to me, but I can be blind to this. Please pose questions. linas (talk) 18:55, 2 October 2008 (UTC)

"In all cases, ergodicity is implied: that is, every system that is mixing is also ergodic (and so one says that mixing is a "stronger" notion than ergodicity)." Can I ask you about this? Do you mean that the notion of mixing is stronger than that of ergodicity because it contains that notion as well as other concepts or descriptions? Is that what this sentence means? Also, if every system that is mixing is also ergodic, is every ergodic system mixing? If not, then would ergodic systems not have other aspects, or even instances, not covered by mixing? If so, then this would make the term "stronger" more tenuous, as a comparison of the two notions' superfluities (if I may put it thus) to each other would have to be carried out. (If A then B does not allow us to derive A from B.) Certainly, in that case, the "so one says that mixing is a stronger notion" would be inadequate, as more grounds for such an assertion would be needed. If, rather, ergodicity implies mixing, as well as mixing implying ergodicity (and more), should this not be stated in the sentence? That is, is it a case of "if A then B" or of "A IFF B" (where mixing covers more also)? There seems to me to be a categorical logical difference between the two statements, and it is not clear which you mean. I know nothing about the area myself, which is why I am asking, but that is also the reason why I am reading the article in the first place. I hope my question is not too tedious, and my sincere apologies if it is. SF —Preceding unsigned comment added by 79.97.217.134 (talk) 14:19, 4 December 2009 (UTC)
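
On the logical question: the implication runs one way only ("if mixing then ergodic"), and "stronger" is meant in that sense. A minimal sketch of a system that is ergodic but not mixing, assuming a two-state chain that deterministically swaps states at every step and starts in its uniform stationary distribution (an illustrative toy, not an example taken from the article):

```python
from fractions import Fraction

half = Fraction(1, 2)


def joint(n):
    """P(X_0 = 0 and X_n = 0) for the stationary chain that swaps states each step."""
    # Start in state 0 with probability 1/2; the chain is back in 0 exactly when n is even.
    return half if n % 2 == 0 else Fraction(0)


product = half * half  # P(X_0 = 0) * P(X_n = 0) = 1/4

# Mixing would require joint(n) -> product as n grows.
# Here it oscillates between 1/2 and 0 forever.
values = [joint(n) for n in range(6)]
assert all(v != product for v in values)

# Ergodicity only requires the running (Cesaro) average to approach the product.
N = 1000
cesaro = sum(joint(n) for n in range(N)) / N
assert cesaro == product
```

The check covers only one pair of events, so it illustrates the distinction and is not a proof. The point is that averaging over time hides the oscillation that mixing forbids, which is why ergodic systems need not be mixing.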

## Is this example mixing? Ergodic?

Evolution of an ensemble of classical systems in phase space (top). The systems are a massive particle in a one-dimensional potential well (red curve, lower figure). The initially compact ensemble becomes swirled up over time.

I have been editing articles on statistical mechanics lately, and I made this figure to demonstrate mixing in the Gibbs sense (see Elementary principles in statistical mechanics (1902) chapter XII, especially page 144 onward). I would like to suggest it for this article but I'm not sure how to describe the process in the article's terminology. In Gibbs' terminology, this system is apparently "mixing" because it possesses a wide variety of recurrence periods; however, it is not ergodic because it does not equally explore all states of the same energy: it never visits the left well. Since the article says that mixing implies ergodicity, I'm curious to hear how this case would be classified in the mathematical terminology. --Nanite (talk) 09:54, 27 October 2013 (UTC)