Talk:Markov random field: Difference between revisions

Content deleted Content added

Inline

Revision as of 22:57, 26 May 2011

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.StatisticsWikipedia:WikiProject StatisticsTemplate:WikiProject StatisticsStatistics articles
???	This article has not yet received a rating on Wikipedia's content assessment scale.
???	This article has not yet received a rating on the importance scale.

Isn't there error in LaTeX part os section "Pairwise Markov property" ?! Shouldn't there be V insdtead od E ?

Needs much more work - discussion of inference, the Hammersley-Clifford Theorem, links, etc. Will add more. Revise away. -- Soultaco, 22:50 24 Dec 2004 (UTC)

Could someone please elaborate on this sentence in the intro?: "It can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies)." —Preceding unsigned comment added by 18.243.2.30 (talk) 09:59, 2 December 2007 (UTC)[reply]

Potential functions

From the current version of the article:

What's the relationship between the potential function and the Gibbs function?

a set of potential functions $\Phi =\phi _{c_{i}}:A_{c_{i}}\rightarrow R^{+}$ , where each $c_{i}=\lbrace i_{1},i_{2},...i_{|c_{i}|}\rbrace$ represents a clique in $G$ , and $A_{c_{i}}$ is the set of all possible assignments to the random variables represented by $v_{i_{1}},v_{i_{2}},...v_{i_{|c_{i}|}}$ . In other words, each clique has associated with it a function from possible assignments to all nodes to the nonnegative real numbers.

I find this needlessly complex and somewhat confusing. Here is how I would word it:

a set $\Phi$ of potential functions $\phi _{c_{i}}:A_{c_{i}}\rightarrow R^{+}$ , where each $c_{i}$ is a clique in $G$ , and $A_{c_{i}}$ is the set of all possible assignments to the elements of $c_{i}$ . In other words, each clique has associated with it a function from assignments (to each element of the clique) to nonnegative real numbers.

It's not clear to me why the definition needs to mention both i's and v's, or why such schematic letters are even necessary for the definition at all. (Can't we just identify the vertices of the graph with random variables rather than invoking some "representation" relation between them?) However, I'm hesitant to just unilaterally edit the article, as I'm not very knowledgeable about these things. Thoughts? Dbtfz (talk - contribs) 02:21, 4 February 2006 (UTC)[reply]

I went ahead and revised the passage mentioned above as well as much of the rest of the article. Anyone who is interested, please review and revise as needed. Dbtfz (talk - contribs) 06:02, 10 February 2006 (UTC)[reply]

I am not convinced of the basis and need for the requirement of nonnegative real numbers. Is there a reason (ideally, with a reference) for such a restriction, or could the restriction not rather be omitted? See for example the antiferromagnetic Ising model (an Ising model being considered as a special case of a Markov network). --Chris Howard (talk) 16:46, 19:55, 4 May 2008 (UTC)[reply]

Not sure, but I think its a notational problem: if we write

f_{k}=\exp(\phi _{k})

then the f_k are non-negative, and the phi_k are merely real, positive or negative. Some texts seem to call the f_k the "potential functions", while others call the phi_k the potential functions; thus the source of confusion?! linas (talk) 18:21, 28 August 2008 (UTC)[reply]

Thanks, linas, it´s fine now after this change you made. --Chris Howard (talk) 23:55, 30 December 2008 (UTC)[reply]

Equivalence of Markov properties

The three Markov properties are not in general equivalent. Equivalence hold however for distributions, which are positive everywhere, see for instance Lauritzen, `Graphical models' or Koller, `Probabilistic Graphical models'. —Preceding unsigned comment added by Derfugu (talk • contribs) 15:50, 17 November 2010 (UTC)[reply]

MRF vs. Bayesian Network

A Markov network is similar to a Bayesian network in its representation of dependencies, but a Markov network can represent dependencies that a Bayesian network cannot, such as cyclic dependencies.

Shouldn't it be such as cyclic independencies ? Also, this sentence gives the impression that MRFs are a generalization of BNs, when BNs can actually represent independencies that a MRF cannot [1, p. 393 -- chapter available online]

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. -- 139.165.16.117 at 11:28, 17 July 2007

I've fixed the latter problem. Took 15:00, 7 August 2007 (UTC)[reply]

Hold up there; it's true that BN can represent conditional INdependencies that a Bayesian network cannot, but any Bayesian network can be converted into a Markov network representing the same dependencies (as well as new ones). --12.17.136.181 00:40, 11 August 2007 (UTC)[reply]

Log-linear model

Asterion85 (talk) 22:18, 4 May 2008 (UTC)[reply]

In practice, a Markov network is often conveniently expressed as a log-linear model, given by

$P(X=x)={\frac {1}{Z}}\exp(\sum _{k}w_{k}\phi _{k}(x_{\{k\}}))$

it is not correct

absuming that:

$\phi _{k}=\exp(w_{k}*f_{k}(x_{\{k\}}))$

it should be:

$P(X=x)={\frac {1}{Z}}\exp(\sum _{k}w_{k}f_{k}(x_{\{k\}}))$

For references http://www.cs.washington.edu/homes/pedrod/kbmn.pdf

I made a clarifying edit to the page; however, I reversed f and phi, since I think the more common usage is to use phi to denote the field, and use f for the distribution. linas (talk) 18:36, 28 August 2008 (UTC)[reply]

-- There are many undefined variables in this equation.

P(X=x)={\frac {1}{Z}}\exp \left(\sum _{k}w_{k}^{\top }\phi _{k}(x_{\{k\}})\right)

Could someone specify what $'P'$ and $'w_{k}'$ represent in this equation?? JEIhrig 18:46, 12 April 2009 (EST)

Markov network: definition or theorem?

I am confused by this article. The term Markov network is not referenced, can a reference be supplied please? Other terms such as Markov random field are in widespread use. Also, a MRF is surely a probability function that satisfies one of the Markov properties. So then it is a THEOREM not a DEFINITION that it can be factorized in terms of cliques.

I propose to edit the article along the above lines. Objections and clarification invited? Have I got it wrong? Megadeff (talk) 18:46, 29 May 2009 (UTC)[reply]

The factorization definition is usually used as it is stronger than the Markov property definition: i.e. it is possible to construct a graph which satisfies the Markov properties but doesn't factorise (an example on a 4-cycle is given here). Under various conditions (positivity of the density or decomposability of the graph) the definitions are equivalent. —3mta3 (talk) 10:50, 30 May 2009 (UTC)[reply]

As for the name, personally I would call it an undirected graphical model.—3mta3 (talk) 11:13, 30 May 2009 (UTC)[reply]

Thanks. Physicists call things satisfying the clique product formula a 'Gibbs state' or 'Gibbs random field'. Such use has been around a very long time. Maybe in statistics the product formula has taken over, and if that is what they want...fine. But could we keep the word 'Markov' to mean just that? Maybe a better solution is to develop the page on MRF, and let the statistical applications reign here. Megadeff (talk) 08:41, 1 June 2009 (UTC)[reply]

I think the two terms are related enough that it makes sense to keep them on the same article for now, though I would agree to renaming it to MRF, as that seems to be the more commonly used term. As for the definition, we could probably mention them both, my impression is that the differences only arise in pathological cases, so it seems like most people don't really differentiate between the two (well they don't appear to in the Kindermann and Snell book). —3mta3 (talk) 17:34, 1 June 2009 (UTC)[reply]

I've renamed and changed it to include both definitions, and tried to clarify the difference. —3mta3 (talk) 18:59, 2 June 2009 (UTC)[reply]

I am interested in the 4-cycle example you mentioned, where, if I understood you correctly, the 4-cycle is an independency map of the probability distribution (or else: what does it mean that the Markov property is satisfied?) but the distribution cannot be factorized using the 4 clique potentials. Skimming through the paper, the only image of a 4-cycle I found is on page 12. Is that the one you are referring to? -- 131.159.0.7 (talk) 19:30, 28 July 2009 (UTC)[reply]

Yes, that was the example I was referring to, and what I meant by the Markov property being satisfied. I haven't actually read that paper, so can't really comment on anything else in it, but the same example also appears in this paper (Examples 4 and 7) where they explain why it doesn't factorise. —3mta3 (talk) 16:37, 29 July 2009 (UTC)[reply]

Thanks 3mta3. Unfortunately the page has become obscurer in a couple of ways since your edit. The condition Z=1 is irrelevant since it can always be taken into the factor functions. I'll keep away from the statistics bit, but once again I propose the separation of the physics meaning of MRF from the use in statistics. I'm happy to arrange this, but it would be better by consensus. I _think_ the main problem is difference of approach between the two communities. Megadeff (talk) —Preceding undated comment added 12:41, 27 July 2009 (UTC).[reply]

Okay, I've cleaned up the obvious errors. Personally, I wouldn't separate the articles, as they do refer to the same idea (a probability distribution with Markov properties over a graph), unless there is a large amount of physics specific material (in which case I think it would be better to keep the main article, and have a Markov random fields in physics type sub-article). —3mta3 (talk) 16:08, 27 July 2009 (UTC)[reply]

I'll think it over. Megadeff (talk) 18:56, 27 July 2009 (UTC)[reply]

MRFs for the representation of probability distributions

First of all, it should be noted that while the (conditional) independencies represented by an MRF (as given by the Markov properties) are of utmost importance, they alone are essentially meaningless if one does not also describe how one can generally represent probability distributions that are faithful to the (in)dependencies being represented. Most probably, the causality is even reversed here, as it seems more natural to start off with a probability model (such as a product of potential functions) and then think about the independencies that it represents and how to represent them graphically. It is quite strange of you, 3mta3, to state that you "fix[ed] some errors" in the article, when in fact you basically reverted the respective section of the article to a previous state, which, most notably, contained a highly restrictive formulation of the way in which a Markov random field may be used to represent a full-joint probability distribution over a set of random variables. In the general form, the product of clique potentials for a given state of the network need not be normalized, i.e. the resulting product is only required to be proportional to the probability value of the state. Regardless of normalization, the Markov properties are satisfied, as can easily be shown. What is missing in the article is therefore the general formulation of how the full-joint distribution over a set of random variables can be compactly represented by an MRF such that it is in accordance with what the graphical structure asserts. This is highly relevant, because in practice, that is how MRFs are used - at least in artificial intelligence and machine learning, my field of research. (I would in fact argue that the state of the article in April 2009 (before you started editing it) contained more of the relevant formulas than it does now.) Your talk about (directly) factorizing the full-joint is only a special case, which I explicitly addressed in my previous version of this section of the article. So I am curious: What exactly are the errors that you claim to have fixed? -- 131.159.0.7 (talk) 16:50, 28 July 2009 (UTC)[reply]

As Megadeff points out above, the normalization constant can simply be included in one of the φ's, so the definition with the Z is no more general than the definition without.—3mta3 (talk) 17:43, 28 July 2009 (UTC)[reply]

Requiring that the normalization constant is included in one of the factors amounts to an additional assumption, thus a special case. And I do hope you agree that special cases are less general. If no assumptions about clique potentionals are made, the partition function is required (which is why it is, by the way, always present in the literature on MRFs as far as I have come across it - including the paper by Richardson and Domingos that is linked to above). Furthermore, you fail to mention your assumption. Therefore, whether your changes actually "fixed errors" is all the more questionable. You did not answer my question: What are the errors that you claim to have fixed? -- 131.159.0.7 (talk) 18:08, 28 July 2009 (UTC)[reply]

Note that, in particular, inference problems are simplified by assuming Z=1, since an explicit calculation of Z, which would otherwise be necessary and is computationally expensive, can be omitted. (Please also note my question above.) -- 131.159.0.7 (talk) 18:21, 28 July 2009 (UTC)[reply]

Ah, I think I see what you are getting at. My point is that, purely from a theoretical perspective, if we can write the joint density of a single MRF as:

p(a,b,c)={\frac {1}{Z}}\phi _{1}(a,b)\phi _{2}(b,c)

where Z is a constant,

then we can also write:

p(a,b,c)=\phi _{1}^{*}(a,b)\phi _{2}(b,c)

where

\phi _{1}^{*}(a,b)={\frac {1}{Z}}\phi _{1}(a,b)

.

Thus the claim that restricting Z=1 would result in a stronger definition is clearly false (this was the error I claimed to have fixed). Whether or not we know the numeric value of Z is irrelevant, as long as it is finite the definitions are equivalent.

However in most practical circumstances we don't work with a single distribution but instead with a parametrised family of MRFs, in which case Z is no longer a constant, but a partition function over the parameter space (such as in the log-linear model example in the article). Thus knowledge of Z is important as it will influence the likelihood function.

Does that answer your question? —3mta3 (talk) 08:23, 29 July 2009 (UTC)[reply]

While it is true that allowing an arbitrary normalization constant does not in any way change the class of probability distributions that can be modelled, imposing Z=1 does put unnecessary restrictions on the clique potentials and therefore leads to a less general definition of the way in which a distribution can be factorized using MRF semantics. I would suggest that the most general definition should be used in this article - as it is also found all over the literature. I admit that when I rewrote the section, I did not properly read the part in which you describe the limits of factorization in general and thought, because you assumed Z=1 above, you were merely naming (restrictive) special cases where Z=1 (such as the case of the chordal graph in which factors as they would appear in a Bayesian network are used, i.e. marginal and conditional probabilities). What I do not understand is why you did not simply change the sentence in question but instead reverted the entire section, also deleting, for example, the extra information on the range of the potential functions I added.

From a computational perspective, the presence or absence of knowledge about Z is hardly irrelevant. Parametrised families of MRFs are not really relevant to the scope of this article, but if anything, their existence favours the definition that includes Z.

I think one of the key points here is that to you, an MRF is just a representation of (conditional) independencies of a joint probability distribution, whereas to me, this qualitative model is inextricably linked to a quantitative model that describes the distribution itself. Therefore, you are only concerned about describing a sufficient condition for a distribution to have a corresponding MRF (namely that it can be factorized in the way you mention), whereas I am interested in naming the most general way in which MRFs can be used to represent probability distributions.

Regarding the example on the limits of factorization that you mentioned, I would be interested in learning more about this. Please see my question in the section above. -- 85.181.84.137 (talk) 13:56, 29 July 2009 (UTC)[reply]

I still don't understand your claim how restricting Z=1 puts unnecessary restrictions on the clique potentials. Can you give an example? As for deleting the extra information on the range of the factor potentials, that was unintentional (though it is somewhat obvious, since a density must be non-negative). Also, see my reply above. —3mta3 (talk) 16:53, 29 July 2009 (UTC)[reply]

Finite graphs

Shouldn't one say somewhere that the underlying graph has to be finite? When this is not true, many of the stated results do not hold in general (e.g., the equivalence between local and global Markov properties), because of the possibility of phase transitions. Such a remark would be particularly useful, since the first example given is the Ising model, which is very often defined on infinite graphs.--YVelenik 15:36, 7 April 2010 (UTC) —Preceding unsigned comment added by Velenik (talk • contribs)

@@ Line 1: / Line 1: @@
 {{WPStatistics}}
+Isn't there error in LaTeX  part os section "Pairwise Markov property"  ?! Shouldn't there be V insdtead od E ?
 Needs much more work - discussion of inference, the Hammersley-Clifford Theorem, links, etc. Will add more. Revise away. -- [[Soultaco|Soultaco]], 22:50 24 Dec 2004 (UTC)