|WikiProject Statistics||(Rated C-class, High-importance)|
|WikiProject Mathematics||(Rated C-class, High-importance)|
- 1 Unclassified comments
- 2 The arrow
- 3 Context tag
- 4 Which came first
- 5 Backwards
- 6 Likelihood of continuous distributions is a problem
- 7 Area under the curve
- 8 Needs a simpler introduction?
- 9 Median
- 10 graph
- 11 Probability of causes and not probability of effects?
- 12 putting x and theta in bold
- 13 Discussion
- 14 In general: Likelihoods with respect to a dominating measure
Unclassified comments
I adjusted the wiktionary entry so it doesn't say that the mathematical definition is 'likelihood = probability'. Someone more mathematical than I may want to check to see if the mathematical definition I gave is correct. I defined "likelihood" in the parameterized-model sense, because that is the only way in which I have ever seen it used (i.e., not in the more abstract Pr(A | B=b) sense currently given in the Wikipedia article). 220.127.116.11 03:06, 21 March 2007 (UTC)
This article needs integrating / refactoring with the other two on the likelihood principle and maximum likelihood method, and a good going-over by someone expert in the field. -- The Anome
I emphatically agree. I've rewritten some related articles and I may get to this one if I ever have time. -- Mike Hardy
The arrow
All was going well until I hit
In statistics, a likelihood function is a conditional probability function considered as a function of its second argument with its first argument held fixed, thus:
Would it be possible for someone to elaborate on that sentence or to give an example? FarrelIThink 06:12, 21 February 2007 (UTC)
I found the very first sentence under the "Definition" section very confusing:
The likelihood of a set of parameter values, θ, given outcomes x, is equal to the probability of those observed outcomes given those parameter values.
This is not true in the continuous case, as described by the article itself a few sentences later. I think the whole thing would be much clearer if the first sentence were omitted and it simply said "The likelihood function is defined differently for discrete and continuous probability distributions". I'm currently a student of this topic and I had quickly read the first sentence under Definition (and only that sentence), ended up greatly confused, and only later came back to read the rest of the section to clarify things. --nonagonal — Preceding unsigned comment added by Nonagonal (talk • contribs) 19:59, 8 October 2015 (UTC)
Context tag
I added the context tag because the article starts throwing mathematical functions and jargon around from the very beginning with no explanation of what the letters and symbols mean. Rompe 04:40, 15 July 2006 (UTC)
- The tag proposes making it more accessible to a general audience. A vernacular usage makes likelihood synonymous with probability, but that is not what is meant here. I doubt this topic can be made readily comprehensible to those not familiar at the very least with probability theory. So I question the appropriateness of the "context" tag. The article starts with the words "In statistics,...". That's enough to tell the general reader that it's not about criminology, church decoration, sports tactics, chemistry, fiction writing, etc. If no such preceding words were there, I'd agree with the "context" tag. Michael Hardy 23:55, 16 July 2006 (UTC)
Which came first
Which came first? the common use as in "in all likelihood this will not occur" or the mathematical function?
- See History of probability. "Probable and probability and their cognates in other modern languages derive from medieval learned Latin probabilis ... . The mathematical sense of the term is from 1718. ... The English adjective likely is of Germanic origin, most likely from Old Norse likligr (Old English had geliclic with the same sense), originally meaning "having the appearance of being strong or able", "having the similar appearance or qualities", with a meaning of "probably" recorded from the late 14th century. Similarly, the derived noun likelihood had a meaning of "similarity, resemblance" but took on a meaning of "probability" from the mid 15th century." Mathematical formalizations of probability came later, starting primarily around 1600. Ronald Fisher is credited with popularizing "likelihood" in its modern sense beginning around 1912, according to the Wikipedia article on him. DavidMCEddy (talk) 15:47, 21 March 2016 (UTC)
Backwards
An earlier version of this page said "In a sense, likelihood works backwards from probability: given B, we use the conditional probability Pr(A|B) to reason about A, and, given A, we use the likelihood function L(A|B) to reason about B. ". This makes sense; i.e. it says it's backwards, and it is.
The current version uses L(B|A) instead, i.e. it says: "In a sense, likelihood works backwards from probability: given B, we use the conditional probability Pr(A|B) to reason about A, and, given A, we use the likelihood function L(B|A) to reason about B. " This does not make sense. It says it's backwards, but it talks as if Pr and L are interchangeable.
How about switching back to the earlier version, and providing a concrete example to help clarify it? Possible example: Given that a die is fair, we use the probability of getting 10 sixes in a row given that the die is fair to reason about getting 10 sixes in a row; or given that we got 10 sixes in a row, we use the likelihood of getting 10 sixes in a row given that the die is fair to reason about whether the die is fair. (Or should it say "the likelihood that the die is fair given that 10 sixes occur in a row"? What exactly is the definition of "likelihood" used in this sort of verbal context, anyway?) --Coppertwig 20:28, 24 August 2007 (UTC)
I agree. Similarly, in the "abstract", the last sentence currently ends in "...and indicates how likely a parameter value is in light of the observed outcome." I do not know if it is OK to use the word "likely" in this way. Clearly, replacing it with "probable" in this sentence would make it terribly wrong by committing the common reversal-of-conditional-probabilities mistake. Therefore: is "likely" clearly distinct (and understood as distinct) from "probable"? In any case, I would suggest rewriting it to say "... and indicates how likely the observed outcome is to occur for different parameter values." Or am I missing something here? Enlightenmentreloaded (talk) 10:01, 28 October 2011 (UTC)
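The die example suggested above can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name prob_all_sixes and the candidate values of p are assumptions introduced here, not anything taken from the article. The same formula is read as a probability when the parameter is fixed and as a likelihood when the observed outcome is fixed.

```python
from fractions import Fraction

def prob_all_sixes(p_six, n_rolls=10):
    """Probability of n_rolls sixes in a row when each roll shows six with probability p_six."""
    return p_six ** n_rolls

# Probability: the parameter is fixed (fair die) and we reason about the outcome.
p_fair = prob_all_sixes(Fraction(1, 6))

# Likelihood: the outcome is fixed (10 sixes observed) and the parameter varies.
# The candidate values below are hypothetical, chosen only for illustration.
candidates = [Fraction(1, 6), Fraction(1, 2), Fraction(9, 10)]
likelihood = {p: prob_all_sixes(p) for p in candidates}

# Likelihood ratio comparing "six comes up half the time" against "fair die".
ratio = likelihood[Fraction(1, 2)] / likelihood[Fraction(1, 6)]
print(p_fair, ratio)
```

The likelihood values over the candidate parameters are not probabilities of those parameters and need not sum to 1; only their ratios are used to compare hypotheses about the die.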
Likelihood of continuous distributions is a problem
The contribution looks attractive; however, it ignores several basic mathematical facts:
1. Usually likelihood is assessed using not one realization, but a series of observed random variables (independent and identically distributed). Then the likelihood expands to a large product. Usually this is transformed by a logarithm to a sum. This transformation is not linear (like that mentioned in the entry), but it attains its maximum at the same point.
2. Likelihood can easily be defined for discrete distributions, where its values are values of some probabilities. A problem arises with an analogue for continuous distributions. Then the probability density function (pdf) is used instead of probability (probability function, pf). This is incorrect unless we use additional assumptions, e.g., continuity of the pdf. Without it, the notion of likelihood does not make sense, although this error occurs in most textbooks. (Do you know of any that gets this right? I did not find any; I did it in my textbook.) In any case, there are two totally different and incomparable notions of likelihood, one for discrete, the other for continuous distributions. As a consequence, there is no notion of likelihood applicable to mixed distributions. (Nevertheless, the maximum likelihood method can be applied separately to the discrete and continuous parts.)
- Just to clarify, by "the contribution" are you referring to the whole article or a particular section or edit? I assume the former.
- On (1), well, the log-likelihood isn't mentioned in this article but clearly it isn't itself a likelihood. The invariance of maximum likelihood estimates to transformation is surely a matter not for this article but for the one on maximum likelihood. (I haven't checked that article to see what it says on the topic, if anything).
- On (2), I think you've got a point that this article lacks a rigorous definition. I think the more accessible definition is needed too and should be given first. If you want to add a more rigorous definition, go ahead. I'm sure I've seen a measure-theoretic definition somewhere but I'm afraid I've never got to grips with measure theory myself.
- When you say "I did it in my textbook", is that Teorie Pravděpodobnosti Na Kvantových a Fuzzy Logikách? I'm afraid I can't locate a copy to consult. Qwfp (talk) 09:34, 22 February 2008 (UTC)
- The "problem" between definitions of likelihood for discrete and continuous distributions is resolved by using measure-theoretic probability theory. This generality comes with the substantial cost of learning measure theory. Fortunately, it is unnecessary for many applications. It is, nevertheless, useful for many purposes -- one of which is understanding the commonality of the treatment between discrete, absolutely continuous and other distributions. I just added an "In general" section to explain this: A discrete probability mass function is the probability density function for that distribution with respect to the counting measure on the set of all possible discrete outcomes. For absolutely continuous distributions, the standard density function is the density (Radon-Nikodym derivative) with respect to the Lebesgue measure. I hope this adds more clarity than confusion. DavidMCEddy (talk) 16:19, 21 March 2016 (UTC)
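Point (1) in this thread (the likelihood of an i.i.d. sample is a product, and taking logs turns it into a sum with the same maximiser) can be checked numerically. The following is a minimal sketch, assuming a Bernoulli model, a made-up sample, and a simple grid search; none of these choices come from the article.

```python
import math

def likelihood(p, data):
    """Product of Bernoulli probabilities for i.i.d. 0/1 observations."""
    return math.prod(p if x == 1 else 1 - p for x in data)

def log_likelihood(p, data):
    """Sum of log Bernoulli probabilities; the log turns the product into a sum."""
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

data = [1, 1, 0, 1, 0, 1, 1, 0]            # hypothetical sample: 5 ones, 3 zeros
grid = [k / 1000 for k in range(1, 1000)]  # open interval (0, 1) avoids log(0)

# Grid search for the maximiser of each function.
p_hat = max(grid, key=lambda p: likelihood(p, data))
p_hat_log = max(grid, key=lambda p: log_likelihood(p, data))
print(p_hat, p_hat_log)
```

Both searches should land on the sample proportion 5/8, since the logarithm is strictly increasing and therefore preserves the location of the maximum even though it is not a linear transformation.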
Area under the curve
I'm confused about this statement:
- "...the integral of a likelihood function is not in general 1. In this example, the integral of the likelihood density over the interval [0, 1] in pH is 1/3, demonstrating again that the likelihood density function cannot be interpreted as a probability density function for pH."
Because the likelihood function is defined only up to a scalar factor, the fact that the integral is 1/3 isn't that meaningful. However, I think we could say that one possibility is twice as likely as another, or similarly that being in the range [a,b] is six times as likely as being in the disjoint range [c,d]. Given that pH can't be less than 0 or more than 1, it seems sensible to normalize the likelihood so that the integral over that range is 1. I think that we could then say that if the integral of the likelihood over [a,b] is half of its integral over [0,1], then there's a 50/50 chance of being in the range [a,b], which would correspond to a normalized likelihood of 0.5. Am I mistaken? Why can't we just normalize to 1.0 and then interpret the normalized likelihood function as a probability density function? --Ben FrantzDale (talk) 17:17, 14 August 2008 (UTC)
- "Why can't we just normalize to 1.0?" There are several reasons. One is that the integral in general doesn't exist (isn't finite). If an appropriate weighting function can be found, then the scaled function becomes something else, with its own interpretation, which would move us away from "likelihood function". However, certain theoretical work has been done which makes use of a different scaling: scaling by a factor to make the maximum of the scaled likelihood equal to one. Melcombe (talk) 08:45, 15 August 2008 (UTC)
- An example might be the case where an observation X is from a uniform distribution on (0,a) with a>0. The likelihood function is 1/a for a > (observed X) : so not integrable. A simple change of parameterisation to b=1/a gives a likelihood which is integrable. Melcombe (talk) 13:25, 15 August 2008 (UTC)
It doesn't make sense to speak of a "likelihood density function". Likelihoods are not densities. Density functions are not defined pointwise. One can convolve them, but not multiply them. Likelihoods are defined pointwise. One can multiply them but not convolve them. One can multiply a likelihood by a density and get another density (although not in general a probability density, until one normalizes). Michael Hardy (talk) 16:00, 15 August 2008 (UTC)
- I'm deleting that entire paragraph beginning with, "The likelihood function is not a probability ... ." I agree it's confusing, and I don't see that it adds anything.
- The issues raised by a discussion of "the integral of a likelihood function" could be answered clearly with a sensible discussion of likelihood in Bayesian inference. I don't know if I'll find the time to write such a section myself, but it would make a useful addition to this article. DavidMCEddy (talk) 16:37, 21 March 2016 (UTC)
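Melcombe's uniform example above can be written out as a short worked calculation. This is only a sketch for a single observation x > 0; the symbol \tilde{L} for the likelihood in the new parameter b is notation introduced here, not taken from the article.

```latex
L(a) = \frac{1}{a}\,\mathbf{1}\{a > x\}, \qquad
\int_x^{\infty} \frac{1}{a}\,da = \lim_{A\to\infty} \ln\frac{A}{x} = \infty .

\text{With } b = 1/a:\quad
\tilde{L}(b) = b\,\mathbf{1}\{0 < b < 1/x\}, \qquad
\int_0^{1/x} b\,db = \frac{1}{2x^{2}} < \infty .
```

So whether the likelihood can be normalised at all depends on the parameterisation, which is one way to see that its integral has no intrinsic meaning without a prior or weighting function.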
Needs a simpler introduction?
I believe it is a good habit for mathematical articles on Wikipedia to start with a simple heuristic explanation of the concept, before diving into details and formalism. In this case I think it should be made clearer that the likelihood is simply the pdf regarded as a function of the parameter rather than of the data.
What is the scaling factor alpha in the introduction good for? If that's for the purpose of simplification of the maximum likelihood method then (a) it is a totally misplaced comment and (b) you could put there any strictly increasing function, not just scaling by a constant. --David Pal (talk) 01:35, 1 March 2011 (UTC)
Median
- The Bernoulli trial has a probability distribution function fP defined by fP(0) = 1−P and fP(1) = P. This means that the likelihood function is Lx defined by L0(P) = 1−P and L1(P) = P for 0≤P≤1. For x=0 the maximum likelihood estimate of P is 0; the median is 1−1/√2 = 0.29; and the mean value is 1/3=0.33. For x=1 the maximum likelihood estimate of P is 1; the median is 1/√2 = 0.71; and the mean value is 2/3=0.67. These are point estimates for P. Some likelihood functions have a well defined maximum likelihood value but no median. Other likelihood functions have median but no mean value. See for example the German tank problem#Likelihood function. Bo Jacoby (talk) 22:27, 3 September 2009 (UTC).
The above is wrong.
- First a minor point. The term "probability distribution function" usually means cumulative distribution function.
- What sense can it make to call the number proposed above the "median" of the likelihood function? That would be the answer if one treated the function as a probability density function, but that makes sense only if we assume a uniform measure on the line, in effect a prior, so the proposed median is actually the median of the posterior probability distribution, assuming a uniform prior. It's not a median of the likelihood function. If we assumed a different prior, we'd get a different median with the SAME likelihood function. Similar comments apply to the mean. There's no such thing as the mean or the median of a likelihood function. Michael Hardy (talk) 00:02, 4 September 2009 (UTC)
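The objection above (the same likelihood gives a different "median" under a different prior) can be illustrated numerically. This is a minimal sketch, assuming midpoint-rule integration on [0, 1]; the function name weighted_median and the second prior (density proportional to P) are hypothetical choices made here purely for illustration.

```python
import math

def weighted_median(likelihood, prior, steps=200_000):
    """Median of the density proportional to likelihood(P) * prior(P) on [0, 1] (midpoint rule)."""
    width = 1.0 / steps
    mids = [(k + 0.5) * width for k in range(steps)]
    weights = [likelihood(p) * prior(p) for p in mids]
    half = sum(weights) / 2
    running = 0.0
    for p, w in zip(mids, weights):
        running += w
        if running >= half:
            return p
    return 1.0

def likelihood_x0(p):
    """Bernoulli likelihood after observing x = 0."""
    return 1 - p

median_uniform = weighted_median(likelihood_x0, lambda p: 1.0)  # uniform prior
median_other = weighted_median(likelihood_x0, lambda p: p)      # hypothetical non-uniform prior
print(median_uniform, median_other)
```

With the uniform prior the result is close to 1−1/√2, the value quoted for x=0; with the other prior the same likelihood gives a median near 1/2. The number therefore belongs to the posterior distribution, not to the likelihood function itself.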
Comment to Michael:
- The article on probability distribution function allows for the interpretation as probability density function.
- The uniform prior likelihood function, f(P)=1 for 0≤P≤1, expresses prior ignorance of the actual value of P. A different prior likelihood function expresses some knowledge of the actual value of P, and no such knowledge is provided. It is correct that assuming a uniform prior distribution makes the likelihood function define a posterior distribution, in which the mode, median, mean value, standard deviation etc, are defined.
Your main objection seems to be that tacitly assuming a uniform prior distribution is unjustified. Consider the (bernoulli) process of sampling from an infinite population as a limiting case of the (hypergeometric) process of sampling from a finite population. The J expression
udaf=.!/&(i.@>:) * !/&(- i.@>:)
computes odds of the hypergeometric distribution.
The program call
   1 udaf 10
10 9 8 7 6 5 4 3 2 1  0
 0 1 2 3 4 5 6 7 8 9 10
computes the odds when you pick 1 pebble from a population of 10 red and white pebbles. The 11 columns are odds for getting 0 or 1 red pebble, when the number of red pebbles in the population is 0 through 10. The 2 rows are likelihoods for the population containing 0 through 10 red pebbles given that the sample contained 0 or 1 red pebble. The top row shows that 0 red pebbles in the population has the maximum likelihood (= 10). A median is about 2.5 red pebbles = 25% of the population. (10+9+8 = 27 < 27.5 < 28 = 7+6+5+4+3+2+1+0). The mean value is 30% and the standard deviation is 24%.
The prior likelihood function is (of course)
   0 udaf 10
1 1 1 1 1 1 1 1 1 1 1
expressing prior ignorance regarding the number of red pebbles in the population. The maximum likelihood value is undefined; the median and the mean are both equal to 50% of the population, and the standard deviation is 32% of the population.
   5 udaf 16
4368 3003 2002 1287  792  462  252  126   56   21    6    1    0    0    0    0    0
   0 1365 2002 2145 1980 1650 1260  882  560  315  150   55   12    0    0    0    0
   0    0  364  858 1320 1650 1800 1764 1568 1260  900  550  264   78    0    0    0
   0    0    0   78  264  550  900 1260 1568 1764 1800 1650 1320  858  364    0    0
   0    0    0    0   12   55  150  315  560  882 1260 1650 1980 2145 2002 1365    0
   0    0    0    0    0    1    6   21   56  126  252  462  792 1287 2002 3003 4368
Study the finite case first, and then the infinite case as a limit of the finite case, rather than beginning with the infinite case, where a prior distribution is problematic. It is dangerous to assume that lim(f(x))=f(lim(x)). Bo Jacoby (talk) 10:00, 4 September 2009 (UTC).
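For readers who do not use J, the udaf tables above can be reproduced in Python. This is a sketch of what the J expression appears to compute, namely the products C(j, i)·C(N−j, n−i), inferred from the printed tables; the helper name summary is introduced here, and it treats a row as weights over the population counts, which amounts to the uniform prior discussed in this thread.

```python
from math import comb, sqrt

def udaf(sample_size, population_size):
    """Table of unnormalised hypergeometric odds C(j, i) * C(N - j, n - i).

    Row i = number of red pebbles in the sample (0..n);
    column j = number of red pebbles in the population (0..N).
    """
    n, N = sample_size, population_size
    return [[comb(j, i) * comb(N - j, n - i) for j in range(N + 1)]
            for i in range(n + 1)]

def summary(row):
    """Mean and standard deviation of j when the row is used as weights (uniform prior assumed)."""
    total = sum(row)
    mean = sum(j * w for j, w in enumerate(row)) / total
    var = sum((j - mean) ** 2 * w for j, w in enumerate(row)) / total
    return mean, sqrt(var)

table = udaf(1, 10)
mean, sd = summary(table[0])   # the sample contained 0 red pebbles
print(table[0], mean, sd)
```

For the top row of 1 udaf 10 this gives a mean of 3 pebbles (30% of the population) and a standard deviation of about 2.45 pebbles (24%), in line with the figures quoted above.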
graph
[Figure: The likelihood function for estimating the probability of a coin landing heads-up without prior knowledge after observing HHT]
How was this graph generated? Is there a closed form for this calculation? Is there a closed form for given # of H and # of T ? —Preceding unsigned comment added by Fulldecent (talk • contribs) 17:46, 13 August 2009 (UTC)
- The expression $\binom{n}{i} p^i (1-p)^{n-i}$ is for fixed n,p a binomial distribution function of i, (i=0,..,n), and for fixed n,i a continuous (unnormalized) beta distribution of p, (0≤p≤1). So the graph is simply $p^2(1-p)$, that is, the case n=3, i=2 up to a constant factor.
- Bo Jacoby (talk) 12:33, 20 August 2009 (UTC).
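A curve like the one in the graph can be regenerated from the closed form for any number of heads and tails. This is a minimal sketch, not the code that produced the original figure (which is not given here); the function name coin_likelihood and the 301-point grid are assumptions.

```python
def coin_likelihood(p, heads, tails):
    """Likelihood of heads-probability p for one particular sequence with the given counts."""
    return p ** heads * (1 - p) ** tails

# Points for a curve like the HHT graph: 2 heads, 1 tail.
grid = [k / 300 for k in range(301)]
curve = [coin_likelihood(p, heads=2, tails=1) for p in grid]

# Location of the peak on the grid.
p_max = grid[curve.index(max(curve))]
print(p_max, max(curve))
```

For HHT the peak sits at p = 2/3 with height 4/27; in general the maximiser of p^h (1−p)^t is h/(h+t), and multiplying by the binomial coefficient only rescales the curve.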
Probability of causes and not probability of effects?
The definition given here is the opposite of that given by D'Agostini, Bayesian Reasoning in Data Analysis (2003). From pp. 34-35: "The possible values which may be observed are classified in belief by . This function is traditionally called `likelihood' and summarizes all previous knowledge on that kind of measurement..." In other words, it is the probability of an effect given a parameter (cause). The definition given in this entry, proportional to the probability of a cause given the effect, seems more useful, as the concept is more important, but is it possible that there is more than one definition in use in the literature? LiamH (talk) 02:10, 4 October 2009 (UTC)
putting x and theta in bold
Since P(x|theta) is describing sets of data points (as if a vector), shouldn't x and theta be put in bold?
theta represents a vector (or set) of parameters, and x represents a vector of data points from a sample.
I might be wrong about this, but thought it would be worth mentioning.
Discussion
It is confusing to have several different definitions that are approximately the same. We first use P(x|θ), then p_θ(x), then f_θ(x). Then we have two separate discussions on the page about continuous vs. discrete. Can we just define the likelihood for the discrete case and then refer to Likelihood_function#Likelihood_function_of_a_parameterized_model for the continuous case?
It's noted in several places that the likelihood is defined up to a multiplicative constant; is there a reason we don't define it that way?
Finally, there doesn't seem to be uniform notation on the page; can we remedy that?
- On the points you raise, I think that the article needs substantial revisions. Regarding the definition of likelihood for a continuous distribution, the article previously included more on this, but it looked to me to be in error; so I deleted some. See my edit and especially the explanation's link, which cites Burnham & Anderson.
- Confusion seems to have come about for historical reasons. Originally, likelihood was used to compare different parameters of the same model: there, the constant is irrelevant. Now, likelihood is used to compare different models (see Likelihood function#Relative likelihood of models): here, the constant is relevant.
- SolidPhase (talk) 13:29, 29 January 2015 (UTC)
In general: Likelihoods with respect to a dominating measure
I wish to thank Podgorec for attempting to clarify this section by inserting, "with all distributions being absolutely continuous with respect to a common measure" before "whether discrete, absolutely continuous, a mixture or something else." I've reworded this addition and placed it in a parenthetical comment at the end of the sentence. I've done this to make that section more accessible to people unfamiliar with measure-theoretic probability -- without eliminating the mathematical rigor.
If this is not adequate, I fear we will need to cite a modern text on measure-theoretic probability theory. My knowledge of this subject dates from the late 1970s and early 1980s. I think my memory of that material is still adequate for this, but the standard treatment of the subject may have changed -- and I no longer have instant access to a text on the subject to cite now. (It would also be good to mention likelihood in the Wikipedia article on Radon-Nikodym theorem, to help explain one important use, but I won't attempt that right now.) DavidMCEddy (talk)