Frequentist inference: Difference between revisions

From Wikipedia, the free encyclopedia
Added in some sections, added in a rigorous definitional section, detailed the philosophy of statistics and its impact on inference, elaborated on integral understandings. I believe more on asymptotic theory, an examples section, and a predictive inference section are needed, and may write later
{{Statistics topics sidebar}}


'''Frequentist inference''' is a type of [[statistical inference]] based in [[frequentist probability]], which treats “probability” as equivalent to “frequency” and draws conclusions from sample data by emphasizing the frequency or proportion of findings in the data. Frequentist inference underlies '''frequentist statistics''', in which the well-established methodologies of [[statistical hypothesis testing]] and [[confidence interval]]s are founded.

== History of Frequentist Statistics ==
The history of frequentist statistics is more recent than that of its prevailing philosophical rival, Bayesian statistics. Frequentist statistics were largely developed in the early 20th century and have since become the dominant paradigm in inferential statistics, while Bayesian statistics originated in the 18th and 19th centuries. Despite this dominance, there is no agreement as to whether frequentism is better than Bayesian statistics, and a vocal minority of professionals studying statistical inference decry frequentist inference as internally inconsistent. For the purposes of this article, frequentist methodology is discussed as summarily as possible, but it is worth noting that the subject remains controversial even today.

The primary formulation of frequentism stems from the philosophical view that statistics can be understood as probabilistic frequencies. This view was primarily developed by [[Ronald Fisher]] and the team of [[Jerzy Neyman]] and [[Egon Pearson]]. Ronald Fisher contributed to frequentist statistics by developing the concept of "significance testing", which is the study of the significance of an observed value of a statistic when compared with a hypothesis. Neyman and Pearson extended Fisher's ideas to the comparison of two competing hypotheses: rejecting when the ratio of the likelihoods of the two hypotheses exceeds a suitable threshold maximizes the probability of detecting the alternative at a given significance level, and their framework also provides the basis of '''type I''' and '''type II''' errors. For more, see the [[foundations of statistics]] page.

== Definition ==
For statistical inference, the statistic about which we want to make inferences is <math>y \in Y</math>, where the random vector <math>Y</math> has a distribution that depends on an unknown parameter, <math>\theta</math>. The parameter <math>\theta</math> is further partitioned into (<math>\psi, \lambda</math>), where <math>\psi</math> is the '''parameter of interest''' and <math>\lambda</math> is the '''nuisance parameter'''. For concreteness, <math>\psi</math> might be the population mean, <math>\mu</math>, and the nuisance parameter <math>\lambda</math> the standard deviation of the population, <math>\sigma</math>.<ref>{{Cite book|last=Cox|first=D. R.|url=https://www.amazon.com/Principles-Statistical-Inference-D-Cox/dp/0521685672|title=Principles of Statistical Inference|date=2006-08-01|pages=1-2}}</ref>

Thus, statistical inference is concerned with the expectation of the random vector <math>Y</math>, namely <math>E(Y)=E(Y;\theta)=\int y f_Y (y;\theta)\,dy </math>.

To construct areas of uncertainty in frequentist inference, a '''[[Pivotal quantity|pivot]]''' is used, which defines the area around <math>\psi</math> that can be used to provide an interval estimate of uncertainty. A pivot is a function <math>p(t,\psi)</math> of the data <math>t \in T</math> (a random vector) and the parameter of interest whose distribution does not depend on the unknown parameters, and which is strictly increasing in <math>\psi</math>. This allows that, for some 0 < <math>c</math> < 1, we can choose a value <math>p^*_c</math> such that <math>P\{p(T,\psi)\leq p^*_c\} = 1-c</math>, i.e. the probability that the pivot falls below this well-defined value is <math>1-c</math>. Inverting this statement implies <math>P\{\psi \leq q(T,c)\} = 1-c</math>, where <math>q(t,c)</math> is a <math>1-c</math> '''upper limit''' for <math>\psi</math>. Note that <math>1-c</math> corresponds to a one-sided limit for <math>\psi</math>, and that <math>1-2c</math> gives a two-sided limit, used when we want to estimate a range of outcomes within which <math>\psi</math> may occur. This rigorously defines the '''confidence interval''', which is the range of outcomes about which we can make statistical inferences.
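
As a concrete sketch of this construction (an illustration added here, not drawn from the cited source), the quantity <math>p(t,\psi) = (\psi - \bar{y})/(s/\sqrt{n})</math> for a normally distributed sample is a pivot: it follows a Student's ''t'' distribution with <math>n-1</math> degrees of freedom whatever the unknown mean and standard deviation are, and it is strictly increasing in <math>\psi</math>. Inverting its quantiles yields the familiar ''t''-based limits for a population mean. The data and the level <math>c = 0.05</math> below are hypothetical.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Hypothetical sample; psi is the population mean and sigma the nuisance parameter.
y = np.array([4.1, 5.3, 4.8, 6.0, 5.1, 4.7, 5.5, 4.9])
n, c = len(y), 0.05
ybar, s = y.mean(), y.std(ddof=1)

# Pivot p(T, psi) = (psi - ybar) / (s / sqrt(n)): strictly increasing in psi and
# t-distributed with n - 1 degrees of freedom regardless of the unknown psi and sigma.
p_star = stats.t.ppf(1 - c, df=n - 1)     # P{ p(T, psi) <= p_star } = 1 - c

upper = ybar + p_star * s / np.sqrt(n)    # 1 - c upper limit q(T, c) for psi
lower = ybar - p_star * s / np.sqrt(n)    # combining both ends gives a 1 - 2c two-sided interval
print(lower, upper)
</syntaxhighlight>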

== Fisherian Reduction and Neyman-Pearson operational criteria ==
Two complementary concepts in frequentist inference are the Fisherian reduction and the Neyman-Pearson operational criteria. The Fisherian reduction is a method of determining the interval within which the true value of <math>\psi</math> may lie, while the Neyman-Pearson operational criteria is a decision rule about making ''a priori'' probability assumptions.

The Fisherian reduction is defined as follows:

* Determine the likelihood function (this is usually just gathering the data);
* Reduce to a [[sufficient statistic]] <math>S</math> of the same dimension as <math>\theta</math>;
* Find the function of <math>S</math> that has a distribution depending only on <math>\psi</math>;
* Invert that distribution (this yields a cumulative distribution function or CDF) to obtain limits for <math>\psi</math> at an arbitrary set of probability levels;
* Use the conditional distribution of the data given <math>S = s</math> informally or formally to assess the adequacy of the formulation.<ref>{{Cite book|last=Cox|first=D. R.|url=https://www.amazon.com/Principles-Statistical-Inference-D-Cox/dp/0521685672|title=Principles of Statistical Inference|date=2006-08-01|pages=24, 47}}</ref>

Essentially, the Fisherian reduction is designed to find where the sufficient statistic can be used to determine the range of outcomes where <math>\psi</math> may occur, on a probability distribution that defines all the potential values of <math>\psi</math>. This is necessary for formulating confidence intervals, where we can find a range of outcomes over which <math>\psi</math> is likely to occur in the long run.
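
The steps above can be made concrete with a simple model that has no nuisance parameter. The following sketch (hypothetical data, added here as an illustration rather than taken from the cited source) carries out the reduction for independent exponential observations with unknown rate <math>\psi</math>: the sum of the observations is sufficient, the quantity <math>2\psi S</math> has a chi-squared distribution with <math>2n</math> degrees of freedom, and inverting that distribution gives confidence limits for <math>\psi</math>.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Hypothetical waiting times, modelled as i.i.d. Exponential with unknown rate psi
# (no nuisance parameter, so theta = psi).
y = np.array([0.8, 2.3, 1.1, 0.4, 3.0, 1.7, 0.9, 2.6, 1.2, 0.5])
n, c = len(y), 0.05

# Reduce to the sufficient statistic S = sum(y), of the same dimension as theta.
S = y.sum()

# The function 2 * psi * S of S has a chi-squared distribution with 2n degrees of
# freedom; inverting that distribution gives limits for psi at chosen probability levels.
upper = stats.chi2.ppf(1 - c, df=2 * n) / (2 * S)   # 1 - c upper confidence limit for psi
lower = stats.chi2.ppf(c, df=2 * n) / (2 * S)       # 1 - c lower confidence limit for psi
print(lower, upper)
</syntaxhighlight>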

The Neyman-Pearson operational criteria is an even more specific understanding of the range of outcomes where the relevant statistic, <math>\psi</math>, can be said to occur in the long run. The Neyman-Pearson operational criteria quantifies the likelihood of that range actually being adequate or of it being inadequate; in other words, it measures how often the procedure endorses a range that misses the true population value, or fails to endorse one that contains it. For example, if the distribution from the Fisherian reduction exceeds a threshold that we consider ''a priori'' implausible, then the Neyman-Pearson evaluation of that distribution can be used to infer where relying purely on the Fisherian reduction's distributions would give inaccurate results. Thus, the Neyman-Pearson approach is used to find the probability of [[Type I and type II errors|'''type I''' and '''type II''' errors]].<ref>{{Cite web|title=OpenStax CNX|url=https://cnx.org/contents/aOvnYzjq@1.9:7yMVBb6e@2/The-Neyman-Pearson-Criterion#:~:text=The%20Neyman-Pearson%20criterion%20is,with%20a%20particular%20hypothesis%20test.&text=The%20probabilities%20P0(declare,1%E2%88%92PD,%20respectively.|access-date=2021-09-14|website=cnx.org}}</ref> As a point of reference, the complement to this in Bayesian statistics is the [[Bayes estimator|minimum Bayes risk criterion]].

Because of the reliance of the Neyman-Pearson criteria on our ability to find a range of outcomes where <math>\psi</math> is likely to occur, the Neyman-Pearson approach is only possible where a Fisherian reduction can be achieved.<ref>{{Cite book|last=Cox|first=D. R.|url=https://www.amazon.com/Principles-Statistical-Inference-D-Cox/dp/0521685672|title=Principles of Statistical Inference|date=2006-08-01|pages=24}}</ref>
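
As a minimal numerical sketch of these error probabilities (an illustration with hypothetical numbers, not taken from the cited sources), consider testing a simple null hypothesis against a simple alternative for a normal mean with known variance. The rejection cutoff is chosen so that the type I error probability equals a chosen level, and the type II error probability then follows from the distribution of the test statistic under the alternative.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Hypothetical setting: n observations of Y ~ N(mu, 1); test H0: mu = 0 against H1: mu = 0.5
# using the sample mean as the test statistic.
n, mu0, mu1, alpha = 25, 0.0, 0.5, 0.05
se = 1 / np.sqrt(n)

# Reject H0 when the sample mean exceeds a cutoff chosen so the type I error equals alpha.
cutoff = mu0 + stats.norm.ppf(1 - alpha) * se

type_I  = stats.norm.sf(cutoff, loc=mu0, scale=se)   # P(reject H0 | H0 true) = alpha
type_II = stats.norm.cdf(cutoff, loc=mu1, scale=se)  # P(fail to reject H0 | H1 true)
print(type_I, type_II, 1 - type_II)                  # the last value is the power of the test
</syntaxhighlight>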


== Experimental design and methodology ==
Frequentist inferences are associated with the application of [[frequentist probability]] to [[experimental design]] and interpretation, and specifically with the view that any given experiment can be considered one of an infinite sequence of possible repetitions of the same experiment, each capable of producing [[Independence (probability theory)|statistically independent]] results.{{sfnp|Everitt|2002}} In this view, the frequentist inference approach to drawing conclusions from data is effectively to require that the correct conclusion should be drawn with a given (high) probability, among this notional set of repetitions.


However, exactly the same procedures can be developed under a subtly different formulation, in which a pre-experiment point of view is taken. It can be argued that the [[Design of experiments|design of an experiment]] should include, before undertaking the experiment, decisions about exactly what steps will be taken to reach a conclusion from the data yet to be obtained. These steps can be specified by the scientist so that there is a high probability of reaching a correct decision where, in this case, the probability relates to a set of random events that have yet to occur and hence does not rely on the frequency interpretation of probability. This formulation has been discussed by Neyman,{{sfnp|Jerzy|1937|pp=236, 333-380}} among others. This is especially pertinent because the significance of a frequentist test can vary under model selection, a violation of the [[likelihood principle]].
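
This pre-experiment reading can be illustrated by simulation: across notional repetitions of the same experiment, a 95% confidence procedure covers the true parameter value in roughly 95% of repetitions. The sketch below (with hypothetical parameter values, added here as an illustration) checks this long-run coverage for the usual ''t'' interval.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu_true, sigma, n, c, reps = 10.0, 2.0, 30, 0.05, 10_000

covered = 0
for _ in range(reps):
    # One notional repetition of the same experiment.
    y = rng.normal(mu_true, sigma, size=n)
    half = stats.t.ppf(1 - c / 2, df=n - 1) * y.std(ddof=1) / np.sqrt(n)
    covered += (y.mean() - half) <= mu_true <= (y.mean() + half)

print(covered / reps)   # close to 0.95: the long-run frequency of correct conclusions
</syntaxhighlight>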

== The Statistical Philosophy of Frequentism ==
Frequentism is the study of probability under the assumption that results occur with a given frequency over some period of time or with repeated sampling. As such, frequentist analysis must be formulated with consideration of the assumptions of the problem it attempts to analyze. This requires looking into whether the question at hand is concerned with understanding the variability of a statistic or with locating the true value of a statistic. The next paragraph elaborates on this difference.

There are broadly two camps of statistical inference, the '''epistemic''' approach and the '''epidemiological''' approach. The '''epistemic''' approach is the study of ''variability''; namely, how often we expect a statistic to deviate from some observed value. The '''epidemiological''' approach is concerned with the study of ''uncertainty''; in this approach, the value of the statistic is fixed, but our understanding of that statistic is incomplete. For concreteness, imagine trying to measure the stock market as a whole versus evaluating a single asset's price. The stock market fluctuates so greatly that trying to find exactly where a stock price is going to be is not useful: the stock market is better understood using the epistemic approach, where we can find a range of stock values. Conversely, the price of an asset might not change much from day to day: it is better to locate the true value of the asset than to find a range of prices, and thus the epidemiological approach is better. The difference between these approaches is non-trivial for the purposes of inference.

For the epistemic approach, we formulate the problem as if we want to attribute probability to a hypothesis. Unfortunately, this kind of question is (for highly rigorous reasons) best answered with Bayesian statistics, where the interpretation of probability is straightforward because Bayesian statistics is conditional on the entire sample space, whereas frequentist results are inherently conditional on unobserved and unquantifiable data. The reason for this is inherent to frequentist design: frequentist statistics is conditioned not solely on the data but also on the ''experimental design''.<ref name=":0">{{Citation|last=Wagenmakers|first=Eric-Jan|title=Bayesian Versus Frequentist Inference|date=2008|url=https://doi.org/10.1007/978-0-387-09612-4_9|work=Bayesian Evaluation of Informative Hypotheses|pages=181–207|editor-last=Hoijtink|editor-first=Herbert|series=Statistics for Social and Behavioral Sciences|place=New York, NY|publisher=Springer|language=en|doi=10.1007/978-0-387-09612-4_9|isbn=978-0-387-09612-4|access-date=2021-09-14|last2=Lee|first2=Michael|last3=Lodewyckx|first3=Tom|last4=Iverson|first4=Geoffrey J.|editor2-last=Klugkist|editor2-first=Irene|editor3-last=Boelen|editor3-first=Paul A.}}</ref> In frequentist statistics, the cutoff for assessing how frequently an outcome occurs is derived from the family of distributions specified in the experimental design. For example, a binomial distribution and a negative binomial distribution can be used to analyze exactly the same data, but because their tails differ, a frequentist analysis can assign different levels of statistical significance to the same data depending on which distribution is assumed. For more, see the [[likelihood principle]], which frequentist statistics inherently violates.<ref>{{Cite web|last=Vidakovic|first=Brani|title=The Likelihood Principle|url=http://people.eecs.berkeley.edu/~russell/classes/cs294/f05/papers/vidakovic-handout2.pdf|url-status=live}}</ref>
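
A commonly cited numerical illustration of this point (a sketch added here, not taken from the cited references) uses the same observed data, 9 successes and 3 failures, under two stopping rules. With a binomial design (12 trials fixed in advance) the one-sided ''p''-value for testing a success probability of 1/2 is about 0.073, while with a negative binomial design (sample until the 3rd failure) it is about 0.033, even though the likelihood functions are proportional.

<syntaxhighlight lang="python">
from scipy.stats import binom, nbinom

# Same observed data: 9 successes and 3 failures; test H0: p = 0.5 against p > 0.5.

# Binomial design: the number of trials (12) was fixed in advance; count successes.
p_binomial = binom.sf(8, 12, 0.5)        # P(successes >= 9) ~ 0.073

# Negative binomial design: sample until the 3rd failure; count successes along the way.
# scipy's nbinom counts "failures" before the r-th "success", so the roles are relabelled:
# a coin-flip failure plays the role of scipy's success.
p_neg_binomial = nbinom.sf(8, 3, 0.5)    # P(successes >= 9) ~ 0.033

print(p_binomial, p_neg_binomial)        # different significance from identical likelihoods
</syntaxhighlight>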

For the epidemiological approach, the central idea behind frequentist statistics must be discussed: frequentist statistics is designed so that, in the ''long run'', the frequency of a statistic may be understood, and in the ''long run'' the range in which the true mean of a statistic lies can be inferred. This approach is based on the Fisherian reduction and the Neyman-Pearson operational criteria, discussed above. When we define the Fisherian reduction and the Neyman-Pearson operational criteria for any statistic, we are assessing, according to these authors, the likelihood that the true value of the statistic will occur within a given range of outcomes, assuming a number of repetitions of our sampling method.<ref name=":0" />


== Relationship with other approaches ==

Frequentist inferences stand in contrast to other types of statistical inferences, such as [[Bayesian inference]]s and [[fiducial inference]]s. While the "Bayesian inference" is sometimes held to include the approach to inferences leading to optimal decisions, a more restricted view is taken here for simplicity.

=== Bayesian inference ===

Bayesian inference is based in [[Bayesian probability]], which treats “probability” as equivalent to a degree of certainty, and thus the essential difference between frequentist inference and Bayesian inference is the same as the difference between the two interpretations of what a “probability” means. However, where appropriate, Bayesian inferences (meaning in this case an application of [[Bayes' theorem]]) are used by those employing frequency probability.

There are two major differences between the frequentist and Bayesian approaches to inference that are not included in the above consideration of the interpretation of probability:

# In a frequentist approach to inference, unknown parameters are often, but not always, treated as having fixed but unknown values that are not capable of being treated as random variates in any sense, and hence there is no way that probabilities can be associated with them. In contrast, a Bayesian approach to inference does allow probabilities to be associated with unknown parameters, where these probabilities can sometimes have a frequency probability interpretation as well as a Bayesian one. The Bayesian approach allows these probabilities to have an interpretation as representing the scientist's belief that given values of the parameter are true (see [[Bayesian probability]] – Personal probabilities and objective methods for constructing priors).
# While "probabilities" are involved in both approaches to inference, the probabilities are associated with different types of things. The result of a Bayesian approach can be a probability distribution for what is known about the parameters given the results of the experiment or study. The result of a frequentist approach is either a "true or false" conclusion from a significance test or a conclusion in the form that a given sample-derived confidence interval covers the true value: either of these conclusions has a given probability of being correct, where this probability has either a frequency probability interpretation or a pre-experiment interpretation. A numerical sketch of this contrast follows the list.
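
The second difference can be made concrete with a small sketch (an illustration with hypothetical data, not part of the sources cited in this article): for binomial data, a frequentist 95% confidence interval is a statement about a procedure's long-run coverage of a fixed unknown proportion, whereas a Bayesian 95% credible interval is a region of the posterior distribution summarizing belief about the proportion given the data and a prior. A uniform Beta(1, 1) prior is assumed below purely for illustration.

<syntaxhighlight lang="python">
from scipy import stats

successes, n = 7, 20   # hypothetical data: 7 successes in 20 trials

# Frequentist: 95% Clopper-Pearson confidence interval for the fixed, unknown proportion.
freq_lo = stats.beta.ppf(0.025, successes, n - successes + 1)
freq_hi = stats.beta.ppf(0.975, successes + 1, n - successes)

# Bayesian: 95% equal-tailed credible interval from a uniform Beta(1, 1) prior,
# i.e. a region of the Beta(1 + successes, 1 + failures) posterior distribution.
posterior = stats.beta(1 + successes, 1 + n - successes)
bayes_lo, bayes_hi = posterior.ppf([0.025, 0.975])

print((freq_lo, freq_hi), (bayes_lo, bayes_hi))
</syntaxhighlight>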


== References ==

# Cox, D. R. (2006). ''Principles of Statistical Inference''. pp. 1–2.
# Cox, D. R. (2006). ''Principles of Statistical Inference''. pp. 24, 47.
# "OpenStax CNX". cnx.org. Retrieved 2021-09-14.
# Cox, D. R. (2006). ''Principles of Statistical Inference''. p. 24.
# Everitt (2002).
# Neyman, Jerzy (1937). pp. 236, 333–380.
# Wagenmakers, Eric-Jan; Lee, Michael; Lodewyckx, Tom; Iverson, Geoffrey J. (2008). "Bayesian Versus Frequentist Inference". In Hoijtink, Herbert; Klugkist, Irene; Boelen, Paul A. (eds.). ''Bayesian Evaluation of Informative Hypotheses''. Statistics for Social and Behavioral Sciences. New York, NY: Springer. pp. 181–207. doi:10.1007/978-0-387-09612-4_9. ISBN 978-0-387-09612-4. Retrieved 2021-09-14.
# Vidakovic, Brani. "The Likelihood Principle" (PDF).
