Parameter identification problem
||This article includes a list of references, but its sources remain unclear because it has insufficient inline citations. (December 2009) (Learn how and when to remove this template message)|
- For a more technical treatment, see Identifiability.
In statistics and econometrics, the parameter identification problem is the inability in principle to identify a best estimate of the value(s) of one or more parameters in a regression. This problem can occur in the estimation of multiple-equation econometric models where the equations have variables in common.
More generally, the term can be used to refer to any situation where a statistical model will invariably have more than one set of parameters that generate the same distribution of observations, meaning that multiple parametrizations are observationally equivalent.
Standard example, with two equations
Consider a linear model for the supply and demand of some specific good. The quantity demanded varies negatively with the price: a higher price decreases the quantity demanded. The quantity supplied varies directly with the price: a higher price increases the quantity supplied.
Assume that, say for several years, we have data on both the price and the traded quantity of this good. Unfortunately this is not enough to identify the two equations (demand and supply) using regression analysis on observations of Q and P: one cannot estimate a downward slope and an upward slope with one linear regression line involving only two variables. Additional variables can make it possible to identify the individual relations.
In the graph shown here, the supply curve (red line, upward sloping) shows the quantity supplied depending positively on the price, while the demand curve (black lines, downward sloping) shows quantity depending negatively on the price and also on some additional variable Z, which affects the location of the demand curve in quantity-price space. This Z might be consumers' income, with a rise in income shifting the demand curve outwards. This is symbolically indicated with the values 1, 2 and 3 for Z.
With the quantities supplied and demanded being equal, the observations on quantity and price are the three white points in the graph: they reveal the supply curve. Hence the effect of Z on demand makes it possible to identify the (positive) slope of the supply equation. The (negative) slope parameter of the demand equation cannot be identified in this case. In other words, the parameters of an equation can be identified if it is known that some variable does not enter into the equation, while it does enter the other equation.
A situation in which both the supply and the demand equation are identified arises if there is not only a variable Z entering the demand equation but not the supply equation, but also a variable X entering the supply equation but not the demand equation:
with positive bS and negative bD. Here both equations are identified if c and d are nonzero.
Note that this is the structural form of the model, showing the relations between the Q and P. The reduced form however can be identified easily.
Estimation methods and disturbances
"It is important to note that the problem is not one of the appropriateness of a particular estimation technique. In the situation described [without the Z variable], there clearly exists no way using any technique whatsoever in which the true demand (or supply) curve can be estimated. Nor, indeed, is the problem here one of statistical inference—of separating out the effects of random disturbance. There is no disturbance in this model [...] It is the logic of the supply-demand equilibrium itself which leads to the difficulty." (Fisher 1966, p. 5)
More generally, consider a linear system of M equations, with M > 1.
An equation cannot be identified from the data if less than M − 1 variables are excluded from that equation. This is a particular form of the order condition for identification. (The general form of the order condition deals also with restrictions other than exclusions.) The order condition is necessary but not sufficient for identification.
The rank condition is a necessary and sufficient condition for identification. In the case of only exclusion restrictions, it must "be possible to form at least one nonvanishing determinant of order M − 1 from the columns of A corresponding to the variables excluded a priori from that equation" (Fisher 1966, p. 40), where A is the matrix of coefficients of the equations. This is the generalization in matrix algebra of the requirement "while it does enter the other equation" mentioned above (in the line above the formulas).
Related use of the term
- Errors-in-variables model#Linear model
- Instrumental variable#Identification
- Simultaneous equations model
- System of linear equations
- Fisher, Franklin M. (1966). The Identification Problem in Econometrics. ISBN 0-88275-344-4.
- Gujarati, Damodar N.; Porter, Dawn C. (2009). Basic Econometrics (Fifth ed.). New York: McGraw-Hill Irwin. pp. 692–698. ISBN 978-0-07-337577-9.
- Hayashi, Fumio (2000). Econometrics. Princeton University Press. pp. 200–203. ISBN 0-691-01018-8.
- Kmenta, Jan (1986). Elements of Econometrics (Second ed.). New York: Macmillan. pp. 660–672. ISBN 0-02-365070-2.
- Koopmans, Tjalling C. (1949). "Identification problems in economic model construction". Econometrica. 17 (2): 125–144. JSTOR 1905689. ("A classic and masterful exposition of the subject", Fisher 1966, p. 31)