# Twisting properties

Starting with a sample ${\displaystyle \{x_{1},\ldots ,x_{m}\}}$ observed from a random variable X having a given distribution law with an unknown parameter, a parametric inference problem consists of computing suitable values, called estimates, of this parameter on the basis of the sample alone. An estimate is suitable if substituting it for the unknown parameter does not cause major damage in subsequent computations. In algorithmic inference, the suitability of an estimate is expressed in terms of compatibility with the observed sample.

In turn, parameter compatibility is a probability measure that we derive from the probability distribution of the random variable to which the parameter refers. In this way we identify a random parameter Θ compatible with an observed sample. Given a sampling mechanism ${\displaystyle M_{X}=(g_{\theta },Z)}$, the rationale of this operation lies in using the distribution law of the seed Z to determine both the distribution law of X for a given θ and the distribution law of Θ given a sample of X. Hence, we may derive the latter distribution directly from the former if we are able to relate domains of the sample space to subsets of the support of Θ. In more abstract terms, we speak of twisting properties of samples with properties of parameters, and we identify the former with statistics that are suitable for this exchange, that is, statistics that are well behaved with respect to the unknown parameters. The operational goal is to write the analytic expression of the cumulative distribution function ${\displaystyle F_{\Theta }(\theta )}$, in light of the observed value s of a statistic S, as a function of the distribution law of S when the parameter of X is exactly θ.

## Method

Given a sampling mechanism ${\displaystyle M_{X}=(g_{\theta },Z)}$ for the random variable X, we model ${\displaystyle {\boldsymbol {X}}=\{X_{1},\ldots ,X_{m}\}}$ to be equal to ${\displaystyle \{g_{\theta }(Z_{1}),\ldots ,g_{\theta }(Z_{m})\}}$. Focusing on a relevant statistic ${\displaystyle S=h(X_{1},\ldots ,X_{m})}$ for the parameter θ, the master equation reads

${\displaystyle s=h(g_{\theta }(z_{1}),\ldots ,g_{\theta }(z_{m}))=\rho (\theta ;z_{1},\ldots ,z_{m})}$.

When s is a well-behaved statistic with respect to the parameter, we are sure that a monotone relation exists between s and θ for each ${\displaystyle {\boldsymbol {z}}=\{z_{1},\ldots ,z_{m}\}}$. We are also assured that Θ, as a function of ${\displaystyle {\boldsymbol {Z}}}$ for given s, is a random variable, since the master equation provides solutions that are feasible and independent of other (hidden) parameters.[1]
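As an illustration of the master equation, the following minimal sketch assumes an exponential random variable with rate θ, the explaining function $g_{\theta }(z)=-\ln(z)/\theta$ with a uniform seed, and the statistic $s=\sum x_{i}$. These choices, the function names and the value of `s_observed` are assumptions made for the example and are not taken from the text above.

```python
import math
import random

def g(theta, z):
    """Assumed explaining function: exponential variable with rate theta."""
    return -math.log(z) / theta

def rho(theta, seeds):
    """Right-hand side of the master equation: s = h(g_theta(z_1), ..., g_theta(z_m))."""
    return sum(g(theta, z) for z in seeds)

def solve_master_equation(s, seeds):
    """Invert s = rho(theta; z) for theta (closed form for this mechanism)."""
    return sum(-math.log(z) for z in seeds) / s

rng = random.Random(0)
seeds = [1.0 - rng.random() for _ in range(10)]  # uniform seeds on (0, 1]

# For fixed seeds, s is a monotone (here decreasing) function of theta.
values = [rho(theta, seeds) for theta in (0.5, 1.0, 2.0, 4.0)]
is_decreasing = all(a > b for a, b in zip(values, values[1:]))

# A parameter value compatible with a hypothetical observed statistic.
s_observed = 7.3
theta_compatible = solve_master_equation(s_observed, seeds)
```

Repeating the last step over many independent seed vectors would yield a population of parameter values compatible with the observed statistic.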

The direction of the monotonicity determines, for any ${\displaystyle {\boldsymbol {z}}}$, a relation between events of the type ${\displaystyle s\geq s'\leftrightarrow \theta \geq \theta '}$ or, vice versa, ${\displaystyle s\geq s'\leftrightarrow \theta \leq \theta '}$, where ${\displaystyle s'}$ is computed by the master equation with ${\displaystyle \theta '}$. When s takes discrete values, the first relation changes into ${\displaystyle s\geq s'+\ell \rightarrow \theta \geq \theta '\rightarrow s\geq s'}$, where ${\displaystyle \ell >0}$ is the size of the discretization grain of s; the same holds for the opposite monotonicity trend. Summarizing these relations over all seeds, for continuous s we have either

${\displaystyle F_{\Theta |S=s}(\theta )=F_{S|\Theta =\theta }(s)}$

or

${\displaystyle F_{\Theta |S=s}(\theta )=1-F_{S|\Theta =\theta }(s)}$

For s discrete we have an interval where ${\displaystyle F_{\Theta |S=s}(\theta )}$ lies, because of ${\displaystyle \ell >0}$. The whole logical contrivance is called a twisting argument. A procedure implementing it is as follows.
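For the continuous case, a minimal sketch under the assumption of an exponential variable with rate λ and statistic $s=\sum x_{i}$ (an example not taken from the text): since s decreases with λ, the twisting argument gives $F_{\Lambda |S=s}(\lambda )=F_{S|\Lambda =\lambda }(s)$, where S follows an Erlang law. The simulation solves the master equation on random seeds and is included only as a cross-check; all names are hypothetical.

```python
import math
import random

def erlang_cdf(s, m, lam):
    """P(S <= s) for S the sum of m i.i.d. exponential variables with rate lam."""
    x = lam * s
    term, total = 1.0, 0.0
    for j in range(m):
        total += term          # adds x**j / j!
        term *= x / (j + 1)
    return 1.0 - math.exp(-x) * total

def parameter_cdf(lam, s, m):
    """Twisting argument: s does not increase with lam, so F_Lambda(lam) = F_S(s)."""
    return erlang_cdf(s, m, lam)

def simulated_parameter_cdf(lam, s, m, n_draws=20000, seed=1):
    """Empirical CDF of the solutions of the master equation over random seeds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # With g(z) = -ln(z) / lam, the master equation gives lam = sum(-ln z_i) / s.
        c = sum(-math.log(1.0 - rng.random()) for _ in range(m))
        if c / s <= lam:
            hits += 1
    return hits / n_draws

analytic = parameter_cdf(1.0, s=4.0, m=5)
empirical = simulated_parameter_cdf(1.0, s=4.0, m=5)
```

The two values should agree up to Monte Carlo error, which is the content of the twisting argument for this mechanism.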

## Algorithm

Generating a parameter distribution law through a twisting argument

Given a sample ${\displaystyle \{x_{1},\ldots ,x_{m}\}}$ from a random variable with unknown parameter θ,

1. identify a well-behaved statistic S for the parameter θ and its discretization grain ${\displaystyle \ell }$ (if any);
2. determine the direction of monotonicity of s with respect to θ;
3. compute ${\displaystyle F_{\Theta }(\theta )\in \left(q_{1}(F_{S|\Theta =\theta }(s)),q_{2}(F_{S|\Theta =\theta }(s))\right)}$ where:
   • if S is continuous, ${\displaystyle q_{1}=q_{2}}$;
   • if S is discrete,
     1. ${\displaystyle q_{2}(F_{S}(s))=q_{1}(F_{S}(s-\ell ))}$ if s does not decrease with θ,
     2. ${\displaystyle q_{1}(F_{S}(s))=q_{2}(F_{S}(s-\ell ))}$ if s does not increase with θ;
   • in both cases, for ${\displaystyle i=1,2}$, ${\displaystyle q_{i}(F_{S})=1-F_{S}}$ if s does not decrease with θ and ${\displaystyle q_{i}(F_{S})=F_{S}}$ if s does not increase with θ.
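For a discrete statistic, the procedure can be sketched on an assumed example that is not in the text: a Bernoulli variable with parameter p, the statistic s equal to the number of successes in m trials, and grain $\ell =1$. Since s does not decrease with p, step 3 gives $F_{P}(p)\in \left(1-F_{S}(s),\,1-F_{S}(s-\ell )\right)$, with $F_{S}$ the binomial cumulative distribution function. Function names are hypothetical.

```python
import math

def binomial_cdf(s, m, p):
    """P(S <= s) for S binomial with m trials and success probability p."""
    if s < 0:
        return 0.0
    upper = min(s, m)
    return sum(math.comb(m, j) * p**j * (1.0 - p)**(m - j) for j in range(upper + 1))

def parameter_cdf_bounds(p, s, m, grain=1):
    """Interval containing F_P(p) when s does not decrease with p."""
    lower = 1.0 - binomial_cdf(s, m, p)          # q_1(F_S(s)) = 1 - F_S(s)
    upper = 1.0 - binomial_cdf(s - grain, m, p)  # q_2(F_S(s)) = q_1(F_S(s - grain))
    return lower, upper

# Hypothetical observation: 3 successes out of 10 trials, evaluated at p = 0.5.
lower, upper = parameter_cdf_bounds(0.5, s=3, m=10)
```

The width of the interval equals the binomial probability of observing exactly s successes, so it shrinks as the sample size grows.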

## Remark

The rationale behind twisting arguments does not change when parameters are vectors, though some complication arises from the management of joint inequalities. By contrast, the difficulty of dealing with a vector of parameters proved to be the Achilles heel of Fisher's approach to the fiducial distribution of parameters (Fisher 1935). Fraser's constructive probabilities (Fraser 1966), devised for the same purpose, do not treat this point completely either.

## Example

For ${\displaystyle {\boldsymbol {x}}}$ drawn from a Gamma distribution, whose specification requires values for the parameters λ and k, a twisting argument may be stated by following the procedure above. Given the meaning of these parameters we know that

${\displaystyle (k\leq k')\leftrightarrow (s_{k}\leq s_{k'})}$ for fixed λ, and ${\displaystyle (\lambda \leq \lambda ')\leftrightarrow (s_{\lambda '}\leq s_{\lambda })}$ for fixed k

where ${\displaystyle s_{k}=\prod _{i=1}^{m}x_{i}}$ and ${\displaystyle s_{\lambda }=\sum _{i=1}^{m}x_{i}}$. This leads to a joint cumulative distribution function ${\displaystyle F_{\Lambda ,K}(\lambda ,k)=F_{\Lambda |k}(\lambda |k)F_{K}(k)=F_{K|\lambda }(k|\lambda )F_{\Lambda }(\lambda )}$. Using the first factorization and replacing ${\displaystyle s_{k}}$ with ${\displaystyle r_{k}={\frac {s_{k}}{s_{\lambda }^{m}}}}$ in order to have a distribution of ${\displaystyle K}$ that is independent of ${\displaystyle \Lambda }$, we have

${\displaystyle F_{\Lambda |k}(\lambda |k)=1-{\frac {\Gamma (km,\lambda s_{\Lambda })}{\Gamma (km)}}}$

${\displaystyle F_{K}(k)=1-F_{R_{k}}(r_{K})}$

where m denotes the sample size, ${\displaystyle s_{\Lambda }}$ and ${\displaystyle r_{K}}$ are the observed statistics (hence with indices denoted by capital letters), ${\displaystyle \Gamma (a,b)}$ is the incomplete gamma function, and ${\displaystyle F_{R_{k}}(r_{K})}$ is a Fox H function that can be approximated by a Gamma distribution with proper parameters (for instance, estimated through the method of moments) as a function of k and m.
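The statistics used in this example can be computed as in the sketch below, which works with logarithms because the product $s_{k}$ and the ratio $r_{k}$ quickly leave the floating-point range. It also checks that $r_{k}$ is unchanged when the sample is rescaled, which is the reason it is used in place of $s_{k}$. The simulated sample, its generating parameters and the function name are assumptions made for illustration.

```python
import math
import random

def gamma_statistics(sample):
    """Return s_lambda = sum(x_i), log s_k = sum(log x_i) and log r_k."""
    m = len(sample)
    s_lambda = sum(sample)
    log_s_k = sum(math.log(x) for x in sample)
    # r_k = s_k / s_lambda**m, evaluated in log space to avoid underflow.
    log_r_k = log_s_k - m * math.log(s_lambda)
    return s_lambda, log_s_k, log_r_k

# Hypothetical sample: shape k = 2 and rate lambda = 0.8 (scale = 1 / rate).
rng = random.Random(42)
sample = [rng.gammavariate(2.0, 1.0 / 0.8) for _ in range(30)]
s_lambda, log_s_k, log_r_k = gamma_statistics(sample)

# Rescaling the data changes s_lambda and s_k but leaves r_k unchanged.
rescaled = [3.0 * x for x in sample]
s_lambda_rescaled, _, log_r_k_rescaled = gamma_statistics(rescaled)
```

By the inequality between arithmetic and geometric means, $r_{k}\leq m^{-m}$, so very small values of $r_{k}$ are expected for moderate sample sizes.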

Figure (left): Joint probability density function of parameters ${\displaystyle (K,\Lambda )}$ of a Gamma random variable.

Figure (right): Marginal cumulative distribution function of parameter K of a Gamma random variable.

With a sample size ${\displaystyle m=30}$, ${\displaystyle s_{\Lambda }=72.82}$ and ${\displaystyle r_{K}=4.5\times 10^{-46}}$, the joint p.d.f. of the Gamma parameters K and ${\displaystyle \Lambda }$ is shown in the figure on the left, and the marginal cumulative distribution function of K in the figure on the right.

## Notes

1. ^ By default, capital letters (such as U, X) will denote random variables and small letters (u, x) their corresponding realizations.

## References

• Fisher, R. A. (1935). "The fiducial argument in statistical inference". Annals of Eugenics. 6: 391–398. doi:10.1111/j.1469-1809.1935.tb02120.x.
• Fraser, D. A. S. (1966). "Structural probability and generalization". Biometrika. 53 (1/2): 1–9. doi:10.2307/2334048.
• Apolloni, B.; Malchiodi, D.; Gaito, S. (2006). Algorithmic Inference in Machine Learning. International Series on Advanced Intelligence. Vol. 5 (2nd ed.). Magill, Adelaide: Advanced Knowledge International.