Mokken scale


Mokken scaling is a psychometric method of data reduction. A Mokken scale is a unidimensional scale that consists of hierarchically ordered items that measure the same underlying, latent concept. The method is named after the political scientist Rob Mokken, who proposed it in 1971.[1]

Mokken scales have been used in psychology,[2] education,[3] political science,[1][4] public opinion research[5] and medicine.[6]


An example of an Item Response Function
Item Response Functions that differ in their difficulty
Item Response Functions that differ in their discrimination function

Mokken scaling belongs to item response theory. In essence, a Mokken scale is a non-parametric, probabilistic version of the Guttman scale. Both Guttman and Mokken scaling can be used to assess whether a number of items measure the same underlying concept, and both assume that the items are hierarchically ordered, that is, ordered by degree of "difficulty". Difficulty is gauged by the percentage of respondents who answer the question affirmatively: the lower that percentage, the more difficult the item. The hierarchical order means that a respondent who answered a difficult question correctly is assumed to answer an easier question correctly as well.[7] The key difference between a Guttman scale and a Mokken scale is that Mokken scaling is probabilistic. A Guttman scale assumes that every respondent who answered a difficult question affirmatively will also answer an easier question affirmatively; violations of this pattern are called Guttman errors. Mokken scaling instead assumes that respondents who answered a difficult question affirmatively are more likely to answer an easier question affirmatively. The scalability of the scale is measured by Loevinger's coefficient H, which compares the observed number of Guttman errors with the number that would be expected if the items were unrelated.[7]
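The comparison behind Loevinger's H can be made concrete with a small sketch. It assumes dichotomous (0/1) items and the common formulation H = 1 - (observed Guttman errors) / (expected Guttman errors under independence); the function name loevinger_h and the toy data are illustrative only and are not taken from the cited sources.

```python
def loevinger_h(responses):
    """Loevinger's H for dichotomous data.

    `responses` is a list of rows, one per respondent, each a list of 0/1
    answers. Assumes no item is answered identically by everyone, since the
    expected error count would then be zero.
    """
    n = len(responses)
    k = len(responses[0])
    # Popularity of each item: share of respondents answering affirmatively.
    popularity = [sum(row[i] for row in responses) / n for i in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            # The less popular item of the pair is the more difficult one.
            hard, easy = (i, j) if popularity[i] <= popularity[j] else (j, i)
            # Guttman error: affirmative on the hard item, negative on the easy one.
            observed += sum(1 for row in responses if row[hard] == 1 and row[easy] == 0)
            # Number of such errors expected if the two items were independent.
            expected += n * popularity[hard] * (1 - popularity[easy])
    return 1.0 - observed / expected


perfect_guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
unrelated_items = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(loevinger_h(perfect_guttman))  # 1.0: no Guttman errors at all
print(loevinger_h(unrelated_items))  # 0.0: as many errors as independence predicts
```

A value of 1 therefore corresponds to a perfect Guttman scale, and a value near 0 to items that are unrelated.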

The probability that a respondent answers an item correctly is described by an item response function. Mokken scales are similar to Rasch scales in that both adapt Guttman scales to a probabilistic model. The key difference is that the Rasch model assumes that all items have item response functions of the same (logistic) shape, differing only in their location, whereas in Mokken scaling the item response functions may differ between items.[4]

Mokken scales come in two forms. The first is the Double Monotonicity model, in which the items can differ in their difficulty; it is essentially an ordinal version of the Rasch scale. The second is the Monotone Homogeneity model, in which items can differ in their discrimination parameter, meaning that some items may be more weakly related to the latent variable than others.[4] Double Monotonicity models are used most often.
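The difference between the two forms can be illustrated with a few example curves. The sketch below uses logistic curves purely for convenience: Mokken scaling is non-parametric and does not assume this shape, and the function names and parameter values are hypothetical. Curves that differ only in difficulty never cross, whereas curves that differ in discrimination do.

```python
import math


def logistic_irf(theta, difficulty, discrimination=1.0):
    """Illustrative item response function: P(positive response | theta)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def is_monotone(curve):
    """True if the curve never decreases along the latent trait."""
    return all(a <= b for a, b in zip(curve, curve[1:]))


def curves_intersect(curve_a, curve_b):
    """True if one curve is above the other at some points and below at others."""
    diffs = [a - b for a, b in zip(curve_a, curve_b)]
    return any(d > 0 for d in diffs) and any(d < 0 for d in diffs)


# Grid of latent trait values from -4.0 to 4.0.
grid = [x / 10 for x in range(-40, 41)]

# Two items that differ only in difficulty.
easy = [logistic_irf(t, -1.0) for t in grid]
hard = [logistic_irf(t, 1.0) for t in grid]

# Two items that differ only in discrimination.
steep = [logistic_irf(t, 0.0, 2.5) for t in grid]
flat = [logistic_irf(t, 0.0, 0.5) for t in grid]

print(all(is_monotone(c) for c in (easy, hard, steep, flat)))  # True
print(curves_intersect(easy, hard))   # False: same item order at every trait level
print(curves_intersect(steep, flat))  # True: the item order flips at theta = 0
```

All four curves are monotone, so each pair is compatible with the Monotone Homogeneity model, but only the first pair has the non-intersecting curves associated with the Double Monotonicity model.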

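Double Monotonicity models rest on the following three assumptions, which they share with the Monotone Homogeneity model, together with the additional requirement that the item response functions of different items do not intersect.[4]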

  1. There is a unidimensional latent trait on which subjects and items can be ordered.
  2. The item response function is monotonically nondecreasing. This means that as one moves from the low end of the latent variable to the high end, the probability of giving a positive response never decreases.
  3. The items are locally stochastically independent: responses to any two items by the same respondent should not be a function of any aspect of the respondent or the items other than the respondent's position on the latent trait.[4]
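The three assumptions can be made tangible by simulating data that satisfy them by construction. The following is a minimal sketch, assuming a standard normal latent trait and logistic response curves (both are illustrative choices, not requirements of Mokken scaling). The helper rest_score_proportions is a hypothetical name for a simple check: it groups respondents by their score on the remaining items and reports how often the chosen item is answered positively in each group, which should tend to rise with the rest score when the assumptions hold.

```python
import math
import random


def simulate_responses(n_respondents, difficulties, seed=0):
    """Simulate 0/1 responses that satisfy the three assumptions by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_respondents):
        # Assumption 1: one latent trait value per respondent.
        theta = rng.gauss(0.0, 1.0)
        row = []
        for difficulty in difficulties:
            # Assumption 2: the response probability never decreases in theta.
            p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
            # Assumption 3: given theta, each item is drawn independently.
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def rest_score_proportions(data, item):
    """Share of positive answers to `item` within each rest-score group."""
    groups = {}
    for row in data:
        rest_score = sum(row) - row[item]
        groups.setdefault(rest_score, []).append(row[item])
    return {score: sum(vals) / len(vals) for score, vals in sorted(groups.items())}


# Deterministic toy example: a perfect Guttman pattern.
toy = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(rest_score_proportions(toy, 1))  # {0: 0.0, 1: 0.5, 2: 1.0}

# Simulated example; the exact proportions depend on the random seed.
simulated = simulate_responses(2000, [-1.5, 0.0, 1.5])
print(rest_score_proportions(simulated, 1))
```

With real data, sampling noise can produce small local decreases even when the model holds, so a check of this kind is a screening aid rather than a formal test.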

There has been some confusion in Mokken scaling between the Double Monotonicity model and invariant item ordering. Invariant item ordering means that the items keep the same order of difficulty for all respondents, across the whole range of the latent trait. For dichotomously scored items, the Double Monotonicity model implies invariant item ordering; for polytomously scored items, however, this does not necessarily hold.[8] For invariant item ordering to hold, not only must the item response functions not intersect, but the item step response functions between one level and the next within each item must not intersect either.[9]

The question of what sample size Mokken scaling requires is largely unresolved. The best available estimates come from simulation studies that vary the quality of the items in the scales (Loevinger's coefficient and the correlation between scales). This work suggests that smaller samples, in the region of 250–500, suffice where item quality is high, whereas samples of 1250–1750 are needed where item quality is low.[3]


While Mokken scale analysis was originally developed to measure the extent to which individual dichotomous items form a scale, it has since been extended to polytomous items.[4] Moreover, while Mokken scale analysis is a confirmatory method, meant to test whether a number of items form a coherent scale (like confirmatory factor analysis), an Automatic Item Selection Procedure has been developed to explore which latent dimensions structure the responses to a number of observable items (like exploratory factor analysis).[10]
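The idea of selecting items automatically can be sketched as a greedy search. The code below is a heavily simplified illustration for dichotomous items and is not the published procedure: it omits the significance tests, builds only one scale, and all function names as well as the lower bound of 0.3 are assumptions made for this example. It starts from the item pair with the highest pairwise H and keeps adding the item that yields the highest scale H, as long as that item covaries positively with every selected item and its item-level H reaches the lower bound. It relies on the identity that, for 0/1 items, H equals the sum of inter-item covariances divided by the sum of their maximum possible values given the item popularities.

```python
from itertools import combinations


def covariance_tables(data):
    """Observed and maximum attainable covariances for every item pair."""
    n, k = len(data), len(data[0])
    p = [sum(row[i] for row in data) / n for i in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    cov_max = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            joint = sum(row[i] * row[j] for row in data) / n
            cov[i][j] = joint - p[i] * p[j]
            # Largest covariance possible given the two item popularities.
            cov_max[i][j] = min(p[i], p[j]) - p[i] * p[j]
    return cov, cov_max


def h_ratio(cov, cov_max, pairs):
    """H over a set of item pairs (assumes no item is constant)."""
    pairs = list(pairs)
    return sum(cov[i][j] for i, j in pairs) / sum(cov_max[i][j] for i, j in pairs)


def select_scale(data, lower_bound=0.3):
    """Greedy, simplified item selection; returns the indices of one scale."""
    cov, cov_max = covariance_tables(data)
    k = len(cov)
    # Start from the pair of items with the highest pairwise H.
    start = max(combinations(range(k), 2), key=lambda pair: h_ratio(cov, cov_max, [pair]))
    if h_ratio(cov, cov_max, [start]) < lower_bound:
        return []
    scale = list(start)
    while True:
        candidates = [
            c for c in range(k)
            if c not in scale
            and all(cov[c][j] > 0 for j in scale)
            and h_ratio(cov, cov_max, [(c, j) for j in scale]) >= lower_bound
        ]
        if not candidates:
            return sorted(scale)
        # Add the candidate that gives the highest H for the enlarged scale.
        scale.append(max(
            candidates,
            key=lambda c: h_ratio(cov, cov_max, combinations(scale + [c], 2)),
        ))


# Items 0-2 form a perfect Guttman pattern; item 3 is unrelated to them.
patterns = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
data = [pattern + [extra] for pattern in patterns for extra in (0, 1)]
print(select_scale(data))  # [0, 1, 2]
```

For applied work, the software described in reference [10] should be preferred over a sketch like this one.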


  1. ^ a b Mokken, Rob (1971). A theory and procedure of scale analysis: With applications in political research. Walter de Gruyter. 
  2. ^ Bedford, A.; Watson, R.; Lyne, J.; Tibbles, J.; Davies, F.; Deary, I.J. (2009). "Mokken scaling and principal components analyses of the CORE-OM in a large clinical sample". Clinical Psychology and Psychotherapy. 17 (1): 51–62. doi:10.1002/cpp.649. 
  3. ^ a b Straat, J.H.; van der Ark, L.A.; Sijtsma, K. (2014). "Minimum sample size requirements for Mokken scale analysis". Educational and Psychological Measurement. 74 (5): 809–822.
  4. ^ a b c d e f van Schuur, Wijbrandt (2003). "Mokken scale analysis: Between the Guttman scale and parametric item response theory". Political Analysis. SPM-PMSAPS. 11 (2): 139–163. doi:10.1093/pan/mpg002. 
  5. ^ Gillespie, M.; Tenvergert, E.M.; Kingma, J. (1987). "Using Mokken scale analysis to develop unidimensional scales". Quality and Quantity. 21 (4): 393–408.
  6. ^ Stochl, J.; Jones, P.B.; Croudace, C.J. (2012). "Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers". BMC Medical Research Methodology. 12: 74. doi:10.1186/1471-2288-12-74.
  7. ^ a b Crichton, N. (1999). "Mokken scale analysis". Journal of Clinical Nursing. 8: 388.
  8. ^ Ligtvoet, R.; van der Ark, L.A.; te Marvelde, J.M.; Sijtsma, K. (2010). "Investigating an invariant item ordering for polytomously scored items". Educational and Psychological Measurement. 70 (4): 578–595.
  9. ^ Sijtsma, K.; Meijer, R.R.; van der Ark, L.A. (2011). "Mokken scale analysis as time goes by: An update for scaling practitioners". Personality and Individual Differences. 50: 31–37.
  10. ^ van der Ark, L.A. (2012). "New developments in Mokken scale analysis in R". Journal of Statistical Software. 48 (5). doi:10.18637/jss.v048.i05.