Data generating process

From Wikipedia, the free encyclopedia

The term data generating process is used in statistical and scientific literature to convey a number of different ideas:

  • the data collection process, meaning the routes and procedures by which data reach a database (particularly where these may change over time);
  • a specific statistical model that is used to represent supposed random variations in observations, often in terms of explanatory and/or latent variables;
  • a notional and non-specific probabilistic model (not directly described or explicitly set down) that would include all of the random influences that combine to produce individual observations. One example is the usual justification of the "common occurrence" of the normal distribution as the result of a combination of many additive random effects.
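
The additive-effects argument in the last sense can be illustrated with a small simulation. The following is a minimal sketch only, not a model taken from the literature: the function names, the choice of uniform effects on [-1, 1], the number of effects (50) and the sample size are all assumptions made for illustration. Each observation is generated as the sum of many small independent effects, and the share of observations within one standard deviation of the mean is then compared with the value of roughly 0.68 expected under a normal distribution.

```python
import random
import statistics


def simulate_observation(rng, n_effects=50):
    """Return one observation: the sum of many small independent effects.

    Uniform effects on [-1, 1] are an illustrative assumption; any
    distribution with finite variance would serve the same purpose.
    """
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_effects))


def simulate_sample(n_obs=10_000, n_effects=50, seed=0):
    """Draw n_obs observations from the hypothetical additive process."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [simulate_observation(rng, n_effects) for _ in range(n_obs)]


def share_within_one_sd(sample):
    """Fraction of observations within one standard deviation of the mean."""
    mean = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    return sum(abs(x - mean) <= sd for x in sample) / len(sample)


sample = simulate_sample()
share = share_within_one_sd(sample)
print(f"share within one standard deviation: {share:.3f}")
```

Although no individual effect is normally distributed, the simulated observations behave approximately as a normal sample would, which is the sense in which such a notional process is said to "generate" normally distributed data.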