Dirty data

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 50.35.32.178 (talk) at 16:50, 8 October 2019 (→‎Examples: The previous title examples, are examples of dirty data in a social context, while beginning of the article is about Dirty Data in a data science context.).

Dirty data, also known as rogue data,[1] are inaccurate, incomplete or inconsistent data, especially in a computer system or database.[2]

Dirty data can contain mistakes such as spelling or punctuation errors, incorrect values in a field, incomplete or outdated entries, or records duplicated in the database. Such data can be cleaned through a process known as data cleansing.[3]
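As a rough illustration of the kinds of mistakes listed above, the following minimal sketch flags incomplete, malformed and duplicated entries in a small set of records. The record layout, the field names (`id`, `name`, `email`) and the helper functions `normalize` and `find_issues` are assumptions made for this example; they are not drawn from any particular database or data-cleansing tool, and real cleansing pipelines are usually far more involved.

```python
from collections import Counter

# Hypothetical customer records; field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Alice Smith", "email": "alice@example.com"},
    {"id": 2, "name": "Bob Jones", "email": ""},                      # incomplete
    {"id": 3, "name": "alice smith ", "email": "ALICE@example.com"},  # duplicate of id 1
    {"id": 4, "name": "Carol White", "email": "carol.example.com"},   # malformed value
]

def normalize(record):
    """Return a copy with whitespace trimmed and case made consistent."""
    return {
        "id": record["id"],
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }

def find_issues(records):
    """Map each record id to a list of the problems found in that record."""
    cleaned = [normalize(r) for r in records]
    # Count normalized, non-empty emails so duplicates differing only in case are caught.
    email_counts = Counter(r["email"] for r in cleaned if r["email"])
    issues = {}
    for r in cleaned:
        problems = []
        if not r["email"]:
            problems.append("incomplete: missing email")
        elif "@" not in r["email"]:
            problems.append("incorrect: malformed email")
        elif email_counts[r["email"]] > 1:
            problems.append("duplicate email")
        if problems:
            issues[r["id"]] = problems
    return issues

issues = find_issues(records)
for record_id, problems in sorted(issues.items()):
    print(record_id, problems)
```

Normalizing before comparing is a deliberate choice in this sketch: without it, the records with ids 1 and 3 would not be recognized as duplicates, because their emails differ only in letter case.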

Dirty Data (Social)

Following the definition of Gary T. Marx, Professor Emeritus of MIT, there are four types of data:[4]

  • Nonsecretive and nondiscrediting data:
    • Routinely available information.
  • Secretive and nondiscrediting data:
    • Strategic and fraternal secrets, privacy.
  • Nonsecretive and discrediting data:
    • sanction immunity,
    • normative dissensus,
    • selective dissensus,
    • making good on a threat for credibility,
    • discovered dirty data.
  • Secretive and discrediting data:
    • Hidden and dirty data.

References

  1. ^ Spotless version 12 out now
  2. ^ Margaret Chu (2004), "What Are Dirty Data?", Blissful Data, p. 71 et seq, ISBN 9780814407806
  3. ^ Wu, S. (2013), "A review on coarse warranty data and analysis" (PDF), Reliability Engineering & System Safety, 114: 1–11, doi:10.1016/j.ress.2012.12.021
  4. ^ "Notes on the discovery, collection, and assessment of hidden and". web.mit.edu. Retrieved 2017-02-17.