Data anonymization

From Wikipedia, the free encyclopedia

Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.

The European Union's General Data Protection Regulation (GDPR) requires that stored data on people in the EU undergo either anonymization or pseudonymization.[1]


Data anonymization has been defined as a "process by which personal data is irreversibly altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party."[2] Data anonymization enables the transfer of information across a boundary, such as between two departments within an agency or between two agencies, while reducing the risk of unintended disclosure. In certain environments it does so in a manner that still allows evaluation and analytics after anonymization.

In the context of medical data, anonymized data refers to data from which the patient cannot be identified by the recipient of the information. The name, address, and full post code must be removed, together with any other information which, in conjunction with other data held by or disclosed to the recipient, could identify the patient.[3]
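As a minimal sketch of the removal step described above, the following Python function drops the name and address fields and truncates the post code so that the full code is no longer present. The field names (`name`, `address`, `post_code`), the sample record, and the UK-style post code format (where the last three characters form the inward part) are assumptions made for illustration and do not come from any particular standard or tool.

```python
# Hypothetical field names; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "address"}


def anonymize_patient(record: dict) -> dict:
    """Return a copy of the record without direct identifiers.

    The full post code is replaced by its outward part only, assuming a
    UK-style code whose last three characters are the inward part.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "post_code" in cleaned:
        compact = cleaned["post_code"].replace(" ", "").upper()
        cleaned["post_code"] = compact[:-3]  # drop the inward part
    return cleaned


patient = {
    "name": "Jane Doe",
    "address": "1 High Street",
    "post_code": "SW1A 1AA",
    "diagnosis": "asthma",
}
print(anonymize_patient(patient))
# {'post_code': 'SW1A', 'diagnosis': 'asthma'}
```

Removing these fields alone is not sufficient in general: as the paragraph above notes, any remaining field that could identify the patient when combined with other data held by the recipient would also need to be removed or coarsened.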

De-anonymization is the reverse process, in which anonymous data is cross-referenced with other data sources to re-identify the anonymous data source.[4] Generalization and perturbation are two popular anonymization approaches for relational data.[5] The process of obscuring data while retaining the ability to re-identify it later is called pseudonymization; it is one way companies can store data in a HIPAA-compliant manner.[6]
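The three techniques named above can be illustrated with a short Python sketch. It is only one possible reading of each term: generalization is shown as binning an exact age into a range, perturbation as adding bounded uniform noise to a numeric value, and pseudonymization as a keyed HMAC-SHA256 token. The function names, the record layout, and the key are hypothetical, and the choice of HMAC is an assumption rather than a method prescribed by the cited sources.

```python
import hashlib
import hmac
import random


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a coarser range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def perturb(value: float, scale: float, rng: random.Random) -> float:
    """Perturbation: add uniform noise in [-scale, scale] to a value."""
    return value + rng.uniform(-scale, scale)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Pseudonymization: derive a stable token from an identifier.

    The same identifier and key always give the same token, so whoever
    holds the key can recompute tokens for known identifiers later.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


rng = random.Random(0)            # seeded only to make the sketch repeatable
key = b"example-key-keep-secret"  # hypothetical key; store it separately
row = {"patient_id": "P-1001", "age": 37, "weight_kg": 82.5}

released = {
    "patient_id": pseudonymize(row["patient_id"], key),
    "age": generalize_age(row["age"]),
    "weight_kg": round(perturb(row["weight_kg"], 2.0, rng), 1),
}
print(released)
```

In this sketch, re-identification is possible only for a party that holds the key and a list of candidate identifiers. A lookup table kept by the data controller would be an alternative design with the same property.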

Data anonymization tools

Ordered alphabetically

See also

References

  1. ^ "Data science under GDPR with pseudonymization in the data pipeline". Dativa. 17 April 2018.
  2. ^ ISO 25237:2017 Health informatics -- Pseudonymization. ISO. 2017. p. 7.
  3. ^ "Data anonymization". The Free Medical Dictionary. Retrieved 17 January 2014.
  4. ^ "De-anonymization". Retrieved 17 January 2014.
  5. ^ Bin Zhou; Jian Pei; WoShun Luk (December 2008). "A brief survey on anonymization techniques for privacy preserving publishing of social network data" (PDF). ACM SIGKDD Explorations Newsletter. 10 (2): 12–22.
  6. ^ "Data de-identification - an easier way to HIPAA compliance". TrueVault.
  7. ^ "Anonymization platform for documents". Retrieved 2019-09-06.
  8. ^ "Aircloak Insights". Retrieved 2019-06-24.
  9. ^ "CA Test Data Manager | CA Communities". Retrieved 2018-11-22.
  10. ^ CloverDX, Inc. "Data Anonymization". Retrieved 2019-11-01.
  11. ^ "Test Data Privacy - Compuware". Compuware. Retrieved 2018-11-22.
  12. ^ "Database Protector | Protegrity". Protegrity. Retrieved 2018-11-22.
  13. ^ "DOT-Anonymizer". Retrieved 2019-07-18.
  14. ^ "Data Masking: Data Obfuscation & Encryption | Informatica US". Retrieved 2018-11-22.
  15. ^ "Dynamic and Static Data Masking, Data Anonymization and Encryption".
  16. ^ "IBM Knowledge Center". Retrieved 2018-11-22.
  17. ^ "index". Retrieved 2018-11-22.
  18. ^ "IRI FieldShield Data Masking | IRI, The CoSort Company". Retrieved 2018-11-22.
  19. ^ "Data Express | Micro Focus". Retrieved 2018-11-22.
  20. ^ "Privacy Analytics Eclipse". Retrieved 2019-03-15.
  21. ^ "Privitar Publisher TM". Retrieved 2019-01-04.
  22. ^ Protegrity USA, Inc. "Data Security Software | Protegrity". Retrieved 2020-01-03.
  23. ^ "Vormetric Vaultless Tokenization with Dynamic Data Masking | Vaultless Data Tokenization | Thales eSecurity". Retrieved 2018-11-22.
  24. ^ "Truata anonymization solution". Retrieved 2019-09-05.
  25. ^ "Soflab". Retrieved 2018-11-22.

Further reading

  • Raghunathan, Balaji (June 2013). The Complete Book of Data Anonymization: From Planning to Implementation. CRC Press. ISBN 9781482218565.
  • Khaled El Emam, Luk Arbuckle (August 2014). Anonymizing Health Data: Case Studies and Methods to Get You Started. O'Reilly Media. ISBN 978-1-4493-6307-9.
  • Rolf H. Weber, Ulrike I. Heinrich (2012). Anonymization: SpringerBriefs in Cybersecurity. Springer. ISBN 9781447140665.
  • Aris Gkoulalas-Divanis, Grigorios Loukides (2012). Anonymization of Electronic Medical Records to Support Clinical Analysis (SpringerBriefs in Electrical and Computer Engineering). Springer. ISBN 9781461456674.
  • Pete Warden. "Why you can't really anonymize your data". O'Reilly Media, Inc. Archived from the original on 9 January 2014. Retrieved 17 January 2014.

External links