Data anonymization

Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.

Overview

Data anonymization has been defined as a "process by which personal data is irreversibly altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party."^[1] Data anonymization may enable the transfer of information across a boundary, such as between two departments within an agency or between two agencies, while reducing the risk of unintended disclosure. In some settings it does so in a way that still permits evaluation and analytics after anonymization.

In the context of medical data, anonymized data refers to data from which the recipient of the information cannot identify the patient. The name, address, and full postcode must be removed, together with any other information that, in conjunction with other data held by or disclosed to the recipient, could identify the patient.^[2]
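A minimal sketch of the first step described above, assuming records are Python dictionaries with the hypothetical field names `name`, `address` and `postcode`, and a UK-style postcode whose outward code comes before a space. The hypothetical `strip_identifiers` helper removes the direct identifiers and keeps only the postcode district. On its own this does not make the data anonymous, because the remaining fields may still identify a patient when combined with other data.

```python
from typing import Any

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "postcode"}


def strip_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Drop direct identifiers, keeping only the outward part of the postcode."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Assumes a UK-style "outward inward" postcode, e.g. "SW1A 1AA" -> "SW1A".
    parts = (record.get("postcode") or "").split()
    if parts:
        cleaned["postcode_district"] = parts[0].upper()
    return cleaned


patient = {"name": "Jane Doe", "address": "1 High St", "postcode": "SW1A 1AA",
           "age": 47, "diagnosis": "asthma"}
print(strip_identifiers(patient))
# {'age': 47, 'diagnosis': 'asthma', 'postcode_district': 'SW1A'}
```

Keeping the district rather than dropping the postcode entirely is a trade-off between analytic value and re-identification risk; whether it is acceptable depends on what else the recipient holds.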

There is always a risk that anonymized data will not stay anonymous over time. Previously anonymous datasets have been de-anonymized by pairing them with other data, by clever analytical techniques, and by sheer computing power, after which the data subjects are no longer anonymous.
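A minimal sketch of re-identification by pairing datasets, assuming a release that has had names removed but still carries quasi-identifiers (here ZIP code, birth date and sex). All records, field names and the `link` helper are hypothetical. A released row is re-identified when its combination of quasi-identifiers matches exactly one named person in the auxiliary data.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
RELEASED = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public auxiliary data that includes names.
AUXILIARY = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02140", "birth_date": "1962-03-14", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(released, auxiliary, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for released rows matching exactly one named person."""
    # Index the auxiliary data by its quasi-identifier combination.
    index = defaultdict(list)
    for person in auxiliary:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for row in released:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # A unique match re-identifies the row; several matches leave some ambiguity.
        if len(candidates) == 1:
            matches.append((candidates[0], row["diagnosis"]))
    return matches


print(link(RELEASED, AUXILIARY))  # [('Alice Example', 'hypertension')]
```

The second released row escapes only because its ZIP code differs from the auxiliary record; the more distinctive the combination of retained attributes, the more rows such a join can single out.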

De-anonymization is the reverse process, in which anonymous data is cross-referenced with other data sources to re-identify the anonymous data source.^[3] Generalization and perturbation are two popular anonymization approaches for relational data.^[4] Obscuring data in a way that allows it to be re-identified later is called pseudonymization, and it is one way companies can store data in a HIPAA-compliant manner.^[5]
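A minimal sketch of generalization and perturbation applied to one hypothetical record. Generalization replaces exact values with coarser ones (an age band, a partly masked ZIP code), while perturbation adds random noise to a numeric value. The field names, the 10-year bands, the three retained ZIP digits and the noise scale are illustrative assumptions, not parameters taken from the cited survey.

```python
import random


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with the band it falls in, e.g. 47 -> '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Generalization: mask all but the first `keep` characters, e.g. '02138' -> '021**'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def perturb(value: float, scale: float, rng: random.Random) -> float:
    """Perturbation: add zero-mean Gaussian noise with standard deviation `scale`."""
    return value + rng.gauss(0.0, scale)


rng = random.Random(42)  # fixed seed only so the sketch is reproducible
record = {"age": 47, "zip": "02138", "income": 52_000.0}
released = {
    "age": generalize_age(record["age"]),  # '40-49'
    "zip": generalize_zip(record["zip"]),  # '021**'
    "income": round(perturb(record["income"], 1_000.0, rng)),
}
print(released)
```

Coarser bands and larger noise lower the re-identification risk but also reduce the accuracy of any analysis run on the released data. In a real release the noise would come from an unpredictable source rather than a fixed seed.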

GDPR requirements

The European Union's General Data Protection Regulation (GDPR) demands that stored data on people in the EU undergo either an anonymization or a pseudonymization process.^[6] GDPR Recital (26) sets a very high bar for what counts as anonymous data, and therefore as exempt from the requirements of the GDPR, namely “…information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.” The European Data Protection Supervisor (EDPS) and the Spanish Agencia Española de Protección de Datos (AEPD) have issued joint guidance on the requirements for anonymity and for exemption from the GDPR. According to the EDPS and AEPD, no one, including the data controller, should be able to re-identify data subjects in a properly anonymized dataset.^[7]

Research by data scientists^[8] at Imperial College London and the Université catholique de Louvain in Belgium, as well as a ruling by Judge Michal Agmon-Gonen of the Tel Aviv District Court,^[9] highlights the shortcomings of anonymization in today's big data world. Critics argue that anonymization reflects an outdated approach to data protection,^[10] developed when data processing was limited to isolated (siloed) applications, before “big data” processing involving the widespread sharing and combining of data became common.
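The EDPS/AEPD guidance cited here concerns the hash function as a pseudonymization technique. A minimal sketch of one common variant, a keyed hash (HMAC-SHA256 from Python's standard library), shows why pseudonymized data differs from properly anonymized data in the sense described above: whoever holds the key can recompute the pseudonym for a known identifier and so re-identify the record. The hypothetical `pseudonymize` helper, the record layout and the identifiers are assumptions for illustration, and the sketch makes no claim that this scheme satisfies the GDPR or the EDPS/AEPD guidance.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a stable pseudonym: hex-encoded HMAC-SHA256 under a secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The key must be stored apart from the released data. An unkeyed hash of a small
# identifier space could be reversed by hashing every candidate identifier; the
# secret key blocks that for anyone who does not hold it.
key = secrets.token_bytes(32)

records = [
    {"patient_id": "patient-0001", "diagnosis": "asthma"},
    {"patient_id": "patient-0002", "diagnosis": "diabetes"},
    {"patient_id": "patient-0001", "diagnosis": "hypertension"},
]
released = [
    {"pseudonym": pseudonymize(r["patient_id"], key), "diagnosis": r["diagnosis"]}
    for r in records
]

# The same identifier always maps to the same pseudonym, so records can still be linked...
assert released[0]["pseudonym"] == released[2]["pseudonym"]
# ...and the key holder can re-identify a record by recomputing the pseudonym.
assert pseudonymize("patient-0001", key) == released[0]["pseudonym"]
```

Destroying the key removes the controller's own ability to reverse the mapping, but the stable pseudonyms still allow records to be linked, so the result may remain re-identifiable through other data.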

References

^ ISO 25237:2017 Health informatics -- Pseudonymization. ISO. 2017. p. 7.
^ "Data anonymization". The Free Medical Dictionary. Retrieved 17 January 2014.
^ "De-anonymization". Whatis.com. Retrieved 17 January 2014.
^ Bin Zhou; Jian Pei; WoShun Luk (December 2008). "A brief survey on anonymization techniques for privacy preserving publishing of social network data" (PDF). ACM SIGKDD Explorations Newsletter. 10 (2): 12–22.
^ "Data de-identification - an easier way to HIPAA compliance". TrueVault.
^ "Data science under GDPR with pseudonymization in the data pipeline". Dativa. 17 April 2018.
^ "Introduction to the hash function as a personal data pseudonymisation technique" (PDF).
^ "Your Data Were 'Anonymized'? These Scientists Can Still Identify You".
^ "Attm (TA) 28857-06-17 Nursing Companies Association v. Ministry of Defense".
^ "Data is up for grabs under outdated Israeli privacy law, think tank says".

External links

Data Sharing and Anonymization Reading List, on the anonymization of Internet traffic
