Local differential privacy

From Wikipedia, the free encyclopedia

Local differential privacy (LDP) is a model of differential privacy with the added restriction that even if an adversary has access to the personal responses of an individual in the database, that adversary will still be unable to learn too much about the user's personal data. This is contrasted with global differential privacy, a model of differential privacy that incorporates a central aggregator with access to the raw data.[1]

As society grows ever more reliant on the digital world and on data-driven decision making, the smart devices people carry collect extensive personal data, threatening the privacy of users. Data fusion and analysis techniques only make users more vulnerable to attacks and disclosure in the big data era. Local differential privacy (LDP)[2] is one possible answer to this concern: it is a widely recognized privacy model with a distributed architecture that can provide strong privacy guarantees for each user while data are collected and analyzed, protecting against privacy leaks on both the client and server side.[3] Furthermore, because it is independent of any assumptions about third-party servers, LDP has become a focus of cutting-edge research on privacy protection and has risen in prominence both from a theoretical and a practical perspective. Owing to this strength, LDP has been widely adopted to alleviate the privacy concerns of each user.[4]

History[edit]

In 2003, Alexandre V. Evfimievski, Johannes Gehrke, Ramakrishnan Srikant[5] gave a definition equivalent to local differential privacy. In 2008, Kasiviswanathan et al.[6] gave a formal definition conforming with the standard definition of differential privacy.

The prototypical example of a locally differentially private mechanism is the randomized response survey technique proposed by Stanley L. Warner in 1965, predating modern discussions of privacy.[7] Warner's innovation was the introduction of the “untrusted curator” model, in which the entity collecting the data may not be trustworthy. Before users' responses are sent to the curator, the answers are randomized in a controlled manner that guarantees differential privacy while still allowing valid population-wide statistical inferences.
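Warner's scheme can be sketched in a few lines of Python. The function names and the choice p = 0.75 (which corresponds to ε = ln 3) are illustrative, not taken from Warner's paper:

```python
import random

def randomized_response(truth: bool, p: float = 0.75) -> bool:
    """Report the true answer with probability p, otherwise the opposite.

    p = 0.75 corresponds to epsilon = ln(p / (1 - p)) = ln 3.
    """
    return truth if random.random() < p else not truth

def estimate_proportion(responses, p: float = 0.75) -> float:
    """Unbiased estimate of the true 'yes' proportion from noisy responses.

    If pi is the true proportion, the expected observed 'yes' rate is
    p*pi + (1-p)*(1-pi), so pi = (observed - (1 - p)) / (2*p - 1).
    """
    observed = sum(responses) / len(responses)
    return (observed - (1 - p)) / (2 * p - 1)

# Simulate a survey in which 30% of 100,000 respondents truly answer "yes".
random.seed(0)
truths = [random.random() < 0.3 for _ in range(100_000)]
responses = [randomized_response(t) for t in truths]
print(f"estimated proportion: {estimate_proportion(responses):.3f}")
```

The curator never sees any individual's true answer, yet the debiasing step recovers the population proportion to within sampling error.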

Applications[edit]

The ever-growing era of big data creates high demand for artificial intelligence services at the edge that protect the privacy of their users. The need to preserve data privacy in these services has pushed research on novel machine learning paradigms that fit their requirements.

Anomaly Detection[edit]

Anomaly detection is formally defined as the process of identifying unexpected items or events in data sets that differ from the norm.[8] The prominence of social networking has brought many hidden concerns, primarily related to information privacy. As users rely on social networks for more than mere interaction and self-representation, even going so far as to store personal information in them, the risks of exposure become prominent: users are threatened by privacy breaches, unauthorized access to personal information, and leakage of sensitive data. To address this, the authors of "Anomaly Detection over Differential Preserved Privacy in Online Social Networks" propose a privacy-preserving model that sanitizes the collection of user information from a social network using restricted local differential privacy (LDP), storing synthetic copies of the collected data. The model uses the reconstructed data to classify user activity and detect abnormal network behavior. Experimental results demonstrate that the method achieves high data utility alongside improved privacy preservation, and that LDP-sanitized data remain suitable for subsequent analyses: anomaly detection on the reconstructed data achieves a detection accuracy similar to that on the original data.[9]

Blockchain Technology[edit]

The goal of blockchain technology is to allow digital information to be recorded and distributed, but not edited.[10] The potential of combining blockchain with local differential privacy has received widespread attention because of blockchain's decentralized, tamper-proof, and transparent nature. A blockchain is a distributed, secured, and shared ledger used to record and track data within a decentralized network; it has successfully replaced certain systems of economic transactions in organizations and has the potential to overtake various industrial business models in the future. As its use in different applications grows exponentially, however, questions have been raised about the privacy and security of the data stored in it. Implementing differential privacy is one option, since it would allow blockchain systems to model unique application scenarios and further improve their privacy guarantees.[11]

Context-Free Privacy[edit]

Local differential privacy provides context-free privacy even in the absence of a trusted data collector, but this strong notion of privacy for individual users often comes at the expense of a significant drop in utility. The classical definition of LDP assumes that all elements in the data domain are equally sensitive; in many applications, however, some symbols are more sensitive than others. Researchers have proposed a context-aware framework of local differential privacy[12] that allows a privacy designer to incorporate the application's context into the privacy definition. For binary data domains, the research provides a universally optimal privatization scheme and highlights its connections to Warner's randomized response[13] (RR) and Mangat's improved response. Motivated by geolocation and web search applications, for k-ary data domains the researchers consider two special cases of context-aware LDP: block-structured LDP and high-low LDP (the latter is also defined in [14]). They study discrete distribution estimation and provide communication-efficient, sample-optimal schemes and information-theoretic lower bounds for both models. Finally, they show that using contextual information can require fewer samples than classical LDP to achieve the same accuracy.

Facial Recognition[edit]

Facial recognition, though convenient, can potentially leak the biometric features that identify the user.

Facial recognition has become increasingly common. The most up-to-date smartphones, for example, use facial recognition to unlock the user's phone and to authorize payments with their credit card. Though quick and efficient for users, the system raises subtle doubts: facial recognition is a resource-intensive task that often involves third parties, creating a gap in which the user's privacy can be compromised. Biometric information delivered to untrusted third-party servers in an uncontrolled manner is a significant privacy leak, as biometrics can be correlated with sensitive data such as healthcare or financial records. In their academic article, Chamikara et al. propose a privacy-preserving technique for “controlled information release” that disguises an original face image and prevents leakage of the biometric features while still identifying a person. They introduce a privacy-preserving face recognition protocol named PEEP (Privacy using Eigenface Perturbation) that utilizes local differential privacy. PEEP applies differentially private perturbation to Eigenfaces and stores only the perturbed data on the third-party servers, which run a standard Eigenface recognition algorithm. As a result, the trained model is not vulnerable to privacy attacks such as membership inference and model memorization attacks.[15] This model shows a potential solution to such privacy leaks.
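The core idea of perturbing Eigenface coefficients can be illustrated with Laplace noise, as in the following sketch. The helper names, the sampling method, and the noise calibration are illustrative assumptions, not the exact PEEP mechanism:

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def perturb_coefficients(coeffs, eps: float, sensitivity: float):
    """Add Laplace noise with scale sensitivity/eps to each Eigenface
    coefficient, so only the noisy projection leaves the device."""
    scale = sensitivity / eps
    return [c + laplace_noise(scale) for c in coeffs]

# A face image projected onto three Eigenfaces (made-up coefficients).
noisy = perturb_coefficients([0.7, -1.2, 0.1], eps=8.0, sensitivity=1.0)
```

Because the server only ever stores the noisy projection, recovering the original biometric features from it is limited by the privacy budget ε.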

Federated Learning (FL)[edit]

Researchers have found federated learning coupled with local differential privacy to be quite effective for facilitating crowdsourcing applications while protecting users' privacy.

Federated learning[16] aims to protect data privacy through distributed learning methods that keep the data in place. Likewise, differential privacy (DP) aims to improve the protection of data privacy by measuring the privacy loss in the communication among the elements of federated learning. The match between federated learning, differential privacy, and the challenges of data privacy protection has prompted the release of several software tools that support their functionalities, but these tools lack a unified vision of the techniques and a methodological workflow supporting their usage. In a study sponsored by the Andalusian Research Institute in Data Science and Computational Intelligence, researchers developed Sherpa.ai FL, an open-research unified FL and DP framework that aims to foster the research and development of AI services at the edge and to preserve data privacy. The characteristics of FL and DP tested and summarized in the study suggest that they are good candidates for supporting AI services at the edge while preserving data privacy; in particular, the study found that lower values of the privacy parameter ε guarantee higher privacy at the cost of lower accuracy.[17]

Health Data Aggregation[edit]

The rise of technology has changed not only the way we work and live our everyday lives but also the health industry, as the big data era has taken hold. With the rapid growth of health data, the limited storage and computation resources of wireless body area sensor networks are becoming a barrier to the industry's development. Outsourcing encrypted health data to the cloud has been an appealing strategy, but it has potential downsides: data aggregation becomes more difficult, and patients' sensitive information becomes more vulnerable to data breaches. In their academic article, "Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees," Hao Ren and his team propose a privacy-enhanced and multifunctional health data aggregation scheme (PMHA-DP) under differential privacy. The aggregation function is designed to protect the aggregated data from the cloud servers. The performance evaluation in their study shows that the proposal incurs less communication overhead than existing data aggregation models.[18]

Internet Connected Vehicles[edit]

The idea of having internet access in one's car would have been only a dream during the last century, yet most current vehicles now offer this feature for the convenience of users. Though convenient, it poses yet another threat to the user's privacy. The Internet of Vehicles (IoV) is expected to enable intelligent traffic management, intelligent dynamic information services, intelligent vehicle control, and more. However, vehicles' data privacy is argued to be a major barrier to the application and development of IoV, and has therefore attracted wide attention. Local differential privacy (LDP) is a relaxed version of differential privacy that can protect users' data privacy against an untrusted third party even in the worst adversarial setting. The computational cost of LDP is one concern among researchers, since it is expensive to implement for a model that requires high mobility and short connection times.[19] Furthermore, as the number of vehicles increases, frequent communication between vehicles and the cloud server incurs unexpectedly high communication costs. To avoid the privacy threat and reduce the communication cost, researchers propose integrating federated learning with local differential privacy to facilitate crowdsourcing applications while training the machine learning model.[20]

Phone Blacklisting[edit]

LDP-based systems have been shown to counter the ever-growing volume of spam calls while protecting users' privacy.

Spam phone calls have become increasingly relevant, and as they grow into a nuisance for the digital world, researchers have been looking at potential solutions to minimize the issue. To counter this increasingly successful attack vector, federal agencies such as the US Federal Trade Commission (FTC) have been working with telephone carriers to design systems for blocking robocalls. In addition, a number of commercial and smartphone apps that promise to block spam phone calls have been created, but they come with a subtle cost: the private information the user grants the app access to may be leaked without the user's consent or knowledge. In one study,[21] the researchers analyze the challenges and trade-offs of using local differential privacy, evaluate an LDP-based system on real-world user-reported call records collected by the FTC, and show that it is possible to learn a phone blacklist under a reasonable overall privacy budget, preserving users' privacy while maintaining utility for the learned blacklist.

Trajectory Cross-Correlation Constraint[edit]

Aiming to solve the problems of low data utilization and weak privacy protection, researcher Hu proposes a personalized differential privacy protection method based on cross-correlation constraints. By protecting sensitive location points on the trajectory, this extended differential privacy protection model combines the sensitivity of the user's trajectory locations with the user's privacy protection requirements and privacy budget. Using an autocorrelation Laplace transform, specific white noise is transformed into noise that is correlated with the user's real trajectory sequence in both time and space; this noise is then used to build the cross-correlation constraint mechanism of the trajectory sequence in the model. Hu's personalized method thereby addresses the problem that adding independent, uncorrelated noise with the same degree of scrambling results in low privacy protection and poor data availability.[22]

ε-local differential privacy[edit]

Definition of ε-local differential privacy[edit]

Let ε be a positive real number and 𝒜 be a randomized algorithm that takes a user's private data as input. Let im(𝒜) denote the image of 𝒜. The algorithm 𝒜 is said to provide ε-local differential privacy if, for all pairs x, x′ of a user's possible private data and all subsets S of im(𝒜):

Pr[𝒜(x) ∈ S] ≤ exp(ε) · Pr[𝒜(x′) ∈ S],

where the probability is taken over the randomness used by the algorithm.

The main difference between this and the standard definition of differential privacy is that in standard differential privacy the probabilities concern the outputs of an algorithm that takes all users' data, whereas here they concern an algorithm that takes a single user's data.

Sometimes the definition instead takes an algorithm that receives all users' data as input and outputs a collection of all responses (such as the definition in Raef Bassily, Kobbi Nissim, Uri Stemmer and Abhradeep Guha Thakurta's 2017 paper[23]).
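For a binary data domain, the inequality above can be verified directly for the randomized response mechanism calibrated to ε. The following is a minimal sketch; the helper names are illustrative:

```python
import math

def response_probs(eps: float):
    """Binary randomized response calibrated to eps: keep the true bit with
    probability e^eps / (1 + e^eps), flip it otherwise."""
    keep = math.exp(eps) / (1 + math.exp(eps))
    return keep, 1 - keep

def max_privacy_loss(eps: float) -> float:
    """Worst-case ratio Pr[A(x) = y] / Pr[A(x') = y] over all inputs x, x'
    and outputs y, which the definition requires to be at most e^eps."""
    keep, flip = response_probs(eps)
    # The ratio is maximized when x produces y truthfully and x' by a flip.
    return keep / flip

# The mechanism meets the epsilon-LDP bound with equality for every epsilon.
for eps in (0.1, 1.0, 5.0):
    assert max_privacy_loss(eps) <= math.exp(eps) + 1e-9
```

Because the worst-case ratio equals e^ε exactly, this calibration spends the entire privacy budget, which is what makes it optimal among binary mechanisms.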

Deployment[edit]

Local differential privacy has been deployed in several internet companies:

  • RAPPOR,[24] with which Google used local differential privacy to collect data from users, such as running processes and Chrome home pages
  • Private Count Mean Sketch (and variants),[25] with which Apple used local differential privacy to collect emoji usage data, word usage and other information from iPhone users
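A heavily simplified sketch of RAPPOR's encoding and permanent randomized response stages is shown below. The parameter choices and helper names are illustrative; the deployed system also adds an instantaneous randomization stage and cohort-specific hashing:

```python
import hashlib
import random

def bloom_bits(value: str, num_bits: int = 32, num_hashes: int = 2):
    """Encode a string into a small Bloom filter (simplified)."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % num_bits] = 1
    return bits

def permanent_randomized_response(bits, f: float = 0.5):
    """RAPPOR's first stage: each Bloom-filter bit is replaced by an unbiased
    coin flip with probability f and reported truthfully otherwise."""
    return [random.randint(0, 1) if random.random() < f else b for b in bits]

# The client reports only the noisy bit vector, never the raw value.
report = permanent_randomized_response(bloom_bits("chrome://newtab"))
```

The aggregator, knowing f and the hash functions, can estimate population-level frequencies of candidate strings from many such reports without learning any individual's value.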

References[edit]

  1. ^ "Local vs. global differential privacy - Ted is writing things". desfontain.es. Retrieved 2020-02-10.
  2. ^ "Differential privacy", Wikipedia, 2021-01-26, retrieved 2021-04-03
  3. ^ Joseph, Matthew; Roth, Aaron; Ullman, Jonathan; Waggoner, Bo (2018-11-19). "Local Differential Privacy for Evolving Data". arXiv:1802.07128 [cs.LG].
  4. ^ Wang, Teng; Zhang, Xuefeng; Feng, Jingyu; Yang, Xinyu (2020-12-08). "A Comprehensive Survey on Local Differential Privacy toward Data Statistics and Analysis". Sensors (Basel, Switzerland). 20 (24): 7030. arXiv:2010.05253. Bibcode:2020Senso..20.7030W. doi:10.3390/s20247030. ISSN 1424-8220. PMC 7763193. PMID 33302517.
  5. ^ Evfimievski, Alexandre V.; Gehrke, Johannes; Srikant, Ramakrishnan (June 9–12, 2003). "Limiting privacy breaches in privacy preserving data mining". Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. pp. 211–222. doi:10.1145/773153.773174. ISBN 1581136706. S2CID 2379506.
  6. ^ Kasiviswanathan, Shiva Prasad; Lee, Homin K.; Nissim, Kobbi; Raskhodnikova, Sofya; Smith, Adam D. (2008). "What Can We Learn Privately?". 2008 49th Annual IEEE Symposium on Foundations of Computer Science. pp. 531–540. arXiv:0803.0924. doi:10.1109/FOCS.2008.27. ISBN 978-0-7695-3436-7.
  7. ^ Warner, Stanley L. (1965). "Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias". Journal of the American Statistical Association. 60 (309): 63–69. doi:10.1080/01621459.1965.10480775. PMID 12261830.
  8. ^ "Medium". Medium. 2 July 2019. Retrieved 2021-04-10.
  9. ^ Aljably, Randa; Tian, Yuan; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah (2019-04-25). "Anomaly detection over differential preserved privacy in online social networks". PLOS ONE. 14 (4): e0215856. Bibcode:2019PLoSO..1415856A. doi:10.1371/journal.pone.0215856. ISSN 1932-6203. PMC 6483223. PMID 31022238.
  10. ^ Conway, Luke. "Blockchain Explained". Investopedia. Retrieved 2021-04-10.
  11. ^ Ul Hassan, Muneeb; Rehmani, Mubashir Husain; Chen, Jinjun (2020-11-01). "Differential privacy in blockchain technology: A futuristic approach". Journal of Parallel and Distributed Computing. 145: 50–74. arXiv:1910.04316. doi:10.1016/j.jpdc.2020.06.003. ISSN 0743-7315. S2CID 204008404.
  12. ^ Acharya, Jayadev; Bonawitz, Kallista; Kairouz, Peter; Ramage, Daniel; Sun, Ziteng (2020-11-21). "Context Aware Local Differential Privacy". International Conference on Machine Learning. PMLR: 52–62. arXiv:1911.00038.
  13. ^ Kim, Jong-Min; Warde, William D. (2004-02-15). "A stratified Warner's randomized response model". Journal of Statistical Planning and Inference. 120 (1–2): 155–165. doi:10.1016/S0378-3758(02)00500-1. ISSN 0378-3758.
  14. ^ Murakami, Takao; Kawamoto, Yusuke (2019). "Utility-Optimized Local Differential Privacy Mechanisms for Distribution Estimation" (PDF). Proceedings of the 28th USENIX Security Symposium: 1877–1894. arXiv:1807.11317.
  15. ^ Chamikara, M.A.P.; Bertok, P.; Khalil, I.; Liu, D.; Camtepe, S. (2020-10-01). "Privacy Preserving Face Recognition Utilizing Differential Privacy". Computers & Security. 97: 101951. arXiv:2005.10486. doi:10.1016/j.cose.2020.101951. ISSN 0167-4048. S2CID 218763393.
  16. ^ "Federated learning", Wikipedia, 2021-04-01, retrieved 2021-04-03
  17. ^ Rodríguez-Barroso, Nuria; Stipcich, Goran; Jiménez-López, Daniel; Antonio Ruiz-Millán, José; Martínez-Cámara, Eugenio; González-Seco, Gerardo; Luzón, M. Victoria; Veganzones, Miguel Ángel; Herrera, Francisco (2020). "Federated Learning and Differential Privacy: Software Tools Analysis, the Sherpa.ai FL Framework and Methodological Guidelines for Preserving Data Privacy" (PDF). Information Fusion. 64: 270–92. arXiv:2007.00914. doi:10.1016/j.inffus.2020.07.009. S2CID 220302072.
  18. ^ Ren, Hao; Li, Hongwei; Liang, Xiaohui; He, Shibo; Dai, Yuanshun; Zhao, Lian (2016-09-10). "Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees". Sensors (Basel, Switzerland). 16 (9): 1463. Bibcode:2016Senso..16.1463R. doi:10.3390/s16091463. ISSN 1424-8220. PMC 5038741. PMID 27626417.
  19. ^ Zhao, Ping; Zhang, Guanglin; Wan, Shaohua; Liu, Gaoyang; Umer, Tariq (2020-11-01). "A survey of local differential privacy for securing internet of vehicles". The Journal of Supercomputing. 76 (11): 8391–8412. doi:10.1007/s11227-019-03104-0. S2CID 208869853.
  20. ^ Zhao, Yang; Zhao, Jun; Yang, Mengmeng; Wang, Teng; Wang, Ning; Lyu, Lingjuan; Niyato, Dusit; Lam, Kwok-Yan (2020-11-10). "Local Differential Privacy based Federated Learning for Internet of Things". IEEE Internet of Things Journal. PP (11): 8836–8853. arXiv:2004.08856. doi:10.1109/JIOT.2020.3037194. S2CID 215828540.
  21. ^ Ucci, Daniele; Perdisci, Roberto; Lee, Jaewoo; Ahamad, Mustaque (2020-06-01). "Towards a Practical Differentially Private Collaborative Phone Blacklisting System". Annual Computer Security Applications Conference. pp. 100–115. arXiv:2006.09287. doi:10.1145/3427228.3427239. ISBN 9781450388580. S2CID 227911367.
  22. ^ Hu, Zhaowei; Yang, Jing (2020-08-12). "Differential privacy protection method based on published trajectory cross-correlation constraint". PLOS ONE. 15 (8): e0237158. Bibcode:2020PLoSO..1537158H. doi:10.1371/journal.pone.0237158. ISSN 1932-6203. PMC 7423147. PMID 32785242.
  23. ^ Bassily, Raef; Nissim, Kobbi; Stemmer, Uri; Thakurta, Abhradeep Guha (2017). "Practical Locally Private Heavy Hitters". Advances in Neural Information Processing Systems. 30. pp. 2288–2296. arXiv:1707.04982. Bibcode:2017arXiv170704982B.
  24. ^ Erlingsson, Úlfar; Pihur, Vasyl; Korolova, Aleksandra (2014). "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response". arXiv:1407.6981. Bibcode:2014arXiv1407.6981E. doi:10.1145/2660267.2660348. S2CID 6855746.
  25. ^ "Learning with Privacy at Scale". 2017.