Privacy for research participants
Privacy for research participants is a concept in research ethics which holds that a person taking part in human subject research has a right to privacy. Typical scenarios include a surveyor doing social research who interviews a research participant as a respondent, or a clinical trial researcher who takes a blood sample from a participant to see whether something measurable in blood is related to the person's health. In both cases, the ideal outcome is that any participant can join the study without the researcher, the study design, or the publication of the study results ever identifying them.
People decide to participate in research for many reasons, such as personal interest or a desire to promote research that benefits their community. Various guidelines for human subject research protect those who choose to participate, and the international consensus is that participants' rights are best protected when they can trust that researchers will not connect their identities with their input into the study.
Many study participants have experienced problems when their privacy was not upheld after they took part in research. Sometimes privacy is lost because the study's protections were insufficient, and sometimes because unanticipated problems with the study design inadvertently compromised it. The privacy of research participants is typically protected by the research organizer, while the institutional review board acts as a designated overseer that monitors the organizer on behalf of study participants.
Researchers publish data that they collect from participants. To preserve participants' privacy, the data first goes through a de-identification process, whose goal is to remove protected health information that could be used to connect a study participant to their contribution to a research project.
A privacy attack is an attempt to identify a study participant from public research data. Typically, researchers collect data, including confidential identifying data, from study participants, which produces an identified dataset. Before the data is sent for research processing, it is "de-identified": personally identifying data is removed from the dataset. Ideally, the de-identified dataset alone could then not be used to identify a participant.
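The de-identification step can be sketched in Python. This is a minimal illustration, not a standard procedure: the record fields (`name`, `phone`, `birth_date`, `zip_code`, `diagnosis`) and the function `deidentify` are hypothetical, and the generalization rules (keep only the year of birth and the first three digits of a US ZIP code) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One row of a hypothetical identified dataset."""
    name: str          # direct identifier
    phone: str         # direct identifier
    birth_date: str    # quasi-identifier, ISO format "YYYY-MM-DD"
    zip_code: str      # quasi-identifier, five-digit US ZIP code
    diagnosis: str     # research variable


def deidentify(rec: Record) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers (illustrative rules)."""
    return {
        "birth_year": rec.birth_date[:4],  # keep only the year of birth
        "zip3": rec.zip_code[:3],          # keep only the first three ZIP digits
        "diagnosis": rec.diagnosis,        # the value the study actually needs
    }


identified = [
    Record("Ann Lee", "555-0100", "1980-05-17", "02139", "asthma"),
    Record("Bo Chan", "555-0101", "1975-11-02", "02144", "diabetes"),
]
deidentified = [deidentify(r) for r in identified]
print(deidentified)
```

Names and phone numbers are gone, but the remaining fields may still be identifying in combination, which is the risk the following paragraph describes.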
In some cases the researchers simply misjudge the information in a de-identified dataset, which is in fact identifying, or new technology later makes the data identifying. In other cases the published de-identified data can be cross-referenced with other datasets: by finding matches between an identified dataset and the de-identified one, participants in the de-identified set may be revealed.
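A cross-referencing (linkage) attack of this kind can be sketched as a join on quasi-identifiers, the fields that remain in the de-identified data and also appear in some identified source. The datasets, the `QUASI_IDENTIFIERS` tuple, and the function names below are hypothetical; real attacks use much larger auxiliary datasets and often tolerate inexact matches, which this sketch does not.

```python
from collections import Counter, defaultdict

# Fields assumed to appear in both the research data and a public identified list.
QUASI_IDENTIFIERS = ("birth_year", "zip3", "sex")


def qi_key(row: dict) -> tuple:
    return tuple(row[f] for f in QUASI_IDENTIFIERS)


def unique_fraction(rows: list[dict]) -> float:
    """Share of records whose quasi-identifier combination occurs only once."""
    counts = Counter(qi_key(r) for r in rows)
    return sum(counts[qi_key(r)] == 1 for r in rows) / len(rows)


def link(research: list[dict], public: list[dict]) -> list[tuple[str, str]]:
    """Re-identify research records that match exactly one named public record."""
    index = defaultdict(list)
    for person in public:
        index[qi_key(person)].append(person["name"])
    matches = []
    for rec in research:
        names = index.get(qi_key(rec), [])
        if len(names) == 1:  # an unambiguous match reveals the participant
            matches.append((names[0], rec["diagnosis"]))
    return matches


research = [
    {"birth_year": "1980", "zip3": "021", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": "1975", "zip3": "021", "sex": "M", "diagnosis": "diabetes"},
    {"birth_year": "1975", "zip3": "021", "sex": "M", "diagnosis": "influenza"},
]
public = [
    {"name": "Ann Lee", "birth_year": "1980", "zip3": "021", "sex": "F"},
    {"name": "Bo Chan", "birth_year": "1975", "zip3": "021", "sex": "M"},
    {"name": "Cy Park", "birth_year": "1975", "zip3": "021", "sex": "M"},
]
print(unique_fraction(research))  # 0.333...
print(link(research, public))     # [('Ann Lee', 'asthma')]
```

The two 1975 records share their quasi-identifier combination with two named people, so neither is singled out; the one record with a unique combination is re-identified together with its diagnosis.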
The ideal situation from the research perspective is the free sharing of data. Because privacy for research participants is a priority, however, various methods for protecting participants have been proposed for different purposes. Replacing the real data with synthetic data lets researchers publish data that supports the same conclusions as the original, but such data may have drawbacks, such as being unfit for reuse in other research. Other strategies include "noise addition", which makes random changes to values, and "data swapping", which exchanges values between records. Still another approach is to separate the identifiable variables in the data from the rest, aggregate the identifiable variables, and reattach them to the rest of the data. This approach has been used successfully to create maps of diabetes in Australia (Mazumdar et al. 2014) and the United Kingdom (Noble et al. 2012) from confidential general practice clinic data.
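The three perturbation strategies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used in the cited mapping studies: Gaussian noise is one possible choice of noise, swapping is done by randomly permuting a single column, and the aggregation step maps a hypothetical exact `location` field to a coarser area through a caller-supplied function, suppressing areas with fewer than `min_count` records. All field names and values are made up.

```python
import random
from collections import Counter
from typing import Callable


def add_noise(values: list[float], scale: float, rng: random.Random) -> list[float]:
    """Noise addition: perturb each value with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, scale) for v in values]


def swap_column(rows: list[dict], column: str, rng: random.Random) -> list[dict]:
    """Data swapping: randomly permute one column across records.

    The column's overall distribution is unchanged, but a value is no longer
    tied to the rest of the record it came from.
    """
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    return [{**r, column: v} for r, v in zip(rows, shuffled)]


def aggregate_location(rows: list[dict], to_area: Callable[[str], str],
                       min_count: int) -> list[dict]:
    """Separate the identifying field, aggregate it, and reattach the result."""
    areas = [to_area(r["location"]) for r in rows]
    counts = Counter(areas)
    out = []
    for r, area in zip(rows, areas):
        rest = {k: v for k, v in r.items() if k != "location"}
        # Areas with too few records could still single someone out, so suppress them.
        rest["area"] = area if counts[area] >= min_count else None
        out.append(rest)
    return out


rng = random.Random(0)
print(add_noise([120.0, 135.0], scale=5.0, rng=rng))
rows = [{"id": i, "age": a} for i, a in enumerate([34, 51, 67])]
print(swap_column(rows, "age", rng))
clinics = [{"location": "A-12", "hba1c": 7.1},
           {"location": "A-40", "hba1c": 6.4},
           {"location": "B-03", "hba1c": 8.0}]
print(aggregate_location(clinics, to_area=lambda loc: loc[0], min_count=2))
```

Noise addition and swapping keep summary statistics approximately or exactly intact while weakening the link between any one value and a person; the aggregation step trades spatial detail for a guarantee that each reported area contains several records.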
A biobank is a place where human biological specimens are kept for research, and often where genomic data is paired with phenotype data and personally identifying data. Biobank research has created new controversies, perspectives, and challenges in balancing the rights of study participants against researchers' need to access resources for their work.
One problem is that even a small fraction of a person's genetic information can be enough to uniquely identify the individual it came from. Studies have also shown that it is possible to determine whether a given individual participated in a study even when only aggregate data, such as allele frequencies pooled across all participants, are reported.
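The aggregate-data result can be illustrated with a simulation. The sketch below uses a simple score that correlates one person's genotypes with the allele frequencies a study publishes for its pooled participants; it is a simplified illustration, not the Bayesian method of Clayton (2010) listed in the references. It assumes independent SNPs in Hardy–Weinberg equilibrium and reference-population allele frequencies known exactly, and all data are simulated. If the person is in the pool, their own genotypes pull each published frequency slightly toward them, so the score is positive on average; for an outsider it averages zero.

```python
import random


def genotype_freq(p: float, rng: random.Random) -> float:
    """Simulate one SNP genotype (0, 1 or 2 copies of the allele) as a frequency 0, 0.5 or 1."""
    return (int(rng.random() < p) + int(rng.random() < p)) / 2


def membership_score(person: list[float], pool_freqs: list[float],
                     ref_freqs: list[float]) -> float:
    """Sum over SNPs of (person - reference) * (pool - reference)."""
    return sum((y - p) * (m - p) for y, m, p in zip(person, pool_freqs, ref_freqs))


rng = random.Random(42)
n_snps, n_pool = 2000, 10
ref = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]       # assumed reference frequencies
pool = [[genotype_freq(p, rng) for p in ref] for _ in range(n_pool)]
outsider = [genotype_freq(p, rng) for p in ref]
# The only data "published" by the study: per-SNP mean allele frequency in the pool.
published = [sum(col) / n_pool for col in zip(*pool)]

t_member = membership_score(pool[0], published, ref)
t_outsider = membership_score(outsider, published, ref)
print(f"member: {t_member:.1f}, non-member: {t_outsider:.1f}")
```

In this toy setting the member's expected score grows with the number of SNPs and shrinks as the pool grows, which is why reporting many aggregate frequencies from a small cohort carries the most risk.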
When research participants have their identities revealed, they may face problems. Concerns include genetic discrimination by an insurance company or employer. Respondents in the United States have said they want their research data to be restricted from access by law enforcement agencies, so that participating in a study cannot lead to legal consequences. Study participants also fear that research may reveal private matters they may not want to discuss, such as a medical history that includes a sexually transmitted disease, substance abuse, psychiatric treatment, or an elective abortion. In genomic studies of families, genetic screening may reveal that paternity differs from what had been assumed. Even without any specific harm, some people whose private information is disclosed because of research participation may feel their privacy has been invaded and find the entire system distasteful.
Notable cases in which the privacy of research participants or data subjects was compromised include:
- Netflix Prize - Netflix released a database of subscribers' movie ratings, with rating dates, for a public competition. Researchers showed that even this limited information could identify many subscribers when cross-referenced with other publicly available ratings, revealing their movie preferences. People objected to having their movie-watching habits become publicly known.
- Tearoom Trade - a university researcher covertly observed men who engaged in illicit sex in public restrooms and recorded information that could identify them, and the research participants did not consent to being studied or identified.
- Malin, B.; Loukides, G.; Benitez, K.; Clayton, E. W. (2011). "Identifiability in biobanks: Models, measures, and mitigation strategies". Human Genetics 130 (3): 383–392. doi:10.1007/s00439-011-1042-5. PMID 21739176.
- Mazumdar, S.; Konings, P.; Hewett, M.; et al. (2014). "Protecting the privacy of individual general practice patient electronic records for geospatial epidemiology research". Australian and New Zealand Journal of Public Health 38 (6): 548–552. doi:10.1111/1753-6405.12262. PMID 25308525. http://onlinelibrary.wiley.com/doi/10.1111/1753-6405.12262/full
- Noble, D.; Smith, D.; Mathur, R.; et al. (2012). "Feasibility study of geospatial mapping of chronic disease risk to inform public health commissioning". BMJ Open 2 (1): e000711. doi:10.1136/bmjopen-2011-000711. PMID 22337817. http://bmjopen.bmj.com/content/2/1/e000711.full
- Clayton, D. (2010). "On inferring presence of an individual in a mixture: A Bayesian approach". Biostatistics 11 (4): 661–673. doi:10.1093/biostatistics/kxq035. PMC 2950790. PMID 20522729.
- Lemke, A. A.; Wolf, W. A.; Hebert-Beirne, J.; Smith, M. E. (2010). "Public and Biobank Participant Attitudes toward Genetic Research Participation and Data Sharing". Public Health Genomics 13 (6): 368–377. doi:10.1159/000276767. PMC 2951726. PMID 20805700.
- Kaufman, D. J.; Murphy-Bollinger, J.; Scott, J.; Hudson, K. L. (2009). "Public Opinion about the Importance of Privacy in Biobank Research". The American Journal of Human Genetics 85 (5): 643. doi:10.1016/j.ajhg.2009.10.002.
- Greely, H. T. (2007). "The Uneasy Ethical and Legal Underpinnings of Large-Scale Genomic Biobanks". Annual Review of Genomics and Human Genetics 8: 343–364. doi:10.1146/annurev.genom.7.080505.115721. PMID 17550341.
- United States federal policy on privacy
- Guidelines at Columbia University
- Podcast and transcript from Scientific American interview with Latanya Sweeney