
Wikipedia:Reference desk/Archives/Computing/2021 August 22

Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 22

For Security Experts: Potential to de-anonymize Clinical Trials Data?

About two years ago I either read or heard about some research where people took clinical trials data and, using information from social media like Facebook, were able to deduce the actual people who corresponded to anonymous individuals in a clinical trial. A colleague of mine also made the point that he has a rare blood type, so with knowledge of his blood type, location, and ethnicity it would be pretty easy to determine that patient X was him. I'm currently writing a paper about Semantic AI technology and its applications to the Covid-19 pandemic. The one topic that came up in virtually every paper that tried to get its research used in the real world was privacy. In my conclusion I want to talk about privacy, and I would like to cite this article, but I can't find it. I've tried various searches, but although I'm usually pretty good at figuring out the right keywords, I've had no luck. Anyone have any ideas? --MadScientistX11 (talk) 02:20, 22 August 2021 (UTC)
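To illustrate the colleague's point: a combination of harmless-looking attributes (so-called quasi-identifiers such as blood type, a coarse location, and ethnicity) can single out one record even after names are removed. The sketch below is a toy example under stated assumptions: the records, field names, and the 3-digit `zip3` location field are all hypothetical. It computes the k-anonymity of the table (the size of the smallest group sharing the same quasi-identifier values) and lists records that are unique, i.e. k = 1.

```python
from collections import Counter

# Hypothetical toy release: no names, only quasi-identifiers plus a sensitive field.
records = [
    {"blood_type": "O+", "zip3": "021", "ethnicity": "A", "outcome": "recovered"},
    {"blood_type": "O+", "zip3": "021", "ethnicity": "A", "outcome": "hospitalized"},
    {"blood_type": "A-", "zip3": "021", "ethnicity": "B", "outcome": "recovered"},
    {"blood_type": "AB-", "zip3": "100", "ethnicity": "A", "outcome": "recovered"},
]

# Attributes an outsider might plausibly know about a person (assumed for illustration).
QUASI_IDENTIFIERS = ("blood_type", "zip3", "ethnicity")


def qi_key(record):
    """Tuple of the record's quasi-identifier values."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def k_anonymity(rows):
    """Smallest number of rows sharing any one quasi-identifier combination."""
    counts = Counter(qi_key(r) for r in rows)
    return min(counts.values())


def unique_rows(rows):
    """Rows whose quasi-identifier combination occurs exactly once (re-identifiable)."""
    counts = Counter(qi_key(r) for r in rows)
    return [r for r in rows if counts[qi_key(r)] == 1]


print(k_anonymity(records))       # 1
print(len(unique_rows(records)))  # 2
```

Anyone who knows the person with the rare `AB-` blood type in that location learns their outcome, even though no name appears in the data.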

Perhaps this? It involves hospital visit data rather than clinical trial data, though. There was a similar incident when AOL made a large data set of "anonymized" search queries available.[1][2] --Lambiam 11:26, 22 August 2021 (UTC)
@MadScientistX11: Check out the article Data_re-identification; it has several examples of deanonymizing, and some references there you can dig into for more info. RudolfRed (talk) 20:28, 22 August 2021 (UTC)
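Many re-identification examples boil down to a linkage attack: join the "anonymized" release with an auxiliary data set that does carry names (scraped public profiles, for instance) on the attributes both share. This is a minimal sketch with entirely hypothetical data and field names; it only claims a match when exactly one named person fits, which is the conservative version of the attack.

```python
# Hypothetical "anonymized" release: pseudonymous IDs, no names.
released = [
    {"pid": "P1", "birth_year": 1980, "zip3": "021", "sex": "F", "diagnosis": "X"},
    {"pid": "P2", "birth_year": 1980, "zip3": "021", "sex": "M", "diagnosis": "Y"},
    {"pid": "P3", "birth_year": 1975, "zip3": "100", "sex": "F", "diagnosis": "Z"},
]

# Hypothetical auxiliary data an attacker could collect, e.g. from public profiles.
auxiliary = [
    {"name": "Alice", "birth_year": 1980, "zip3": "021", "sex": "F"},
    {"name": "Bob", "birth_year": 1980, "zip3": "021", "sex": "M"},
    {"name": "Carol", "birth_year": 1975, "zip3": "100", "sex": "F"},
    {"name": "Dana", "birth_year": 1975, "zip3": "100", "sex": "F"},
]

# Attributes present in both data sets (assumed for illustration).
KEYS = ("birth_year", "zip3", "sex")


def link(released, auxiliary, keys=KEYS):
    """Map pseudonymous ID -> name wherever the shared attributes match exactly one person."""
    index = {}
    for person in auxiliary:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for row in released:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # unambiguous match: record re-identified
            matches[row["pid"]] = candidates[0]
    return matches


print(link(released, auxiliary))  # {'P1': 'Alice', 'P2': 'Bob'}
```

P3 survives only because two people in the auxiliary data share its attributes; with one more column in common it would likely be linked too.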
Yes, deanonymization is a thing. There is a field called differential privacy that tries to address it, but who knows how well it works in practice. I can understand an ethical tension when it comes to something like vaccine performance, but for stuff like TV viewing habits it's better to just prevent anyone from getting the data in the first place, rather than worry about the best way to anonymize it. Keep in mind that even totally anonymized data can still cause damage if used by the wrong people. Politicians, including bad ones, pay lots of money for opinion polls so they can use that anonymized information for their own nefarious purposes. So the mere fact that data is not personally identifiable doesn't mean it shouldn't be kept confidential. 2601:648:8202:350:0:0:0:2B99 (talk) 22:30, 23 August 2021 (UTC)
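For readers unfamiliar with differential privacy, one standard building block is the Laplace mechanism: answer an aggregate query with random noise whose scale is the query's sensitivity divided by a privacy parameter epsilon. A count changes by at most 1 when one person is added or removed, so its sensitivity is 1 and the noise scale is 1/epsilon. The sketch below is illustrative only, not a production implementation (real deployments also need careful floating-point handling and a privacy budget across queries). Python's `random` module has no Laplace sampler, so it uses the fact that the difference of two independent exponentials with the same mean is Laplace-distributed; `private_count` is a hypothetical helper name.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Epsilon-differentially private count via the Laplace mechanism (sensitivity 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: how many participants have a rare blood type?
blood_types = ["O+"] * 50 + ["AB-"]
noisy = private_count(blood_types, lambda b: b == "AB-", epsilon=0.5)
print(round(noisy, 2))  # varies from run to run
```

The noisy answer is still useful in aggregate, but it no longer reveals with certainty whether one particular person with a rare blood type is in the data. Smaller epsilon means more noise and stronger privacy.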
Thanks. I don't think any of those was the specific article I read, but that is the information I needed. Thanks a lot. --MadScientistX11 (talk) 03:44, 24 August 2021 (UTC)