Hacker News
Google AI has access to huge haul of NHS patient data (newscientist.com)
124 points by AlexandrB on May 2, 2016 | 77 comments



Why are people astonished?

For more than a decade researchers have been able to get access to large de-identified datasets. As a PhD student I have access to data on 150 million visits by 45 million patients. In some ways the data and access I have are superior to what Google & the NHS have (since the UK is a tiny country in comparison to the USA), and I am just a PhD student, though I have been working on it for the last five years.

More recently, there is a new Qualified Entity program run by CMS which provides access to Medicare data.

You can read more about my research and see the demo of the system in my past submissions and at

http://www.computationalhealthcare.com

Also, for the actual government program: http://www.ahrq.gov/research/data/index.html

Actually, when it comes to medical information it's a much, much more complex problem legally. This paper gives a good overview of the issue of "patient ownership of data": http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1857986

I am by no means minimizing the concerns that people have. But I think the article paints Google in a negative light while ignoring current standard practices. I wish the discussion were more rooted in the facts about how such data sharing systems currently work.


> Why are people astonished?

Because politicians keep making promises about the privacy of "confidential" information that they don't keep.

Similarly, extracting most of the medical value from such information (e.g. the frequency of X within a given population fitting certain characteristics) would likely deanonymize people.


No politician was involved in this decision.


http://www.bbc.com/news/uk-16021240

> "All necessary safeguards would be in place to ensure protection of patients' details - the data will be anonymised and the process will be carefully and robustly regulated.

> "Proper regulation and essential safeguards need to be in place when it comes to patients data," he said. "It cannot be done in a way where essential rules are threatened."

The legality of these data sharing laws starts off with public promises of anonymization, robust regulation, safeguards, and privacy.

https://www.england.nhs.uk/2014/01/geraint-lewis/

> Amber data are where we remove each patient’s identifiers (their date of birth, postcode, and so on) and replace them with a meaningless pseudonym that bears no relationship to their “real world” identity. Amber data are essential for tracking how individuals interact with the different parts of the NHS and social care over time. For example, using amber data we can see how the NHS cares for cohorts of patients who are admitted repeatedly to hospital but who seldom visit their GP. In theory, a determined analyst could attempt to re-identify individuals within amber data by linking them to other data sets. For this reason, we never publish amber data. Instead, amber data are only made available under a legal contract to approved analysts for approved purposes. The contract stipulates how the data must be stored and protected, and how the data must be destroyed afterwards. Any attempt to re-identify an individual is strictly prohibited and there is a range of criminal and civil penalties for any infringements.

The problem with pseudonymous data is that the NHS basically admits it can be used to identify people given sufficient effort.

---

That is why people are "astonished" by these decisions. The politician provides the initial promises that imply anonymity; the implementation doesn't provide true anonymity but does provide criminal penalties for pulling off the mask; and then the data is handed to enough third parties that, if it is leaked, it's likely impossible to know by whom unless the data was tampered with to provide a per-contract identifier.

I understand this specific decision did not involve a politician, but the conversation was about why people are surprised. How many people do you think really know that the anonymity originally promised became a permeable pseudonym?


You've linked to a page about care.data

This Google thing has fuck all to do with care.data - they're totally separate.

I understand "the" NHS is complex, but it's pretty frustrating talking to someone who has very strong opinions and who clearly doesn't know what they're talking about.

It's really weird to link to a document that talks about the severe legal penalties for anyone who attempts to de-anonymise the data, and then use that to say "look how flimsy these agreements are!", especially when the document you link to has a BOLD lead saying that things are even stricter in the newer document.

> The politician provides the initial promises

Again, not a politician. Chief data officer at NHS England, and a real doctor. http://www.nuffieldtrust.org.uk/about/our-people/dr-geraint-...

> The problem with pseudonymous data is that the NHS basically admits it can be used to identify people given sufficient effort.

It's trivially easy for Google to do this already without the NHS data, and they don't face prison time for doing it. See all the pregnant teens outed by supermarket loyalty cards for other examples.


> It's trivially easy for Google to do this already without the NHS data, and they don't face prison time for doing it. See all the pregnant teens outed by supermarket loyalty cards for other examples.

And how many members of the general population do you think are aware of this?

> I understand "the" NHS is complex, but it's pretty frustrating talking to someone who has very strong opinions and who clearly doesn't know what they're talking about.

It probably has something to do with the fact you are completely missing the point I'm discussing rather than the strength of my opinions.


>> The problem with pseudonymous data is that the NHS basically admits it can be used to identify people given sufficient effort.

It seems you have a disagreement with the system adopted by the NHS and several others worldwide. This has nothing to do with Google or politicians. The system emerged from decades of research and understanding of the compromise between the need to protect privacy and the advancement of medical research. You are free to suggest alternatives to the current system. I have studied this problem for the last five years and there isn't a simple solution.


1) I have a disagreement with bait and switch data privacy laws that are publicly sold as anonymous when everyone can tell from the implementation details that they are not.

2) You asked why they were astonished. Well, #1 is why. Most people don't have the time/energy/desire to study the implementation details on every facet of their lives.


I think this situation nicely juxtaposes positive externality with private rights.

As an individual, you may wish to hoard all of your personal medical information. Doing so may provide marginal benefit or protection against some theoretical harms. However, much of medical science relies on large population studies. Is knee surgery worth it? Are breast exams? Do COX-2 inhibitors increase heart attacks? Enormous amounts of good could be done by having large datasets of entire patient histories available for analysis by all, assuming they could not be de-anonymized (if not, then the datasets have to be analyzed under contracts).

Whenever I visit the doctor and I am given a form asking if my data can be sent to some group to participate in a study, I always answer yes. Never once have I had any negative repercussions from doing so, and it is my hope that the data was used to publish scientific papers that added to net knowledge of humanity. However, I have to wonder if asking me for permission actually interferes with random sampling. Are people who give permission vs people who refuse statistically more likely to have other behaviors that may influence the results?


> assuming they could not be de-anonymized

That's where the problem is: anonymizing data is really difficult, and it takes only a few bits of info to associate that data with a name. Just three pieces of information (zip code, birthday, and sex) are enough to uniquely identify 87% of people.

http://arstechnica.com/tech-policy/2009/09/your-secrets-live...

http://dataprivacylab.org/projects/identifiability/paper1.pd...
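
(A minimal sketch of the uniqueness argument, using purely synthetic data rather than any real records; the point is just that the space of possible (zip, birthday, sex) combinations dwarfs the population, so most combinations pin down a single person.)

    from collections import Counter
    import random

    random.seed(0)
    population = [
        (random.randint(10000, 99999),   # zip code (synthetic)
         random.randint(0, 365 * 80),    # birth date, as a day offset
         random.choice("MF"))            # sex
        for _ in range(100_000)
    ]

    bucket_sizes = Counter(population)
    unique = sum(1 for person in population if bucket_sizes[person] == 1)
    print(f"{unique / len(population):.1%} of records are unique on (zip, dob, sex)")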


On the other hand, anonymization destroys useful data (on top of already poor record systems). There was a Kaggle competition with medical data that had been so thoroughly damaged that 5-year-old boys were listed as pregnant.


Right, which justifies the government signing closed source deals to analyze this data, with the threat of legal action for violations. They can stipulate that the results of such analysis should be made public.


It's difficult - I have more of a fairness problem with this (such a deal would probably not be available to researchers or smaller companies) than a data privacy concern. As others point out downstream, I think it's not really in Google's interests to use the data maliciously and I have more faith in their opsec than that of any NHS trust.


> Whenever I visit the doctor and I am given a form asking if my data can be sent to some group to participate in a study, I always answer yes. Never once have I had any negative repercussions from doing so

And that's your choice. Unfortunately, someone else committed suicide because they didn't ask for the help they needed with their mental health issues, out of justified concern that their condition would leak and lead to discrimination.

There are many reasons privacy is important. Among the most important is the ability to have completely open discussions with clinical professionals about any medical issues you have, without having to worry that anyone has any ulterior motives or that anything you say could come back to haunt you later.

It is difficult enough for some people to talk about some conditions even with their own doctor in the privacy of the doctor's office. Probably some valuable benefits would result if we dropped privacy altogether and allowed widespread analysis of clinical data. However, it is an absolute certainty that serious damage is done both when patient confidentiality is perceived to be threatened (through patients not raising issues in the first place) and when patient confidentiality is in fact compromised (through discrimination of various kinds, much of which may be based on misconceptions).


It's an interesting question for sure. There are issues of consent which, especially for poor or marginalized people, should be addressed. Like Henrietta Lacks...

I think in this case and others like it the family, at minimum, has a right to know. "...it was not until 1973, when a scientist called to ask for blood samples to study the genes her children had inherited from her, that Ms. Lacks’s family learned that their mother’s cells were, in effect, scattered across the planet. Some members of the family tried to find more information. Some wanted a portion of the profits that companies were earning from research on HeLa cells. They were largely ignored for years."

http://www.nytimes.com/2013/08/08/science/after-decades-of-r...


> Whenever I visit the doctor and I am given a form asking if my data can be sent to some group to participate in a study, I always answer yes. Never once have I had any negative repercussions from doing so, and it is my hope that the data was used to publish scientific papers that added to net knowledge of humanity.

I've grown cynical. My first thought on seeing this kind of form is that a third party will be making money from the information I provide somehow - whether it's through patented drugs or treatments or through pay journals like those owned by Elsevier. Neither I, nor anyone I know, will get to benefit from the information I provide without spending money (often exorbitant amounts of it). This is the flip side of privatizing everything. Why should I lift a finger to help for-profit entities when they will not do the same to help me?


It's going to sound like I'm telling you how to live your life and how to feel and I really don't mean it that way but I don't know any other way to express my viewpoint so I'll just power through it:

The idea that someone, somewhere is making a profit on something you provided should make you happy, not sad. To make a profit, they had to be paid, and to be paid they normally have to have provided something that someone wanted enough to part with cash for. That cash changed hands is not the main point of the transaction (it's net zero for society: someone gained cash and someone lost it); the main point is that something of value was created.

You threw out the example of pay journals like Elsevier. It is true that for cases like that, there may be a market inefficiency that means not much value is created. But I think those are edge cases and relatively rare. Even if our worst fears about Elsevier and other rent seeking journals are true, it would still be the case that you're helping rather than hurting by providing anonymized personal data.


> The idea that someone, somewhere is making a profit on something you provided [for free] should make you happy, not sad.

This seems like a double standard. Why should I be happy when corporations will regularly go to court to make sure that nothing they create returns to the public domain within my lifetime? The norm seems to be capturing this kind of value wherever (and for as long as) possible, not distributing it freely for the good of mankind.

I'm not making a normative argument that this is a healthy way to live or for society to operate. In a functional community, resources, ideas, and capabilities may be shared freely for the benefit of all, but it can't always go one way. That's exploitation, not community.


Ok, so what? Even if a company fights to maintain control over their product, there's still a societal benefit even if that benefit has been artificially limited due to price. And while they can delay things to an extent, eventually, they'll lose control over their product as IP protections run out. There are still games that can be played at that point when everything is lined up just right, but those are games that can be addressed through legislation and the courts.

I'd rather hideously expensive treatments exist, even if I can't afford them, than live in a world where they're never even an option. Price can and does change over time, and those treatments become more accessible to more people. But if the treatments are never developed in the first place, then that's an even greater tragedy because then there's literally nothing that can be done to make them accessible.


It’s more like "you sold something for less than it was worth".

You might have sold your own data for a higher price if you had known.


Value to a private entity != value to me, society or entities that matter to either.

Especially when that private entity gains more power through my data and tries everything from questionable business practices to lobbying to make sure they, and only they, reap the benefits.

People value chemical weapons and forced labor as well. They're willing to pay a lot for them, too.


> Why should I lift a finger to help for-profit entities when they will not do the same to help me?

But if releasing your data leads to research that can save lives, isn't that worth doing even if someone gets rich in the process? I'd much rather there exist a million-dollar treatment that can save my life at the cost of living in debt for the next twenty years, rather than die in six months because the research was never done.

(Yes, this is a false dichotomy - there should be a better way of administering treatments and doing medical research.)


You--as an individual--receive zero benefits from your medical data when it's kept private. You can't consume it, you can't use it, and you can't trade it. It has no inherent value on its own. Were researchers forced to pay you for access to your de-identified data, it'd be a number so infinitesimal as to be all but meaningless.

You aren't giving up something of value to you. There's zero opportunity cost involved. The value is added after your data is collected, collated, analyzed, and used by researchers. Individually, it's meaningless. Together, it's priceless.

Will someone eventually profit? Sure, but they'll do so because society benefits from the work they've done. And eventually, the cost will decrease. Would you rather have drug treatments exist in the first place, even if someone profits from them, or have nothing? Those are pretty much your options. For all of its flaws, the current research model at least works. The only other option, besides doing nothing and letting people die from diseases we could eventually treat, is some sort of public financing, and that comes with a host of problems. Would anyone really be so foolish as to want to subject medical research to the congressional budget process, even if there's an abstraction layer between congress and the researchers? That means tradeoffs, and lots of them: for instance, drug option A would be pursued, and option B ignored because we're already funding A even if B might be more promising later on.

We already see it with federal applied research when critical work is jeopardized for the sake of political grandstanding. Every so often, someone will trot out a cherry-picked grant they don't like and wave it around like a red flag in front of a bull. Drug companies might be bad enough, but the politicians would be worse.


You are correct, there are already laws and systems in place. E.g. here is an interesting paper in the New England Journal of Medicine using California data. Note: not published by me, but by someone I work with.

http://www.nejm.org/doi/full/10.1056/NEJMoa1311485


Nice try.

If this was about positive externalities, Google tech could have found its way into the NHS.

Instead, as always and every time, data found a way into Google.

Funny how that works.


Terrible reporting. Improving health outcomes by AI analysis of patient data is a much bigger prize - morally and commercially - than anything which could be achieved through ad targeting. Google is far too smart to squander such an opportunity by abusing patients' trust.


The big problem is the precedent it sets for data access.

What are the criteria for who gets access? What are the constraints of that access?

This story covers the latter being blown apart: the constraints were poorly defined and implemented, and thus even if the criteria are well defined, access to far more data was made possible.

I'm sure that few patients desire an end to research, or would argue that such access isn't a good thing... but what of the insurance industry? Should they have access? Would the NHS be able to define and enforce those constraints?

Perhaps that's an obvious no.

What then of an insurer partnering with a medical research company, from the viewpoint of "This costs insurance a lot of money, we'd like to fund a way to reduce that financial exposure".

The grey areas emerge immediately.

If we cannot control access to patient data (data from which it would be trivial to strip anonymity, or which in aggregate is enough to produce net negatives; correlation by post code alone would reveal enough with little extra work), and if we cannot define and enforce the constraints of access, then we really shouldn't be sharing what is highly sensitive and personal information that was originally only disclosed between a patient and a doctor under the premise that what is shared is covered by the explicit and implicit confidentiality of that conversation.

It's always worth remembering:

Data was acquired under doctor patient confidentiality.

If we considered that data to have a licence, it is the most restrictive licence possible. One could consider what has happened here as a re-licensing without permission. Such an act could have a chilling effect on the relationship between the doctor and patient.


You are making some implicit assumptions that the data access isn't highly controlled.

I have seen a few of these sorts of deals killed because of data access concerns, and/or computation requirements ("you can have access to anonymized data, but you have to run your code in a sandbox on our health servers").

And, this is why we have legislation.


Less implicit, from the originally linked article:

> The scale of the sharing program was apparently misrepresented to the public, originally announced as an app to help hospitals monitor patients with kidney disease with real-time alerts and analytics. But since those patients don't have their own separate dataset, Google has argued it needs access to all patient data from the participating hospitals.

No assumption there: they didn't have a separate dataset, and so granted access to all patient data.


"so granted access to all patient data"

Yes, but under what conditions? Many privacy laws apply here, and treating Google as some monolithic entity where everyone working there can now read anyone's personal health history is inaccurate.


It's pseudonymous data that the NHS has previously admitted can be deanonymized given sufficient effort, but such deanonymization carries criminal and civil penalties.


Nope. To set a precedent it would have to precede. Giving de-identified medical records to researchers is a long-standing, well-established and regulated process. The only interesting thing here is that it's Google and not some PhD's university lab.

Here's HHS on what HIPAA has to say about this: [0]

[0] http://www.hhs.gov/hipaa/for-professionals/privacy/special-t...


It so happens Google has the perfect means at its disposal for de-anonymizing large swaths of such data: trillions of user location records, calendar appointments, emails, and texts. It's not too hard to put all that together to match a specific encounter record, for example.


which would both violate their contract and be illegal.


And Google would never break the law or breach a contract. Especially a contract they signed with the UK Government.

I mean, other than that time just a few years ago[0] where Google broke the law and then breached the contract they signed with the UK Government.

[0] http://www.bbc.com/news/technology-19014206


I do not trust Google and I am not being given a choice.


not sure if you saw my comment downstream? would encourage you to read the original piece for a more nuanced presentation of the information - https://www.newscientist.com/article/2086454-revealed-google...

happy to address criticism


Hi Hal, I thought the article should have compared and contrasted with other government-run large data sharing programs such as the CMS Qualified Entity program or AHRQ HCUP.


thanks for the comment. This would be interesting, but not sure it would have made sense to pack it into one article that is already heavy with data terminology for a lay reader.

Will definitely be looking into healthcare data more, as this story has resulted in some interesting leads


Google is not a monolithic entity. You'd have to trust the individual researchers who have access to the data, and we don't even know who they are. And the NHS didn't ask their patients for permission before handing this data over, so whether you trust them or not is irrelevant, they get your data anyway.


That seems incredibly naive. They utilize every other bit of information they collect; why wouldn't they utilize this data?

Google is a corporation. It can't have good intentions of its own. It's the thousands of employees who will potentially be working with and handling the data that you need to worry about.


Google places the most comprehensive controls on PII of any place I've seen, including hospital environments subject to HIPAA. (Mostly because, unlike the hospitals, they have the technical clue how to enforce it properly. The hospitals... are still learning about computer security, and it's not their forte: http://arstechnica.com/security/2016/03/two-more-healthcare-... )

Getting access to private information in Google is hard - my experience as a researcher here is that there's a strong incentive to find an open-source or non-PII dataset before touching user data. I'll go through my year here without ever touching even the most innocuous PII data.

It's very unlikely to me that thousands of people will have access to this data. It's much more likely that a small handful will, and that they'll be supported by others with no access whatsoever. From the article, in fact:

"The agreement clearly states that Google cannot use the data in any other part of its business. The data itself will be stored in the UK by a third party contracted by Google, not in DeepMind’s offices. DeepMind is also obliged to delete its copy of the data when the agreement expires at the end of September 2017."

From an incentive perspective, the potential value-add of abusing the data is tiny compared to the potential costs and loss of user trust. Google's very aware of how important it is to maintain user trust -- http://www.techradar.com/us/news/internet/google-we-have-a-c...

Corporations don't have brains, but they have cultures, and Google's culture -- composed of those thousands of engineers -- is quite fanatical about protecting user privacy. It's one of the non-technical things that's impressed me most during my time here.

The risk with a company like Google is if the economic winds and culture changes, but that's a long-term process, and is also the reason for legally-binding contracts to do things like delete the data (see above).

tl;dr: Google has the technical means to protect the confidential data better than almost any other agency, including from its own employees. The most important question to ask is whether the NHS structured the data sharing in a way that provides for long-term protection, and (IANAL!) it sounds like it from the article.

Source: I'm a professor who deals with our IRB occasionally, have colleagues doing joint CS-medical research, pushed patients around a hospital in a younger life, and am on sabbatical for the year at Google.


As an expert in machine learning, I don't see how someone would expect this to work. To actually predict a disease, you need true positives and true negatives. If you only had access to the true positive data, it would be a lot harder to predict accurately.


Seems pretty simple, really.

1000 patients come in with symptoms that look like cancer at year 0.

100 actually get diagnosed with cancer at some point between year 0 and year 5.

Presumably, the remaining 900 didn't have cancer at year 0.
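
(A rough sketch of that labelling step, with made-up column names and toy data, nothing from the actual NHS records: the follow-up diagnoses are what turn an unlabelled year-0 cohort into positives and negatives.)

    import pandas as pd

    presentations = pd.DataFrame({
        "patient_id": [1, 2, 3],
        "year0_symptom_score": [0.9, 0.4, 0.7],
    })
    diagnoses = pd.DataFrame({
        "patient_id": [1],                  # only patient 1 is ever diagnosed
        "years_after_presentation": [3],
    })

    diagnosed = diagnoses.loc[
        diagnoses["years_after_presentation"] <= 5, "patient_id"
    ].unique()
    presentations["cancer_within_5y"] = (
        presentations["patient_id"].isin(diagnosed).astype(int)
    )
    print(presentations)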


"Seems". You can't know what fraction of your remaining 900 had cancer but weren't diagnosed with it for any of a wide variety of reasons, from death through misadventure, through false negative, to simple loss to followup. Clinical studies are designed to exclude such confounding outcomes. It's very difficult to see how any study of this data could be designed to do likewise.


True, but I think you can mitigate these issues.

Especially for the NHS dataset, since you will either see the patient in there or in a death index (unlike, say, the US, where they may have just gone to another hospital).

Also, the scope here is more like a longitudinal vaccine study than a clinical trial. 50M people will provide a lot of robustness that you wouldn't see in a 1000 patient trial.


> you will either see the patient in there or in a death index

Or you won't see any new information for the patient, because the patient is lost to followup. Or - worse - you'll see new information, but it'll be invisibly erroneous, because random GPs don't work to the standard that physicians administering examinations in clinical studies do.

> 50M people will provide a lot of robustness that you wouldn't see in a 1000 patient trial

Not if the 1000-patient trial is well designed, and the data of those 50M people is totally uncontrolled and unverified. This isn't warfare - Stalin's dictum has no place here. You can't overcome the flaws of a dirty dataset by adding more dirty data to it, especially when you literally cannot know either the magnitude or the nature of the inaccuracy, or even tell what's accurate from what's not.


>> Or - worse - you'll see new information, but it'll be invisibly erroneous, because random GPs don't work to the standard that physicians administering examinations in clinical studies do.

So you will take this into account and emphasize the more reliable types of tests, like blood tests. Or you'll find ways of learning which doctors do the tests more accurately (or train doctors to do so) and which people are more consistent/reliable in their relationship with their docs. Or maybe you'll get a few hypotheses which are relatively likely, and that would incentivize the researchers/Google to do small clinical trials on them.

It's worth a try at least.


Even if it were - which is, for the aforementioned reasons, doubtful - it would not be worth turning over the medical records of 50M people to an unregulated private company without so much as a by-your-leave.


Can you explain what you mean a little more? It seems like by having complete medical records, there would be a great deal you could infer statistically.

Do you mean that it's not sufficient to have a medical record that doesn't indicate knee issues? That you would need a medical record that confirms there are no knee issues?


If you want a reliable result, yes.

Consider: I may have a medical record that doesn't indicate knee issues because I have correctly been diagnosed with no knee issues. Or I may have such a record because I have been incorrectly so diagnosed, despite the presence of knee issues. Or I may have such a record because I haven't been to a doctor about my knee issues in the years since they've developed, and so no opportunity has arisen for my knee issues to be documented. Simply from a medical record reflecting no evidence of knee issues, you cannot know which of these, if any, is true.

As I mentioned in a reply to your neighbor comment, clinical studies are designed to exclude such confounders as these - and that's a considerable part of why such studies are so expensive to design and carry out. It's very difficult to see how such exclusion could be achieved with data of the quality which seems to be involved here.


People in the UK who want to complain about this could try the ICO who strongly regulate health information.

They could also look at any advertising material and report that to ASA (if it meets the criteria for being regulated).

That trust has a Caldicott Guardian who will be responsible - legally - for keeping patient data safe. I would have liked a quote from them, although I guess that quote would be something like "No patient identifiable data has been shared with DeepMind". That would be scary: I have no doubt that DeepMind would do a very good job of de-anonymising data, but I know that Google would have to be monumentally stupid to try that.

There are other data projects happening in the NHS - care.data (that dot isn't a typo!) is one that got a lot of attention. That allowed (after some fuss) people to opt-out. (It didn't allow people to specifically opt in to show their support, which is something I would have done.)

I'm a bit wary of Vice's reporting here. They don't seem to know what they're talking about (there's nothing about controls over patient data in the NHS, for example); they don't seem to have approached the Trust involved; they haven't done a good job of explaining what's going on.

There are some really bad failures of data protection in the NHS (especially around mass email! People using CC instead of BCC to a group of people using an HIV clinic, for example) and there are some historic abuses (selling data to insurance companies) that led to changes in the law.

So I don't know if this is terrible and deserving of anger, or okay and poorly reported, or a good thing with misleading reporting.


The National Health Service (NHS) is the publicly funded healthcare system for England (source: Wikipedia)

Putting that here because I was confused about what NHS was in the first place (I'm French).


There isn't really a single NHS - there are four National Health Services for the four parts of the UK (England, Scotland, Wales and Northern Ireland):

https://en.wikipedia.org/wiki/National_Health_Service


The world's 5th largest employer (just after McDonald's).


best part of this is when we reach the skynet robot uprising they'll know exactly how to kill us the fastest


There is no equivalent of HIPAA in the UK?


HIPAA stands for Health Insurance Portability and Accountability Act. The "Privacy" component of HIPAA is very limited in scope. Even with HIPAA, the data being shared would be consistent with the definition of a "Limited" dataset.[0] At least in the United States there is significant precedent for sharing such data, including even for marketing purposes, as decided by the Supreme Court case Sorrell v. IMS Health.

[0]http://www.hopkinsmedicine.org/institutional_review_board/hi...


[flagged]


> will get to google

The data "got to" Google when it was sent to Google DeepMind.

> anonymised

I expect HN readers to understand how this is usually impossible. Correlating data may seem hard to the layman, but Google is experienced at data analysis.

Arguing that Google's AI division - who are trying to find subtle correlations in the medical data - somehow cannot correlate those same records back to their "personally identifiable information" is laughable.

> use it to

That's not the only reason to not want your medical data spread around. A better reason might be not wanting your data in more places where it could be stolen or leaked.

edit:

> pathetic

The "pathetic" part is the complete disregard for the people that generated that data. It doesn't matter if Google is talented at what they do, or if the data might be used for something that you might think is useful. I may even agree about the utility of the data, but if you aren't getting explicit informed consent from the people involved, you have a serious ethics problem.


> The data "got to" Google when it was sent to Google DeepMind.

it did not. google own deepmind, but the restrictions on data usage are pretty tight.

> I expect HN readers to understand how this is usually impossible. Correlating data may seem hard to the layman, but Google is experienced at data analysis.

yes, but then again, i don't expect you to know what data it is that deepmind are working with, and what their primary motives are. whilst i don't know for certain, i heavily doubt the data is leaving deepmind.

> who are trying to find subtle correlations in the medical data

that is not what they are doing.

> That's not the only reason to not want your medical data spread around. A better reason might be not wanting your data in more places where it could be stolen or leaked.

oh please. i trust google far more than my own government for data protection; we're talking about an organisation who leaves sensitive data literally just lying around (http://news.bbc.co.uk/2/hi/uk/7449927.stm).

> The "pathetic" part is the complete disregard for the people that generated that data. It doesn't matter if Google is talented at what they do, or if the data might be used for something that you might think is useful. I may even agree about the utility of the data, but if you aren't getting explicit informed consent from the people involved, you have a serious ethics problem.

an aside; it's funny that people are far more concerned about their medical data than they are about their personal history. my medical data doesn't define me, but my actions do. if only such zeal existed for governments' indiscriminate collection of personal data. oh well.

but back on topic: the data deepmind has (and i want to make the distinction that google does not have it, because they do not) is anonymised sufficiently such that, short of heavy analysis (which i concede is probably possible), very few inferences about the identity of the patient can be drawn. further, deepmind has absolutely no interest in correlating this data with people, and less than zero interest in targeting adverts to patients.


>oh please. i trust google far more than my own government for data protection; we're talking about an organisation who leaves sensitive data literally just lying around

We're talking about a corporation that had employees access personal data to show off to teenagers[0].

No one can be trusted with data, therefore it needs to be restricted to only those who need access. The NHS probably needs data about patients, Google does not and the unspecified third-party contractor Google has employed to store the data definitely does not.

[0] http://techcrunch.com/2010/09/14/google-engineer-spying-fire...


> it did not. google own deepmind

DeepMind IS Google. Asserting that it isn't doesn't magically make them a separate legal entity.

> restrictions on data usage

I really don't care what the restrictions are today. The data exists under Google's control.

> i don't expect you to know

Arrogant insults are not a useful form of rhetoric. I read the actual agreement, and while it wasn't anywhere as detailed as I would have liked, it was fairly clear about the data that would be sent.

> primary motives

Again, this is irrelevant.

> that is not what they are doing

That's (part of) what DeepMind does.

> trust google

You may trust Google, but trust isn't transitive. You don't get to make that decision for other people.

> more concerned

You don't know that; you should really stop assuming you know what other people's opinions are.

> about their medical data than they are their personal history

Both of those are important. Where people place their concern will depend on their personal situation, which might or might not include medical data.

> my medical data doesn't define me

You're projecting your own situation onto others. For someone with HIV (which is one of the data points being sent to Google), that medical fact may play a very defining role in their life.

> if only such zeal existed for governments

It does. Why would you think that someone that is against Google gathering more data is fine with the government doing the same? Both of those groups need to be defended against.

> an orginisation who leaves sensitive literally just lying around

Pointing to problems at other organizations doesn't excuse making new problems.

> anonymised sufficiently

Then prove it; the agreement only listed [name (including initials), address, NHS number, photographs or videos] as "identifying" information and [full postcode, date of birth, telephone number, e-mail address] as identifying in combination. It requires that those be sent encrypted (why not hashed? this means they can be decrypted).
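
(For readers unfamiliar with the distinction being drawn here: a keyed one-way hash yields a stable pseudonym with no decryption operation at all, whereas an encrypted identifier can always be decrypted by whoever holds the key. A minimal sketch, with a hypothetical key and a made-up number, not anything from the actual agreement:)

    import hashlib
    import hmac

    SECRET_KEY = b"held-only-by-the-data-controller"   # hypothetical key

    def pseudonymise(nhs_number: str) -> str:
        """Same NHS number always maps to the same token; the token
        cannot be decrypted back to the number."""
        return hmac.new(SECRET_KEY, nhs_number.encode(), hashlib.sha256).hexdigest()

    print(pseudonymise("943 476 5919"))   # made-up example number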

> deepmind has absolutely no interest

Who cares what their interest is today. They may not have the choice, their interests can change, etc. The argument that everyone should "trust us, we're not greedy and have only good motives" has always been stupid, and Google themselves have a poor history of respecting user data.


"DeepMind IS Google. Asserting that it isn't doesn't magically make them a separate legal entity. " Errr, they are a separate legal entity, so, i'm not sure what you are talking about.

https://beta.companieshouse.gov.uk/company/07386350


> DeepMind IS Google. Asserting that it isn't doesn't magically make them a separate legal entity.

in the context of this data, yes, there is significant distinction between the two.

> Arrogant insults are not a useful form of rhetoric.

sorry, i'm not trying to be arrogant.

> You may trust Google, but trust isn't transitive.

true.

> You're projecting your own situation on others. For someone with HIV (which is one of that data points being sent to Google), that medical fact may play a very defining role in their life.

i'd argue that letting yourself be defined by your HIV status is a poor metric to live one's life by, but you're right, that's not my call to make.

> It does. Why would you think that someone that is against Google gathering more data is fine with the government doing the same? Both of those groups need to be defended against.

playing the cynical card about governments vs corporations - a corp has no incentive to use my data the way a government does, a la dragnet surveillance, to which the resounding response is apathy. of course the HN crowd care about these things, but we're a somewhat isolated bubble of people compared to the rest of society.. although that's another discussion.

> Then prove it;

of course, i cannot do that. and my feelings toward a group of people can't dictate how others should feel toward them - i know. but the team at deepmind are looking to do some amazing things. and this article from vice screams ignorance. maybe deepmind could be more transparent about their work. that's not my call though.

> Who cares what their interest is today.

it matters significantly - they can't keep the data after sept 2017, so if for some reason the direction of the group changes, there is only so much they can do with it. but it would set the field back decades if anyone at deepmind were to do anything so irresponsible as what you might be suggesting.


> a corp has no incentive to use my data the way a government does, a la dragnet surveillance,

Did you miss the disclosures about that surveillance over the last few years, which indicate that governments are taking data from those corporate databases? With programs like (but not limited to) Prism, giving data to a corporation may effectively be the same as giving it to governments.

> anyone at deepmind were to do

You need to consider a threat model that includes more than that one group. What about everyone else at Google, or anybody with access to Google's internal networks, etc. Security is a process that has to be applied at all layers.


not entirely clear what you're cross about here, but perhaps the original story (which I reported) might be useful - https://www.newscientist.com/article/2086454-revealed-google...

happy to address any criticisms


Yours is a much better story. It'd be nice if mods could change the link from Vice to New Scientist.


Agree. Mods, can this happen?


dang (https://news.ycombinator.com/user?id=dang) can make it happen, unsure how to signal him but perhaps he searches his own mentions


emailed him. don't care massively, but the Vice piece is just a shallow re-write and it doesn't even link to us ¯\_(ツ)_/¯



This is probably one of the reasons some may be opposed to nationalized healthcare... The government making a widespread decision to send your medical data somewhere without your permission.


Although that is a valid concern, we can have nationalized health-care without the government sending data elsewhere.

The two don't have to go together.


True, but when the data is in private hands, the government sets regulations, like HIPAA here in the US, to regulate those companies' actions. Meanwhile, the government can and often does do what it wants with what it has.


> The government

In this case the government would be the Department of Health. They've had no involvement with this.

"The NHS" would be NHS England, and they didn't set this up, although they might sta.rt being involved to check the controls.

The local NHS, the Clinical Commissioning Group, didn't set this up.

So we're talking about one hospital trust.


but we don't really have the consent-at-scale technology for anything else, do we? tbh it's just this specific NHS Trust that made the call in this instance.

Personally, I see potential in ResearchKit to solve the consent problem re medical data


ResearchKit may be part of the solution for some things, although the auth and data management will be barriers for commercial providers for the near future.

I have a small team that is working with leaders in the space[1] to help many of the major EMR vendors support an open-standards based approach to medical record sharing.

We'll be seeking public feedback when some of the preliminary work is ready, but are very excited to get input from the community as we make progress.

[1] https://dbmi.hms.harvard.edu/news/more-power-patients



