Does GDPR consider data in training sets and trained deep learning models to be your data? It's kind of a small snapshot of your expected responses to some stimulus, right? It's arguably more your data than anything...



If it's personally identifiable, yes. You also need opt-in (not opt-out or buried deep in a TOS) permission to use personal data in that way before feeding it to your learning model (since that use-case is basically never the primary purpose that the data was given for).

If you use any sort of automated system to make decisions about an EU customer that impact their life in a significant way (like whether to ban them or not), you will also need some sort of appeals process through which they can have the decision reviewed by a human and potentially reversed.
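
Purely as an illustration of both points (all names here are hypothetical, and IANAL): gate the training set on an explicit opt-in flag, and make sure an appealed automated decision lands in front of a human:

  from dataclasses import dataclass

  @dataclass
  class UserRecord:
      user_id: str
      features: dict
      ml_training_opt_in: bool = False  # explicit opt-in, never a TOS default

  def build_training_set(records):
      # Only records whose owners explicitly opted in may be trained on.
      return [r.features for r in records if r.ml_training_opt_in]

  @dataclass
  class Decision:
      user_id: str
      outcome: str               # e.g. "account_ban"
      automated: bool = True
      under_human_review: bool = False

  appeal_queue = []

  def file_appeal(decision):
      # An appealed automated decision must be re-examined by a human.
      if decision.automated:
          decision.under_human_review = True
          appeal_queue.append(decision)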


>You also need opt-in (not opt-out or buried deep in a TOS) permission to use personal data in that way before feeding it to your learning model (since that use-case is basically never the primary purpose that the data was given for).

Huh, now _that's_ interesting. Do you have a source for that? I know some guys at work who'll be upset if I can prove that to them, given that their pet project is an ML personalisation system that makes heavy use of just watching everything everyone does in an identifiable manner.

(I'll be honest, part of the draw is being able to say 'I told you so'~)


A general point of the GDPR is that when you collect data, consent is given for a business purpose. The user has the ability to opt in to different business use-cases if they so choose. Data collected cannot be used for a business case that was not consented to by the user.

This area is one that gets more legal-y than other parts of the GDPR, because in some cases you can use data without consent if it's legitimately required to provide the service the user asked for, and as far as I can tell there's not a lot of guidance on what counts as being a different business use. But yeah, personalization is usually not a strictly necessary feature of most platforms, so you're gonna need the user to opt in to using their data that way.

This guidance is kinda spread out over the GDPR, but one area of relevance:

https://gdpr-info.eu/art-13-gdpr/

Pay attention specifically to (3), but also (1)(c) and (2). Part (3) quoted below:

  Where the controller intends to further process the
  personal data for a purpose other than that for which 
  the personal data were collected, the controller shall 
  provide the data subject prior to that further 
  processing with information on that other purpose and 
  with any relevant further information as referred to in 
  paragraph 2.
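
To make the purpose-binding concrete, here's a minimal sketch (the user IDs and purpose names are made up): consent is recorded per business purpose, and processing for a purpose the user never opted in to gets refused:

  # user_id -> set of purposes the user has explicitly opted in to
  CONSENT = {
      "alice": {"service_delivery"},
      "bob": {"service_delivery", "personalization"},
  }

  def may_process(user_id, purpose):
      return purpose in CONSENT.get(user_id, set())

  assert may_process("bob", "personalization")
  assert not may_process("alice", "personalization")  # needs a fresh opt-in

The point being: a consent bit collected for one purpose doesn't carry over to another.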


I think these are the relevant parts:

> When assessing whether consent is freely given, utmost account shall be taken of whether, inter alia, the performance of a contract, including the provision of a service, is conditional on consent to the processing of personal data that is not necessary for the performance of that contract.

From https://gdpr-info.eu/art-7-gdpr/ paragraph 4

And the definition of consent is here:

> ‘consent’ of the data subject means any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her;

From https://gdpr-info.eu/art-4-gdpr/ paragraph 11

---

'specific' and 'unambiguous' in combination seem to disallow the "bury it in the TOS" cop-out.

'informed' and 'specific' in combination seem to disallow the opt-out cop-out (since an opt-out permission is never specific, and basically never informed).

Article 7 paragraph 4 (the first quote) seems to disallow making the service conditional on consent to data processing that isn't necessary for that service.

Of course this is all still pretty untested in the courts, and IANAL, but to me it seems pretty clear: if your primary service is not building a machine learning model based on your own users' data, you will need to get your users to opt in to that specific use-case.


The data subject also has the right to know what the basis of the automated decision was.


Relevant GDPR text available at: https://gdpr-info.eu/recitals/no-162/

In short, aggregated data and statistical summaries are not constrained in the same way. I think you still need consent to perform the aggregation/summarization, and said processing needs to ensure "statistical confidentiality," but such results are not PI.

(IANAL, and I'm still trying to understand this myself.)
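
Something like this is how I picture it (the group-size threshold of 5 is my own assumption, not from the text): publish only aggregates, and suppress any group small enough that someone could be re-identified from it:

  from collections import defaultdict

  MIN_GROUP_SIZE = 5  # assumed threshold; pick per your own risk assessment

  def aggregate_mean(rows, group_key, value_key):
      groups = defaultdict(list)
      for row in rows:
          groups[row[group_key]].append(row[value_key])
      # Suppress groups small enough that individuals could be re-identified.
      return {g: sum(vals) / len(vals)
              for g, vals in groups.items()
              if len(vals) >= MIN_GROUP_SIZE}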


So basically, the training set is under the GDPR if it includes PI, but the resulting model is not (unless you can extract PI from it), and you need user permission to use PI for training in most cases, right?

(Also IANAL, and also trying to understand)


A lot of this type of thing isn't clear yet and will be worked out when GDPR is enforced.

At my previous employer, we took a pretty comprehensive view and tried to play it safe, so at the very least any non-anonymized data in training sets would qualify. That does, however, raise the question of why on Earth you'd need to train a model with non-anonymized data in the first place!
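
E.g. something along these lines (field names made up) strips direct identifiers before the data ever reaches a training job; note that a salted hash is only pseudonymisation, which the GDPR still counts as personal data:

  import hashlib

  DIRECT_IDENTIFIERS = {"user_id", "name", "email", "ip_address"}

  def pseudonymize(record, salt):
      cleaned = {k: v for k, v in record.items()
                 if k not in DIRECT_IDENTIFIERS}
      # Keep at most a salted hash as a join key; drop even this
      # for full anonymisation.
      cleaned["pseudonym"] = hashlib.sha256(
          salt + record["user_id"].encode()).hexdigest()
      return cleaned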



