
> a time-to-unlearn kind of an acceptable agreement

Why put the burden on end users? I think the technology should allow for unlearning, and even "never learn about me in any future or derivative models".



No technology can guarantee 100% unlearning; the only 100% guarantee is when the data is deleted before the model is retrained. Legally, even 99.99% accuracy may not be acceptable; only 100% is.
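
For illustration, a minimal sketch of that "delete, then retrain" approach (the dataset layout and field names are assumptions, not anything from a specific system): strip a person's records out of the corpus before any retraining run, so there is nothing to unlearn afterwards.

    # Hypothetical sketch: remove one user's records before retraining.
    # The record layout ("user_id", "text") is an illustrative assumption.
    def scrub_user(dataset, user_id):
        """Return a copy of the dataset without any record tied to user_id."""
        return [record for record in dataset if record["user_id"] != user_id]

    dataset = [
        {"user_id": "u1", "text": "some post"},
        {"user_id": "u2", "text": "another post"},
    ]
    clean = scrub_user(dataset, "u1")
    # retrain(clean)  # hypothetical call; the model never sees u1's data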


> the only 100% guarantee is when the data is deleted before the model is retrained

That’s not even a guarantee. A model can hallucinate information about anyone, and by sheer luck some of those hallucinations will be correct. And as a consequence of forging (see section 2.2.1) you’d never be able to prove whether the data was in the training set or not.


Or rather some legal fiction that you can pretend is 100%. You can never achieve real 100% in practice, after all. E.g. the random initialisation of weights might already encode all the 'bad' stuff you don't want. Extremely unlikely, but not strictly 0%.

The law cuts off at some point, and declares it 100%.


All this is technically correct, but it also means this technology is absolutely not ready to be used for anything remotely involving humans or end user data.


Why? We use random data in lots of applications, and there's always the theoretical probability that it could 'spell something naughty'.
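
To put a rough number on that (arbitrary figures, just a back-of-the-envelope sketch): the chance that a run of uniformly random bytes happens to reproduce one specific short string is astronomically small, but never exactly zero.

    # Probability that N uniformly random bytes equal a specific N-byte string
    # is (1/256) ** N. N = 16 is an arbitrary choice for illustration.
    n = 16
    p = (1 / 256) ** n
    print(p)  # ~2.9e-39: negligible in practice, but not strictly 0%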


A model's ability to unlearn information, or to have its training environment configured so that something is never learned in the first place, is not exactly the same problem as "oops, we logged your IP by accident".

A company is liable even if it has accidentally retained or failed to delete personal information. That's why we have a lot of standards and compliance regulations to ensure a bare minimum of practices and checks are performed. There is also the Cyber Resilience Act coming up.

If your tool is used by or for humans, you need beyond 100% certainty about exactly what happens with their data and how it can be deleted and updated.


You can never even get to 100% certainty, let alone 'beyond' that.

Google can't even get 100% certainty that they have, e.g., deleted a photo you uploaded. No AI involved. They can get an impressive number of 9s in their 99.9...%, but never 100%.

So this complaint, when taken to the absolute like you want to take it, says nothing about machine learning at all. It's far too general.


The technology is on par with a Markov chain that's grown a little too much. It has no notion of "you", not in the conventional sense at least. Putting the infrastructure in place to allow people (and things) to be blacklisted from training is all you can really do, and even then it's a massive effort. The current models are not trained in such a way that you can do this without starting over from scratch.
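
As a rough sketch of what that blacklist infrastructure could look like (the opt-out list and the per-document subject annotations here are assumptions, and producing those annotations reliably is exactly the hard part): filter opted-out subjects out of the corpus before training ever starts.

    # Hypothetical opt-out filter applied before training.
    OPT_OUT = {"person:alice", "site:example.com"}

    def filter_corpus(corpus):
        """Drop any document annotated with a blacklisted subject."""
        return [doc for doc in corpus if not OPT_OUT & set(doc["subject_ids"])]

    corpus = [
        {"text": "...", "subject_ids": ["person:alice"]},
        {"text": "...", "subject_ids": ["person:bob"]},
    ]
    print(len(filter_corpus(corpus)))  # 1 -- alice's document never enters training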


That’s hardly accurate. Deep learning is, among other things, a type of lossy compression algorithm.

It doesn’t keep a 1:1 mapping of every bit of information it’s been trained on, but you can very much extract a subset of that data. That’s why it’s easy to get DALL-E to recreate the Mona Lisa: variations of that image show up repeatedly in its training corpus.
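
A toy way to see the point (made-up strings and an arbitrary threshold; real memorization tests are far more involved): compare a model's output against the training corpus and flag near-duplicates.

    # Toy regurgitation check: does a generation substantially overlap a
    # training document? Data and the 0.6 threshold are illustrative only.
    from difflib import SequenceMatcher

    training_docs = ["the mona lisa is a portrait by leonardo da vinci"]
    generation = "a portrait by leonardo da vinci"

    for doc in training_docs:
        ratio = SequenceMatcher(None, generation, doc).ratio()
        if ratio > 0.6:
            print("possible regurgitation, similarity", round(ratio, 2))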


Well then, maybe we shouldn't use the technology.



