
AI doesn't "learn". It's statistical inference if anything.

If I took two copyrighted pictures and layered them on top of each other at 50% opacity, would that be OK or copyright infringement?
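
For concreteness, the layering itself is a one-liner with an image library (a sketch assuming Pillow; the file names are placeholders):

    # Layer two pictures at 50% opacity. Paths are placeholders, and the
    # second picture is resized so the two match in size.
    from PIL import Image

    a = Image.open("picture_a.jpg").convert("RGB")
    b = Image.open("picture_b.jpg").convert("RGB").resize(a.size)
    Image.blend(a, b, 0.5).save("layered.jpg")  # 50/50 pixel-wise mix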

AI models just use more weights/biases and more images (or any input).




And what is LEARNING in your opinion?


Cambridge dictionary has it as: "knowledge or a piece of information obtained by study or experience".

If I scanned a thousand polaroid pictures, and took their average RGB values and created a LUT that I could apply to any photograph to make it look "polaroidy" - would that be learned? Or the application of a statistical inference model? This alone is probably far enough abstracted to never be an ethical or legal issue. However, if I had a model that was only "trained" on Stephen King books, and used it to write a novel, would that be OK? Or do you think it would be in the realm of copyright infringement?
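
Roughly this, as a sketch (assuming Pillow; the paths and the strength factor are invented):

    # Build a "polaroidy" LUT from the average RGB of scanned images,
    # then apply it to any photo.
    import glob
    from PIL import Image

    def average_channel_means(paths):
        # Average R, G, B means across a set of scanned images.
        totals, n = [0.0, 0.0, 0.0], 0
        for path in paths:
            pixels = list(Image.open(path).convert("RGB").getdata())
            for c in range(3):
                totals[c] += sum(p[c] for p in pixels) / len(pixels)
            n += 1
        return [t / n for t in totals]

    def build_lut(target_means, strength=0.3):
        # 768-entry flat LUT: nudge each channel toward the target mean.
        lut = []
        for c in range(3):
            shift = (target_means[c] - 128) * strength
            lut.extend(min(255, max(0, int(v + shift))) for v in range(256))
        return lut

    means = average_channel_means(glob.glob("scans/*.png"))
    out = Image.open("photo.jpg").convert("RGB").point(build_lut(means))
    out.save("photo_polaroidy.jpg")

Nothing in there "studies" anything; it's arithmetic over pixel averages.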

By your definition anything a computer does means it has learned it. If I copy and paste a picture, has the computer "learned" it while it reads out the data byte-by-byte? That sure sounds like it is "studying" the picture.

"AI" and "ML" are just statistics powered by computers that can do billions of calculations per second. It is not special, it is not "learning". To portray some value to it as something else is disingenuous at best, and fraud at worst.


Your polaroid example would require someone to write code that does that one specific thing. You could also argue that it would violate copyright if it were trained on some photographer's specific unique style, made into an app, and marketed as being able to mimic that photographer's style. But in your example you have 1000 random polaroid images of unknown origin, so it ends up abstract enough not to be an issue.

In your Stephen King example I would say it's still learned, because the "code" is a general language model that can learn anything. It's just that you decided to only train it on Stephen King novels. If you have an image model trained 100% on public domain images and finetune it to replicate a specific artist's style, I would personally think the finetuned model and its creator are maybe violating copyright.

But when it comes to learning I would say when you write a program whose purpose is to learn the next word or pixel, but it's up to the computer to figure out how to do that, the computer is learning when you feed it input data. It's the program's job to figure out the best way to predict, not the programmer. (it's not that black and white given that the programmer will also sometimes guide the program, but you get the idea)
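
As a toy version of that distinction (the data pairs below are invented): nothing in the sketch encodes the answer, the loop just adjusts a weight until the predictions fit whatever it's fed.

    # Toy "the program figures out how to predict" loop: plain gradient
    # descent on a single weight. The data pairs are made up.
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, value to predict)

    w = 0.0  # the entire "model": predict x * w
    for _ in range(1000):
        grad = sum(2 * (x * w - y) * x for x, y in data) / len(data)
        w -= 0.01 * grad  # nudge the weight to shrink the prediction error

    print(w)  # lands near 2.0, inferred from the data, not written by us

The programmer wrote the loop, but the value of w comes from the data.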

When you write a program that does one or several things, it's not learning.

I think it's something to do with the difference between emergent behavior from simple rules and intentional behavior from complex rules.


I think you're using fancy language like "general language model" to obscure the facts.

If I created a program to read words from the input and assign weights based on previous words, I could feed in any data. Just like the polaroid example. (I suggested that the polaroid example was abstract enough not to be an ethical/legal problem because I believe it is mostly transformative, unless the colours themselves were copyrighted or a distinct enough work in themselves.)
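
Roughly this, as a sketch (the corpus string is a stand-in for whatever you feed it):

    # A bare-bones "assign weights based on previous words" model: count
    # which word follows which, then sample from those counts.
    import random
    from collections import defaultdict

    def train(text):
        weights = defaultdict(lambda: defaultdict(int))
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            weights[prev][nxt] += 1  # the "weight" is a co-occurrence count
        return weights

    def generate(weights, start, length=20):
        word, out = start, [start]
        for _ in range(length):
            followers = weights.get(word)
            if not followers:
                break
            choices, counts = zip(*followers.items())
            word = random.choices(choices, weights=counts)[0]
            out.append(word)
        return " ".join(out)

    model = train("the dog ate the bone and the dog slept")  # placeholder
    print(generate(model, "the"))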

Now if I only feed in Stephen King books and let it run, suddenly it outputs phrases, wording, place names, character names, and adjectives all from Stephen King's repertoire. Is this a 'general language model'? Should this be copyright-exempt? I don't think this is transformative enough at all. I've just mangled copyrighted works together, probably not enough to stand up against a copyright claim.

I think people use AI and ML as buzzwords to obfuscate what's actually happening. If we were talking about AI and ML that doesn't need training on any licensed or copyrighted work (including 'public domain'), then we could have a different conversation, but at the moment it's obscured copyright theft.


I can agree it's obscure in the sense that we shrug when asked about how it works. If you specifically train a model to mimic a specific style I can get behind it leaning more towards theft, or at least being immoral regardless of laws.

If you train a model to replicate 10000 specific artists, I could also get behind it being more like theft.

But if the intention was to train with random data (and some of it could be copyrighted) just like your polaroid example to generate anything you want, I'm not so sure anymore.

I feel the intent is the most important part here. But then again I don't know the intent behind these companies, and I guess you don't either. Maybe no single person working in these companies knows the intent either.

It also gets murky when you have prompts that can refer to specific artists, and when people who use the models explicitly try to copy an artist's style. In the case of Stable Diffusion, if the CEO is to be believed, the CLIP model had learned to associate Greg Rutkowski and other artists with images that were not theirs but in a similar style [0].

Even murkier is when you have a base model trained on public data, but people finetune at home to replicate some specific artist's style.

[0] https://twitter.com/EMostaque/status/1571634871084236801


> If I scanned a thousand polaroid pictures, and took their average RGB values and created a LUT that I could apply to any photograph to make it look "polaroidy" - would that be learned?

You wouldn't. The LUT would.


It's data. No one owns data.


Can I have your credit card number, expiry, and verification number please? Also your DNA?

Since it's data, that should be cool, right?


Equating human cognition with machine algorithms is the root of the issue, and a significant part of its "legitimacy" comes from "AI" companies' need to push their products as effective; there's no better marketing than equating humans with machines. Not even novel.


It requires abstraction. Something that LLMs are not capable of, beyond trivial amounts.


TRAINING your 3rd eye/branch predictor

    if (nonfree_software) {
        // unhappy path
    }


You can make out the two original copyrighted pictures in that case, and all you did was use 50% opacity, which might not be very transformative, so probably?

In my mind (and I suspect others' too), in the machine learning context, statistical inference and learning became synonymous with all the recent developments.

The way I see it, there's now a discussion around copyright because people have different fundamental views on what learning is and what it means to be human, and those views don't really surface.



