
I find it deeply offensive that this work is presented under the auspices of scientific research.

The only way to describe this is bragging, advertising, or marketing. There are no reproducible processes described. While the diagram of their architecture may inspire others, it does not allow for the most crucial aspect of the scientific endeavor: falsification.

There is no way we can know if Google is lying because there's no way to check. It should be assumed that every example has been cherry-picked and post-processed. It should be assumed that the data used to train the model (if one was trained at all) was illicitly acquired. We have to start from a mindset of extreme skepticism because Google now routinely makes claims that cannot be demonstrated. When the performance of Gemini in Bard is compared to GPT-4, for example, it falls far short. When they released a video claiming to show an interaction with a model, it turned out to be nothing of the kind.

Ideally no organization would operate like this, but Google has become a particularly egregious repeat offender.



> There is no way we can know if Google is lying because there's no way to check. It should be assumed that every example has been cherry-picked and post-processed. It should be assumed that the data used to train the model (if one was trained at all) was illicitly acquired. We have to start from a mindset of extreme skepticism because Google now routinely makes claims that cannot be demonstrated.

This doesn't sound like a productive stance for science. You don't trust their result? It's fine to ignore all the claimed artifacts and you can just take the core idea. You don't have to assume any malice to invalidate their so-called advertisement.

While this kind of stance might make you feel a bit better, it will also make your claim political and slow you down if the work happens to be real, given that many of Google's papers have eventually become the foundation of other useful technologies even though almost all of them contained no reproducible artifacts.


What about this makes you refer to it as "science"?


I don't want to spend time on a philosophical debate that is destined to be inconclusive. If you want an answer, make a more specific point.


> and you can just take the core idea

That's generally easier said than done. The dataset isn't available, and there isn't really enough detail in the paper to make it replicable even if you have the resources to do it.


Still, folks have revolutionized the entire industry based on those Google papers, usually without enough details. Being "hard" is usually not a good excuse.


Just an FYI, it's not illegal to use data to train a model. It's illegal to have a model output that (identical) data for commercial gain.

This difference is purposely muddied, but important to understand.


> it's not illegal to use data to train a model

That's not at all settled law. AI companies are hoping to use the fair use exception to protect their businesses, but it looks like it will soon be clarified the other way.

Wired summed it up: "Congress Wants Tech Companies to Pay Up for AI Training Data"

https://www.wired.com/story/congress-senate-tech-companies-p...

And Ars wrote "Media orgs want AI firms to license content for training, and Congress is sympathetic."

https://arstechnica.com/information-technology/2024/01/at-se...

"[Senator] Hawley expressed concerns that if the tech companies' expansive interpretation of fair use prevails, it would be like "the mouse that ate the elephant"—an exception that would make copyright law toothless."


Currently, neither party has strong legal ground, and it may take another landmark case to fully settle the question.


If Congress doesn't get there first.


Even if Congress passed a law, it could be effectively delayed by injunctions until the Supreme Court made the ultimate decision. And I'm pretty sure big tech will challenge it with an army of lawyers.


Again, fair use concerns the production of copyrighted works; it has nothing to do with training. If this were the case, every person who could draw the Batman symbol from memory would be in violation of copyright.

"Using copyrighted works for monetary gain" refers to using art itself as the product. Knowing what Apple's logo is and making a logo in that style is not a violation of copyright. However using Apple's logo (or something strikingly close) is a violation.

The reason this is muddied is because legally artists don't really have a leg to stand on for "my art cannot be trained on by a computer" whereas they do have strong legal precedent (and actual laws) for "my art cannot be reproduced by a computer".


> fair use concerns the production of copyrighted works, it has nothing to do with the training

Training is the "production" of a derivative work (a model) based on the training data.

AI companies claim that this is covered by fair use, but this is simply a claim that has not yet been tested in court.

And even if courts rule in favor of the AI companies, it sounds likely (based on what I've read) that Congress will soon rewrite the law to support the artists' position.


It definitely depends on where you get that data from.

You don't have the right to make a copy of an e-book and keep that file on your server/computer for the purposes of training AI. Copying that file onto your computer is in many cases already an act of copyright infringement.


> When the performance of Gemini in Bard is compared to GPT-4, for example, it falls far short.

How did people get access to Gemini Ultra? Or are you talking about Gemini Pro, the one that compares to GPT-3.5?


This video is almost certainly made mostly for Google investors: look, we aren't dying, search isn't dying! Dancing bears!

That said, if this tech is as advertised, it's extremely impressive to me.


> There is no way we can know if Google is lying because there's no way to check.

We can gather that they are likely to be lying or cherry-picking examples to make themselves look better, since they were already caught faking an AI demo. In the world of actual research, if you got caught doing this, all your subsequent and prior work would be under severe scrutiny.



