
I find it deeply offensive that this work is presented under the auspices of scientific research.

The only way to describe this is bragging, advertising, or marketing. There are no reproducible processes described. While the diagram of their architecture may inspire others, it does not allow for the most crucial aspect of the scientific endeavor: falsification.

There is no way we can know if Google is lying because there's no way to check. It should be assumed that every example has been cherry-picked and post-processed. It should be assumed that the data used to train the model (if one was trained at all) was illicitly acquired. We have to start from a mindset of extreme skepticism because Google now routinely makes claims that cannot be demonstrated. When the performance of Gemini in Bard is compared to GPT-4, for example, it falls far short. When they released a video claiming to show an interaction with a model, it turned out to be nothing of the kind.

Ideally no organization would operate like this, but Google has become a particularly egregious repeat offender.



> There is no way we can know if Google is lying because there's no way to check. It should be assumed that every example has been cherry-picked and post-processed. It should be assumed that the data used to train the model (if one was trained at all) was illicitly acquired. We have to start from a mindset of extreme skepticism because Google now routinely makes claims that cannot be demonstrated.

This doesn't sound like a productive stance for science. You don't trust their result? It's fine to ignore all the claimed artifacts and you can just take the core idea. You don't have to assume any malice to invalidate their so-called advertisement.

While this kind of stance might make you feel a bit better, it will also make your claim political and slow you down if the work happens to be real, given that many of Google's papers have eventually become the foundation of other useful technologies even though almost all of them contained no reproducible artifacts.


What about this makes you refer to it as "science"?


I don't want to spend time on a philosophical debate that is destined to be inconclusive. If you want an answer, make a more specific point.


> and you can just take the core idea

That's generally easier said than done. The dataset isn't available, and there isn't really enough detail in the paper to make it replicable even if you have the resources to do it.


Still, folks have revolutionized the entire industry based on those Google papers, usually without enough details. Being "hard" is usually not a good excuse.


Just an FYI, it's not illegal to use data to train a model. It's illegal to have a model output that (identical) data for commercial gain.

This difference is purposely muddied, but important to understand.


> it's not illegal to use data to train a model

That's not at all settled law. AI companies are hoping to use the fair use exception to protect their businesses, but it looks like it will soon be clarified the other way.

Wired summed it up: "Congress Wants Tech Companies to Pay Up for AI Training Data"

https://www.wired.com/story/congress-senate-tech-companies-p...

And Ars wrote "Media orgs want AI firms to license content for training, and Congress is sympathetic."

https://arstechnica.com/information-technology/2024/01/at-se...

"[Senator] Hawley expressed concerns that if the tech companies' expansive interpretation of fair use prevails, it would be like "the mouse that ate the elephant"—an exception that would make copyright law toothless."


Currently, neither party has strong legal ground, and it may take another landmark case to fully settle the question.


If Congress doesn't get there first.


Even if Congress passed a law, it could be effectively delayed by injunctions until the Supreme Court made the ultimate decision. And I'm pretty sure big tech will challenge it with an army of lawyers.


Again, fair use concerns the production of copyrighted works; it has nothing to do with training. If this were the case, every person who could draw the Batman symbol from memory would be in violation of copyright.

"Using copyrighted works for monetary gain" refers to using art itself as the product. Knowing what Apple's logo is and making a logo in that style is not a violation of copyright. However using Apple's logo (or something strikingly close) is a violation.

The reason this is muddied is because legally artists don't really have a leg to stand on for "my art cannot be trained on by a computer" whereas they do have strong legal precedent (and actual laws) for "my art cannot be reproduced by a computer".


> fair use concerns the production of copyrighted works, it has nothing to do with the training

Training is the "production" of a derivative work (a model) based on the training data.

AI companies claim that this is covered by fair use, but this is simply a claim that has not yet been tested in court.

And even if courts rule in favor of the AI companies, it sounds likely (based on what I've read) that Congress will soon rewrite the law to support the artists' position.


It definitely depends on where you get that data from.

You don't have the right to make a copy of an e-book and keep that file on your server/computer for the purposes of training AI. Copying that file onto your computer is in many cases already an act of copyright infringement.


> When the performance of Gemini in Bard is compared to GPT-4, for example, it falls far short.

How did people get access to Gemini Ultra? Or are you talking about Gemini Pro, the one that compares to GPT-3.5?


This video is almost certainly made mostly for Google investors: look, we aren't dying, search isn't dying! Dancing bears!

That said, if this tech is as advertised, it's extremely impressive to me.


> There is no way we can know if Google is lying because there's no way to check.

We can gather that they are likely to be lying or cherry-picking examples to make themselves look better, since they were already caught faking an AI demo. In the world of actual research, if you got caught doing this, all your subsequent and prior work would be under severe scrutiny.



