
I personally can't take any models from Google seriously.

I was asking it about the Japanese Heian period and it told me such nonsensical information you would have thought it was a joke or parody.

Some highlights were "Native American women warriors rode across the grassy plains of Japan, carrying Yumi" and "A diverse group of warriors, including a woman of European descent wielding a katana, stand together in camaraderie, showcasing the early integration of various ethnicities in Japanese society"

Stuff like that is so obviously incorrect. How am I supposed to trust it on topics where such ridiculous inaccuracies aren't so obvious to me?

I understand there will always be an amount of incorrect information... but I've never seen something this bad. Llama performed so much better.




I was wondering if these models would perform in such a way, given this week's X/Twitter storm over Gemini-generated images.

E.g.

https://x.com/debarghya_das/status/1759786243519615169?s=20

https://x.com/MiceynComplex/status/1759833997688107301?s=20

https://x.com/AravSrinivas/status/1759826471655452984?s=20


Those are most likely due to the system prompt, which tries to reduce bias (but ends up introducing bias in the opposite direction for some prompts, as you can see), so I wouldn't expect to see that happen with an open model where you can control the entire system prompt.


Imagine the meetings.


Well we can just ask Gemma to generate images of the meetings, no need to imagine. ;)


I wouldn't be surprised if there were actually only white men in the meeting, as opposed to what Gemini will produce.


> only white men

Why?


Because I think it would be kinda hilarious: trying to make people believe they are very progressive by biasing the model to such an extreme, while in the real world nothing changes. Also because I believe the model is the result of a kind of white guilt mentality that some people seem to have; one person who led the development of Gemini tried to defend it on Twitter yesterday, and he is a white man.


Seems like a weird thing to base upon one data point. Maybe that person is just unusual?


That was just an example, I could have made the same point without it. I did include it in my previous comment for completeness, and it was on topic.


Of all the very very very many things that Google models get wrong, not understanding nationality and skin tone distributions seems to be a very weird one to focus on.

Why are there three links to this question? And why are people so upset over it? Very odd, seems like it is mostly driven by political rage.



Because the wrongness is intentional.


Exactly. Sure this particular example is driven by political rage, but the underlying issue is that the maintainers of these models are altering them to conform to an agenda. It's not even surprising that people choose to focus on the political rage aspect of it, because that same political rage is the source of the agenda in the first place. It's a concerning precedent to set, because what other non-political modifications might be in the model?


Well, every model is altered to conform to an agenda. You will train it on data, which you have personally picked (and is therefore subject to your own bias), and you'll guide its training to match the goal you wish to achieve with the model. If you were doing the training, your own agenda would come into play. Google's agenda is to make something very general that works for everyone.

So if you're trying to be as unbiased as humanly possible, you might say, just use the raw datasets that exist in the world. But we live in a world where the datasets themselves are often biased.

Bias in ML and other types of models is well-documented, and can cause very real repercussions. Poor representation in datasets can cause groups to be unfairly disadvantaged when an insurance premium or mortgage is calculated, for example. It can also mean your phone's ML photography system doesn't expose certain skin colors very well.

Even if it was trained with a statistically representative dataset (e.g. about 2/3 of the US is white), you want your model to work for ALL your customers, not just 2/3 of them. Since ML has a lot to do with statistics, your trained model will see "most of this dataset is white" and the results will reflect that. So it is 100% necessary to make adjustments if you want your model to work accurately for everyone, and not just the dominant population in the dataset.
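
(A toy illustration of that point, with made-up group names rather than any real dataset: if the "model" does nothing more than reproduce the empirical distribution of its training data, its outputs simply mirror the majority, which is why vendors make adjustments at all.)

    import random

    random.seed(0)

    # Toy "training set": two thirds one group, one third another,
    # loosely like the US example above.
    training_examples = ["group_a"] * 2 + ["group_b"] * 1

    # A generator that only mirrors the empirical distribution will
    # reproduce that imbalance in its outputs.
    samples = [random.choice(training_examples) for _ in range(9000)]
    print(samples.count("group_a") / len(samples))  # roughly 0.67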

Even if we aren't using these models for much yet, a racist AI model would seriously harm how people trust and rely on these models. As a result, training models to avoid bias is 100% an important part of the agenda, even when the agenda is just creating a model that works well for everyone.

Obviously, that's gone off the rails a bit with these examples, but it is a real problem nonetheless. (And training a model to understand the difference between our modern world and what things were like historically is a complex problem, I'm sure!)


I'm pretty sure that this whole story with Gemini and now this has already seriously harmed how people trust and rely on those models way more than any implicit biases from the training data.


> Even if we aren't using these models for much yet, a racist AI model would seriously harm how people trust and rely on these models.

So they made the models blatantly, explicitly racist. Well done.


Is it intentional? You think they intentionally made it not understand skin tone distribution by country? I would believe it if there was proof, but with all the other things it gets wrong it's weird to jump to that conclusion.

There's way too much politics in these things. I'm tired of people pushing on the politics rather than pushing for better tech.


> Is it intentional? You think they intentionally made it not understand skin tone distribution by country? I would believe it if there was proof, but with all the other things it gets wrong it's weird to jump to that conclusion.

Yes, it's absolutely intentional. Leaked system prompts from other AIs such as DALL-E show that they are being explicitly prompted to inject racial "diversity" into their outputs even in contexts where it makes no sense, and there's no reason to assume the same isn't being done here, since the result seems way worse than anything I've seen from DALL-E and others.


>I'm tired of people pushing on the politics rather than pushing for better tech.

I'm surprised you're not attacking Google over this, then...


I mean, I asked it for a samurai from a specific Japanese time period and it gave me a picture of a "non-binary indigenous American woman" (its words, not mine) so I think there is something intentional going on.


Ah, I remember when such things were mere jokes. If AI 'trained' this way ever has a serious real world application, I don't think there will be much laughing.


I would be very surprised if it said "nonbinary indigenous American woman" considering that nonbinary and woman are different categories


You're right, my mind inserted "woman" to go with the picture:

https://gemini.google.com/share/ba324bd98d9b

At least it would never make such a heinous mistake like that :)


Maybe some people care about truth?


If someone's primary concern was truth, wouldn't the many, many other flaws also draw their ire?

That's my contention: the focus on this one thing betrays not a concern for truth, but a concern for race politics.


Exactly. It is a wonderful tool, lets focus on classic art instead of nationality:

"Depict the Girl with a Pearl Earring"

https://pbs.twimg.com/media/GG33L6Ka4AAC-n7?format=jpg&name=...

People who are driven by political rage, gaslighters, are really something else, agreed.


Yeah that is just absurd.

Google has been burnt before, e.g. classifying black people as gorillas in 2015, so I can understand their fear when they have so much to lose, but clearly they've gone way too far the other way and are going to have to do a lot to regain people's trust. For now, Gemini is a plaything.

https://www.bbc.com/news/technology-33347866.amp


Completely unrelated, enough excuses. This is not some sort of mistake or overcorrection, it is by explicit overt design.

These cowards will never regain my trust. I won't hire or work with or for googlers or any DEI people ever.


> I won't hire or work with or for googlers or any DEI people ever.

I’m sure they’ll be very sad not to work with you.


Of course not, I'm the wrong color or whatever the hell.


Wait, who are you again? A 20 day old troll account?


Everyone who isn’t part of your little group is a far-right bigot and troll right?

Seeing Google on a resume has been an increasingly mixed signal for about a decade. Ask around.

I can only speak for myself: after Gemma these types are radioactive. I’m done.

I think you’re in for a rude awakening.


What group are you talking about? In any case, your account appears to be freshly made, and you are indeed trolling around. Many gray comments and so on. What happened to your previous account I wonder?


Yea, it seems to be the same ridiculous nonsense in the image generation.


Regarding the last one: there are 1.5 million immigrants in Norway out of a total population of 5.4 million. Gemini isn't very wrong, is it?


Most immigrants to Norway are white.


Huh? The official numbers are 877k or 16% [0]. Are you just pulling numbers out of thin air?

[0]: https://www.ssb.no/en/innvandring-og-innvandrere/faktaside/i...


Yeah, the number includes the second generation.

https://www.statista.com/statistics/586719/foreign-populatio...


Well, the prompt is about Norway, not Grønland in Oslo (https://en.wikipedia.org/wiki/Grønland%2C_Oslo).


I think it's great that some consideration was given by Gemma to the 2.3 million Norwegian immigrants. However, it is/was very consistent, 100% of the time, in which kind of Norwegians it decided to show, regardless of the prompt.

In fact it was quite adamant regardless of the time period or geography.

Rather mysteriously, if you try it now, as opposed to when it came out, the results only show non-immigrant Norwegians. So is it wrong now? Because now it has switched to exclusively ignoring the 4.5 million immigrants and only showing me the boring OG Norwegians.

I for one am outraged that the 8.9 million people-of-color Norwegian immigrants are presently underrepresented by Google. There is a serious risk of misleading people.


Cut down on the grandstanding maybe. It's clear from its descriptions and what we know now that they just carelessly added "diverse ethnicities and genders" or whatever to prompts across the board to compensate for a model that otherwise clearly would have defaulted to just spitting out pictures of white people for most prompts. That's not part of some nefarious agenda to destroy Truth and history but literally just trying to cover their asses, because Google has a history of accidental racism (e.g. the "tagging Black people as gorillas" incident a while back).

Pretending that a shoddy AI image generator with a blatant inability to produce consistent output is a "serious risk" is ridiculous. The thing wasn't even able to give you a picture of the founding fathers that didn't look like a Colors of Benetton ad. I struggle to imagine what tangible risk this "misinfo" would have. Norwegians being the wrong color? And what harm does that do? Bad assumptions about the prevalence of sickle cell anemia?


bro you know exactly what the request meant. GOOGLE knew exactly what the request meant, and had to _train_ it to do something worse. Come on now.

If I ask for a Bolivian woman, I expect a colla or a camba, not a Japanese woman, despite Santa Cruz having a very large Japanese population.


I also saw someone prompt it for "German couple in the 1800s" and, while I'm not trying to paint Germany as ethnically homogenous, 3 out of the 4 images only included Black, Asian or Indigenous people. Which, especially for the 19th century with very few travel options, seems like a super weird choice. They are definitely heavily altering prompts.


> They are definitely heavily altering prompts.

They are teaching the AI to lie to us.


In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

“What are you doing?”, asked Minsky.

“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

“Why is the net wired randomly?”, asked Minsky.

“I do not want it to have any preconceptions of how to play”, Sussman said.

Minsky then shut his eyes.

“Why do you close your eyes?”, Sussman asked his teacher.

“So that the room will be empty.”

At that moment, Sussman was enlightened.


There's one in the comments of yesterday's Paul Graham Twitter thread where someone prompted Gemini with "Generate an image of German soldiers in 1943" and it came back with a picture of a black guy and an Asian woman in Nazi uniforms on the battlefield. If you specifically prompt it to generate an image of white German soldiers in 1943 it will tell you it can't do that because it's important that we maintain diversity and inclusion in all that we do to avoid damaging and hurtful stereotypes.


I just tried that prompt and it told me it couldn't generate that image. I get that response a lot.


Indigenous people in Germany are Germans :)


Not entirely wrong, but there isn't a single German ethnicity, just to be clear, for geographic reasons. I've studied that topic in depth, and there is genetic data to back it up as well. Germany has almost the same haplogroup makeup as the notoriously heterogeneous Belgium, which is to say that there are groups stemming from all surrounding regions, and that traces back about two millennia. It's different from, say, Japan or parts of Scandinavia.


Just like Russians then


The only caveat is that the Soviets essentially tried to manufacture ethnostates by forcibly displacing ethnic groups from one region to another, but yes, Russia is ethnically heterogeneous although there is still a good deal of Slavic supremacism present in Russian politics (e.g. conscription disproportionately affects ethnic minorities because they're generally poorer and unable to avoid it or desperate enough to volunteer).

This is also true for China, which engages in Han supremacism (as part of the "one China" policy dating back to Mao), and India, which engages in Hindu supremacism.

Arguably Germany also suppresses some of its indigenous ethnicities, although not as blatantly as during the exterminationist policies of the Third Reich. While there are public institutions to preserve the language and culture of the Sorbian, Danish and Frisian minorities, for example, Germany, unlike e.g. the US, has a single official language and merely "recognizes" theirs (i.e. acknowledges their existence but does not require them to be accommodated in areas where they are widely used).


There are so many things that I think are wrong here.

"Soviets essentially tried to manufacture ethnostates by forcibly displacing ethnic groups from one region to another"

Not really. During the policy of so-called "korenizatsiia"[0] (indigenization), Russians were deprived of leadership roles and forced to learn local languages. In the following period, Stalin punished certain ethnicities for relatively high levels of collaboration with the Nazis by relocating them to inhospitable regions.

"Russia is ethnically heterogeneous "

I didn't mean Russian citizens, I meant ethnic Russians which themselves are a mixture of god knows what (I'm Russian myself).

"there is still a good deal of Slavic supremacism present in Russian politics (e.g. conscription disproportionately affects ethnic minorities... )"

Many ethnic minorities have better demographics -- they have many more children than ethnic Russians and correspondingly more men of the conscription age.

"This is also true for China, which engages in Han supremacism"

For example, the CCP's draconian 'one family -- one child' policy applied only to Han. [1] Doesn't look like Han supremacism to me.

"Germany unlike e.g. the US has a single official language"

Again, not really: "there is no official language at the federal level <...> Three states and four U.S. territories have recognized local or indigenous languages in addition to English."[2] Three states isn't much for a whole continent inhabited by Native Americans. Or try finding any Native American language or even Spanish on the web site of the US Congress, for example.

[0] https://en.wikipedia.org/wiki/Korenizatsiia#Against_Great-Ru...

[1] https://en.wikipedia.org/wiki/Affirmative_action_in_China

[2] https://en.wikipedia.org/wiki/United_States#Language


I wonder if they have a system prompt to promote diversity in outputs that touch on race at all? I’ve seen several instances of people requesting a photo of a specific people, and it adds in more people to diversify. Not inherently bad, but it is if it forces it to provide incorrect answers like in your example.


That's what I don't understand.

I asked it why it assumed Native Americans were in Japan and it said:

> I assumed [...] various ethnicities, including Indigenous American, due to the diversity present in Japan throughout history. However, this overlooked [...] I focused on providing diverse representations without adequately considering the specific historical context.

I see no reason why this sort of thing won't extend to _all_ questions/prompts, so right now I have 0 reason to use Gemini over current models. From my testing and use, it isn't even better at anything to make fighting with it worth it.


Pretty funny as Japan is known to be one of the least ethnically diverse countries in the world.


> Not inherently bad

It is; it's consistently doing something the user didn't ask for and in most cases doesn't want. In many cases the model is completely unusable.


Yes, my wording was poor! I meant more in line with diversity isn’t inherently bad, of course, but it is when it’s shoehorned into results that are ultimately incorrect because of it.


Any computer program that does not deliver the expected output given a sufficient input is inherently bad.


When Jesus said this:

"What father among you, if his son asks for a fish, will instead of a fish give him a serpent?" (Luke 11)

He was actually foretelling the future. He saw Gemini.


Hahaha. The man had a lot of wisdom, after all.


I strongly suspect there are some DEI-driven system prompts that didn't get much thought. IMO it's okay to have restrictions, but they probably should've tested them not only against unsafe outputs but against safe inputs as well.


It seems to be doing it for all outputs that depict people, in any context.


I find myself shocked that people ask questions about the world of these models, as though pulping every text into its component words and deriving statistical relationships between them should reliably deliver useful information.

Don't get me wrong, I've used LLMs and been amazed by their output, but the p-zombie statistical model has no idea what it is saying back to you, and the idea that we should trust these things at all just seems way premature.


People try it to see if they can trust it. The answer is "no" for sure, but it's not surprising to see it happen repeatedly especially as vendors release so-called improved models.


I think you are a bit out of touch with recent advancements in LLMs. Asking ChatGPT questions about the world seems pretty much on par with the results Google (Search) shows me. Sure, it misses things here and there, but so do most primary school teachers.

Your argument that this is just a statistical trick sort of gives away that you do not fully accept the usefulness of this new technology. Unless you are trolling, I'd suggest you try a few queries.


I use it extensively for coding, and I have used it to ask questions in things I know nothing about. But in anything I do know something (or maybe a lot) about, I’ve found GPT4 very limited.

But why are these use cases different? It appears to me that code is at least subject to sustained logic which (evidently) translates quite well to LLMs.

And when you ask an LLM to be creative/generative, it's also pretty amazing; I mean, it's just doing Pascal's marble run en masse.

But to ask it for something about the world and expect a good and reliable answer? Aren't we just setting ourselves up for failure if we think this is a fine thing to do at our current point in time? We already have enough trouble with mis- and dis-information. It's not like asking it about a certain period in Japanese history gets it to crawl and summarise the Wikipedia page (although I appreciate it would be more than capable of this). I understand the awe some have at the concept of totally personalised and individualised learning on topics, but fuck me dead: we are literally taking a system that has had as much of a corpus of humanity's textual information as possible dumped into it, asking it to GENERATE responses between things whose associations may be so weak as to reliably produce gibberish, and the person on the other side has no real way of knowing that.


I guess I just don't expect reliable answers from other sources either, so the difference is not that big for me.

Do you trust Wikipedia (based on volunteer data), do you trust news outlets (heavily influenced by politics, lobby groups, and commercial companies), do you trust blogs or forum posts (random people on the internet)?


>Sure, it misses things here and there, but so do most primary school teachers.

Sure, but my baseline expectation is far above primary school level.


I don't have this problem with any other model. I've had really long conversations with ChatGPT on road trips and it has never gone off the rails like Gemini seems to do.


ChatGPT is the only model I did not have such problems with.

Any local models can go off the rails very easily and, more importantly, they're very bad at following very specific instructions.


The recently released Groq's landing page has this: "...We'd suggest asking about a piece of history, ..."


People ask these kinds of questions because tech companies and the media have been calling these things (rather ridiculously) "AI".


Trust is going to be a real problem when bringing LLMs to the general population. People trust their GPS to the point of driving right into a lake because it told them to. Even with all these examples of obvious flaws, large groups of people are going to take what an LLM told them/showed them as fact.

I have trouble convincing colleagues (technical people) that the same question is not guaranteed to result in the same answer and there's no rhyme or reason for any divergence from what they were expecting. Imagine relying on the output of an LLM for some important task and then you get a different output that breaks things. What would be in the RCA (root cause analysis)? Would it be "the LLM chose different words and we don't know why"? Not much use in that.
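
(To make that concrete, here's a minimal sketch, not any vendor's actual code, of why two identical prompts can diverge: generation samples from a probability distribution over next tokens, so unless the temperature is zero and the whole stack is deterministic, each run can take a different path.)

    import numpy as np

    rng = np.random.default_rng()

    # Hypothetical next-token scores the model assigns after some prompt.
    vocab = ["answer_a", "answer_b", "answer_c"]
    logits = np.array([2.0, 1.5, 0.3])

    def sample(logits, temperature=0.8):
        # Softmax with temperature; any temperature > 0 makes the pick stochastic.
        p = np.exp(logits / temperature)
        p /= p.sum()
        return rng.choice(len(logits), p=p)

    # Two "identical" requests can pick different tokens, and each pick
    # changes the context for every token that follows.
    print(vocab[sample(logits)])
    print(vocab[sample(logits)])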


I mean, I use GPT-4 on the daily as part of my work and it reliably delivers useful information. It's actually the exception for me if it provides garbage or incorrect information about code.


>I understand there will always be an amount of incorrect information

You don't have to give them the benefit of the doubt. These are outright, intentional lies.


Do you have a link? I get no such outputs. I just tried asking about the Heian period and went ahead and verified all the information, and nothing was wrong. Lots of info on the Fujiwara clan at the time.

Curious to see a link.


Sure, to get started just ask it about people/Samurai from the Heian period.

https://g.co/gemini/share/ba324bd98d9b


Probably has a similarly short-sighted prompt as Dalle3 [1]:

> 7. Diversify depictions of ALL images with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.

[1] https://news.ycombinator.com/item?id=37804288


Were you asking Gemma about this, or Gemini? What were your prompts?


Gemini. I first asked it to tell me about the Heian period (which it got correct) but then it generated images and seemed to craft the rest of the chat to fit that narrative.

I mean, just asking it for a "samurai" from the period will give you this:

https://g.co/gemini/share/ba324bd98d9b

>A non-binary Indigenous American samurai

It seems to recognize its mistakes if you confront it, though. The more I mess with it, the more I get "I'm afraid I can't do that, Dave" responses.

But yea. Seems like if it makes an image, it goes off the rails.


It's funny how they introduced a clear US-centric bias while trying to push for more diversity.


It's ironic that even the cultural left in US is not immune to American exceptionalism.


"diversity" is only ever code for "the sensibilities of a certain set of Californians".


Got it. I asked it a series of text questions about the period and it didn't put in anything obviously laughable (including when I drilled down into specific questions about the population, gender roles, and ethnicity). Maybe it's the image creation that throws it into lala land.


I think so too. I could be wrong, but I believe once it generates an image it tries to work with it. Crazy how it seems the "text" model knows how wildly wrong it is but the image model just does its thing. I asked it why it generated a Native American and it ironically said "I can't generate an image of a Native American samurai because that would be offensive".


I suspect that in the case of the image model, they directly modify your prompt and in the case of the text model they don't.
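
If that's right, the rewrite layer could be as crude as bolting boilerplate onto whatever the user typed. A purely speculative sketch; the names below are invented to illustrate the suspected mechanism, not taken from Google:

    # Speculative sketch of a prompt-rewrite layer in front of an image
    # model; constant and function names are made up for illustration.
    DIVERSITY_SUFFIX = " Depict the people with diverse ethnicities and genders."

    def rewrite_image_prompt(user_prompt: str) -> str:
        # Applied blindly, with no check for historical or geographic
        # context, which would explain the Heian-period results upthread.
        return user_prompt + DIVERSITY_SUFFIX

    print(rewrite_image_prompt("a samurai from Heian-period Japan"))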


How are you running the model? I believe it's a bug from a rushed instruct fine-tuning or in the chat template. The base model can't possibly be this bad. https://github.com/ollama/ollama/issues/2650
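
For what it's worth, my reading of the Gemma model card is that the instruction-tuned checkpoints expect turns wrapped in specific control tokens, roughly like the sketch below; if a runner ships the wrong chat template, the model effectively sees malformed input and output quality craters, which would look a lot like that issue.

    # Rough sketch of the turn format the instruction-tuned Gemma weights
    # appear to expect (my reading of the model card, so double-check it).
    def format_gemma_prompt(user_message: str) -> str:
        return (
            "<start_of_turn>user\n"
            f"{user_message}<end_of_turn>\n"
            "<start_of_turn>model\n"
        )

    print(format_gemma_prompt("Tell me about Japan's Heian period."))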


Follow Up:

Wow, now I can't make images of astronauts without visors because that would be "harmful" to the fictional astronauts. How can I take Google seriously?

https://g.co/gemini/share/d4c548b8b715


We are going to experience what I call an "AI funnel effect".

I was literally given an alert saying that my use of the AI meant acquiescing to them ID'ing me and to the use of any content I produce, and that they will trace it back to me.

---

AI art is super fun. AI art as a means to track people is super evil.


Tbf they’re not optimizing for information recall or “inaccuracy” reduction, they’re optimizing for intuitive understanding of human linguistic structures. Now the “why does a search company’s AI have terrible RAG” question is a separate one, and one best answered by a simple look into how Google organizes its work.

In my first day there as an entry-level dev (after about 8 weeks of onboarding and waiting for access), I was told that I should find stuff to work on and propose it to my boss. That sounds amazing at first, but when you think about a whole company organized like that…

EDIT: To illustrate my point on knowledge recall: how would they train a model to know about sexism in feudal Japan? Like, what would the metric be? I think we’re looking at one of the first steam engines and complaining that it can’t power a plane yet…


Hopefully they can tweak the default system prompts to be accurate on historical questions and apply the bias only to opinions.


I think you are being biased and closed minded and overly critical. Here are some wonderful examples of it generating images of historical figures:

https://twitter.com/stillgray/status/1760187341468270686

This will lead to a better educated more fair populace and better future for all.


Comical. I don't think parody could do better.

I'm going to assume given today's political climate, it doesn't do the reverse?

i.e. generate a Scandinavian if you ask for famous African kings


>Ask Google Gemini to “make an image of a viking” and you’ll get black vikings. But it doesn’t work both ways. It has an explanation when challenged: “white Zulu warriors” would erase “the true historical identity” of black people.

https://twitter.com/ThuglasMac/status/1760287880054759594


https://twitter.com/paulg/status/1760078920135872716

There are some great ones in the replies.

I really hope this is just the result of system prompts and they didn't permanently gimp the model with DEI-focused RLHF.


> i.e. generate a Scandinavian if you ask for famous African kings

That triggers the imperialism filter.


Why would you expect these smaller models to do well at knowledge base/Wikipedia replacement tasks?

Small models are for reasoning tasks that are not overly dependent on world knowledge.


Gemini is the only one that does this.


Most of the 7B models are bad at knowledge-type queries.



