Since the first (good) image generation models became available, I've been tryin...

Scene_Cast2 · 2025-11-14T19:14:32 1763147672

I've noticed that image models are particularly bad at modifying popular concepts in novel ways (way worse "generalization" than what I observe in language models).

emp17344 · 2025-11-14T20:26:01 1763151961

Maybe LLMs always fail to generalize outside their data set, and it’s just less noticeable with written language.

cluckindan · 2025-11-14T22:00:10 1763157610

This is it. They’re language models which predict next tokens probabilistically and a sampler picks one according to the desired ”temperature”. Any generalization outside their data set is an artifact of random sampling: happenstance and circumstance, not genuine substance.

cluckindan · 2025-11-15T13:57:42 1763215062

However: do humans have that genuine substance? Is human invention and ingenuity more than trial and error, more than adaptation and application of existing knowledge? Can humans generalize outside their data set?

A yes-answer here implies belief in some sort of gnostic method of knowledge acquisition. Certainly that comes with a high burden of proof!

sophrosyne42 · 2025-11-15T19:57:05 1763236625

Yes. Humans can perform abduction, extrapolating given information to new information. LLMs cannot, they can only interpolate new data based on existing data.

dawidloubser · 2025-11-15T15:14:29 1763219669

cluckindan · 2025-11-15T17:40:20 1763228420

Can you elaborate on what you mean by that, and prove it?

https://journals.sagepub.com/doi/10.1177/09637214251336212

Thorentis · 2025-11-16T08:35:47 1763282147

The proof is that humans do it all the time and that you do it inside your head as well. People need to stop with this absurd level of rampant skepticism that makes them doubt their own basic functions.

RugnirViking · 2025-11-16T19:29:08 1763321348

the concept is too nebulous to "prove" but the fact im operating a machine (relatively) skillfully to write to you shows we are in fact able to generalise. This wasn't planned, we came up with this. Same with cars etc. We're quite good at the whole "tool use" thing

phire · 2025-11-15T01:34:50 1763170490

Most image models are diffusion models, not LLMs, and have a bunch of other idiosyncrasies.

So I suspect it's more that lessons from diffusion image models don't carry over to text LLMs.

And the Image models which are based on multi-mode LLMs (like Nano Banana) seem to do a lot better at novel concepts.

Gormo · 2025-11-17T19:11:01 1763406661

But the clocks in this demo aren't images.

phire · 2025-11-18T00:25:44 1763425544

Yes, but they are reasoning within their dataset, which will contain multiple example of html+css clocks.

They are just struggling to produce good results because they are language models and don’t have great spatial reasoning skills, because they are language models.

Their output normally has all the elements, just not in the right place/shape/orientation.

IshKebab · 2025-11-14T22:58:06 1763161086

They definitely don't completely fail to generalise. You can easily prove that by asking them something completely novel.

Do you mean that LLMs might display a similar tendency to modify popular concepts? If so that definitely might be the case and would be fairly easy to test.

Something like "tell me the lord's prayer but it's our mother instead of our father", or maybe "write a haiku but with 5 syllables on every line"?

Let me try those ... nah ChatGPT nailed them both. Feels like it's particular to image generation.

immibis · 2025-11-15T15:40:05 1763221205

They used to do poorly with modified riddles, but I assume those have been added to their training data now (https://huggingface.co/datasets/marcodsn/altered-riddles ?)

Like, the response to "... The surgeon (who is male and is the boy's father) says: I can't operate on this boy! He's my son! How is this possible?" used to be "The surgeon is the boy's mother"

The response to "... At each door is a guard, each of which always lies. What question should I ask to decide which door to choose?" would be an explanation of how asking the guard what the other guard would say would tell you the opposite of which door you should go through.

CobrastanJorji · 2025-11-14T20:26:55 1763152015

Also, they're fundamentally bad at math. They can draw a clock because they've seen clocks, but going further requires some calculations they can't do.

For example, try asking Nano Banana to do something simpler, like "draw a picture of 13 circles." It likely will not work.

deathanatos · 2025-11-14T19:45:16 1763149516

  Generate an image of a clock face, but instead of the usual 12 hour numbering, number it with 13 hours.

Gemini, 2.5 Flash or "Nano Banana" or whatever we're calling it these days. https://imgur.com/a/1sSeFX7

A normal (ish) 12h clock. It numbered it twice, in two concentric rings. The outer ring is normal, but the inner ring numbers the 4th hour as "IIII" (fine, and a thing that clocks do) and the 8th hour as "VIIII" (wtf).

bar000n · 2025-11-14T19:56:41 1763150201

It should be pretty clear already that anything which is based (limited?) to communicating words/text can never grasp conceptual thinking.

We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.

bayindirh · 2025-11-14T20:04:16 1763150656

> We have yet to design a language to cover that, and it might be just a donquijotism we're all diving into.

We have a very comprehensive and precise spec for that [0].

If you don't want to hop through the certificate warning, here's the transcript:

- Some day, we won't even need coders any more. We'll be able to just write the specification and the program will write itself.

- Oh wow, you're right! We'll be able to write a comprehensive and precise spec and bam, we won't need programmers any more.

- Exactly

- And do you know the industry term for a project specification that is comprehensive and precise enough to generate a program?

- Uh... no...

- Code, it's called code.

[0]: https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...

snickerbockers · 2025-11-14T20:18:07 1763151487

Ive been thinking about that a lot too. Fundamentally it's just a different way of telling the computer what to do and if it seems like telling an llm to make a program is less work than writing it yourself then either your program is extremely trivial or there are dozens of redundant programs in the training set that are nearly identical.

If you're actualy doing real work you have nothing to fear from LLMs because any prompt which is specific enough to create a given computer program is going to be comparable in terms of complexity and effort to having done it yourself.

Uehreka · 2025-11-14T20:24:44 1763151884

I don’t think that’s clear at all. In fact the proficiency of LLMs at a wide variety of tasks would seem to indicate that language is a highly efficient encoding of human thought, much moreso than people used to think.

tsunamifury · 2025-11-15T01:39:34 1763170774

Yea it’s amazing that the parent post literally misunderstands the fundamental realities of LLMs and the compression they reveal in linguistics even if blurry is incredible.

XenophileJKO · 2025-11-14T21:40:08 1763156408

I mean, that's not really "true".

https://claude.ai/public/artifacts/0f1b67b7-020c-46e9-9536-c...

rideontime · 2025-11-14T19:59:23 1763150363

Really? I can grasp the concept behind that command just fine.

andix · 2025-11-14T23:37:05 1763163425

I gave this "riddle" to various models:

> The farmer and the goat are going to the river. They look into the sky and see three clouds shaped like: a wolf, a cabbage and a boat that can carry the farmer and one item. How can they safely cross the river?

Most of them are just giving the result to the well known river crossing riddle. Some "feel" that something is off, but still have a hard time to figure out that wolf, boat and cabbage are just clouds.

jampa · 2025-11-15T01:32:18 1763170338

There are few examples of this as well:

https://www.reddit.com/r/singularity/comments/1fqjaxy/contex...

andix · 2025-11-15T01:38:26 1763170706

It really shows how LLMs work. It's all about probabilities, and not about understanding. If something looks very similar to a well known problem, the llm is having a hard time to "see" contradictions. Even if it's really easy to notice for humans.

Recursing · 2025-11-15T14:10:23 1763215823

Claude has no problem with this: https://imgur.com/a/ifSNOVU

Maybe older models?

andix · 2025-11-15T14:30:10 1763217010

Try to twist around words and phrases, at some point it might start to fail.

I tried it again yesterday with GPT. GPT-5 manages quite well too in thinking mode, but starts crackling in instant mode. 4o completely failed.

It's not that LLMs are unable to solve things like that at all, but it's really easy to find some variations that make them struggle really hard.

userbinator · 2025-11-15T01:29:58 1763170198

Basically a variation of https://en.wikipedia.org/wiki/Age_of_the_captain

echelon · 2025-11-14T19:23:28 1763148208

That's just a patch to the training data.

Once companies see this starting to show up in the evals and criticisms, they'll go out of their way to fix it.

rideontime · 2025-11-14T20:00:36 1763150436

What would the "patch" be? Manually create some images of 13-hour clocks and add them to the training data? How does that solution scale?

godelski · 2025-11-14T20:09:28 1763150968

s/13/17/g ;)

BrandoElFollito · 2025-11-14T19:36:26 1763148986

This is really cool. I tried to prompt gemini but every time I got the same picture. I do not know how to share a session (like it is possible with Chatgpt) but the prompts were

If a clock had 13 hours, what would be the angle between two of these 13 hours?

Generate an image of such a clock

No, I want the clock to have 13 distinct hours, with the angle between them as you calculated above

This is the same image. There need to be 13 hour marks around the dial, evenly spaced

... And its last answer was

You are absolutely right, my apologies. It seems I made an error and generated the same image again. I will correct that immediately.

Here is an image of a clock face with 13 distinct hour marks, evenly spaced around the dial, reflecting the angle we calculated.

And the very same clock, with 12 hours, and a 13th above the 12...

ryandrake · 2025-11-14T19:46:13 1763149573

This is probably my biggest problem with AI tools, having played around with them more lately.

"You're absolutely right! I made a mistake. I have now comprehensively solved this problem. Here is the corrected output: [totally incorrect output]."

None of them ever seem to have the ability to say "I cannot seem to do this" or "I am uncertain if this is correct, confidence level 25%" The only time they will give up or refuse to do something is when they are deliberately programmed to censor for often dubious "AI safety" reasons. All other times, they come back again and again with extreme confidence as they totally produce garbage output.

BrandoElFollito · 2025-11-14T19:56:58 1763150218

I agree, I see the same even in simple code where they will bend backwards apologizing and generate very similar crap.

It is like they are sometimes stuck in a local energetic minimum and will just wobble around various similar (and incorrect) answers.

What was annoying in my attempt above is that the picture was identical for every attempt

ryandrake · 2025-11-14T20:09:02 1763150942

These tools 'attitude' reminds me of an eager, but incompetent intern or a poorly trained administrative assistant, who works for a powerful CEO. All sycophancy, confidence and positive energy, but not really getting much done.

SamBam · 2025-11-14T21:18:51 1763155131

The issue is the they always say "Here's the final, correct answer" before they've written the answer, so of course the LLM has no idea if it's going to be right before it starts, because it has no clue what it's going to say.

I wonder how it would do if instead it were told "Do not tell me at the start that the solution is going to be correct. Instead, tell me the solution, and at the end tell me if you think it's correct or not."

I have found that on certain logic puzzles that it simply cannot get right, it always tells me that it's going to get it quite "this last time," but if asked later it always recognizes its errors.

int_19h · 2025-11-15T00:28:20 1763166500

Gemini specifically is actually kinda notorious for giving up.

https://www.reddit.com/r/artificial/comments/1mp5mks/this_is...

notatoad · 2025-11-14T23:23:03 1763162583

you can click the share icon (the two-way branch icon, it doesn't look like apple's share icon) under the image it generates to share the conversation.

i'm curious if the clock image it was giving you was the same one it was giving me

https://gemini.google.com/share/780db71cfb73

BrandoElFollito · 2025-11-15T12:57:10 1763211430

Thanks for the tip about sharing!

No, my clock was an old style one, to be put on a shelf. But at least it had a "13" proudly right above the "12" :)

This reminds me my kids when they were in kindergarden and were bringing home their art that needed extra explanation to realize what it was. But they were very proud!

edub · 2025-11-15T07:37:01 1763192221

I was able to have AI generate an image that made this, but not by diffusion/autoregressive but by having it write Python code to create the image.

ChatGPT made a nice looking clock with matplotlib that had some bugs that it had to fix (hours were counter-clockwise). Gemini made correct code one-shot, it used Pillow instead of matplotlib, but it didn't look as nice.

giancarlostoro · 2025-11-14T20:00:46 1763150446

Weird, I never tried that, I tried all the usual tricks that usually work including swearing at the model (this scarily works surprisingly well with LLMs) and nothing. I even tried to go the opposite direction, I want a 6 hour clock.

nl · 2025-11-15T08:33:23 1763195603

I do playing card generation and almost all struggle beyond the "6 of X"

My working theory is that they were trained really hard to generate 5 fingers on hands but their counting drops off quickly.

IAmGraydon · 2025-11-14T19:20:29 1763148029

That's because they literally cannot do that. Doing what you're asking requires an understanding of why the numbers on the clock face are where they are and what it would mean if there was an extra hour on the clock (ie that you would have to divide 360 by 13 to begin to understand where the numbers would go). AI models have no concept of anything that's not included in their training data. Yet people continue to anthropomorphize this technology and are surprised when it becomes obvious that it's not actually thinking.

energy123 · 2025-11-14T19:46:18 1763149578

The hope was for this understanding to emerge as the most efficient solution to the next-token prediction problem.

Put another way, it was hoped that once the dataset got rich enough, developing this understanding is actually more efficient for the neural network than memorizing the training data.

The useful question to ask, if you believe the hope is not bearing fruit, is why. Point specifically to the absent data or the flawed assumption being made.

Or more realistically, put in the creative and difficult research work required to discover the answer to that question.

bobbylarrybobby · 2025-11-14T19:24:54 1763148294

It's interesting because if you asked them to write code to generate an SVG of a clock, they'd probably use a loop from 1 to 12, using sin and cos of the angle (given by the loop index over 12 times 2pi) to place the numerals. They know how to do this, and so they basically understand the process that generates a clock face. And extrapolating from that to 13 hours is trivial (for a human). So the fact that they can't do this extrapolation on their own is very odd.

echelon · 2025-11-14T19:25:20 1763148320

gpt-image-1 and Google Imagen understand prompts, they just don't have training data to cover these use cases.

gpt-image-1 and Imagen are wickedly smart.

The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.

phkahler · 2025-11-14T19:57:30 1763150250

>> The new Nano Banana 2 that has been briefly teased around the internet can solve incredibly complicated differential equations on chalk boards with full proof of work.

That's great, but I bet it can't tie it's own shoes.

echelon · 2025-11-14T21:37:47 1763156267

No, but I can get it to do a lot of work.

It's a part of my daily tool box.

esafak · 2025-11-14T21:29:31 1763155771

And a submarine can't swim. Big deal.

ryandrake · 2025-11-14T19:49:29 1763149769

I wonder if you would have more success if you painstakingly described the shape and features of a clock in great detail but never used the words clock or time or anything that might give the AI the hint that they were supposed to output something like a clock.

BrandoElFollito · 2025-11-14T20:00:54 1763150454

And this is a problem for me. I guess that it would work, but as soon as the word "clock" appears, gone is the request because a clock HAS.12.HOURS.

I use this a lot in cybersecurity when I need to do something "illegal". I am refused help, until I say that I am doing research on cybersecurity. In that case no problem.

Workaccount2 · 2025-11-14T20:08:08 1763150888

The problem is more likely the tokenization of images than anything. These models do their absolute worst when pictures are involved, but are seemingly miraculous at generalizing with just text.

chemotaxis · 2025-11-14T20:50:11 1763153411

I wonder if it's because we mean different things by generalization.

For text, "generalization" is still "generate text that conforms to all the usual rules of the language". For images of 13-hour clock faces, we're explicitly asking the LLM to violate the inferred rules of the universe.

I think a good analogy would be asking an LLM to write in English, except the word "the" now means "purple". They will struggle to adhere to this prompt in a conversation.

Workaccount2 · 2025-11-15T00:50:39 1763167839

That's true, but I think humans would stumble a lot too (try reading old printed text from the 18fh cenfury where fhey used "f" insfead of t in prinf, if's a real frick fo gef frough).

However humans are pretty adept at discerning images, even ones outside the norm. I really think there is some kind of architectural block hampering transformers ability to really "see" images. For instance if you show any model a picture of a dog with 5 legs (a fifth leg photoshopped to it's belly) they all say there are only 4 legs. And will argue with you about it. Hell GPT-5 even wrote a leg detection script in python (impressive) which detected the 5 legs, and then it said the script was bugged, and modified the parameters until one of the legs wasn't detected, lol.

onraglanroad · 2025-11-15T06:41:53 1763188913

An "f" never replaced a "t".

You probably mean the "long s" that looks like an "f".

godelski · 2025-11-14T20:10:30 1763151030

Yes, the problem is that these so called "world models" do not actually contain a model of the world, or any world

chanux · 2025-11-15T01:29:26 1763170166

Ah! This is so sad. The manager types won't be able to add an hour (actually, two) to the day even with AI.

snek_case · 2025-11-14T19:12:32 1763147552

From my experience they quickly fail to understand anything beyond a superficial description of the image you want.

atorodius · 2025-11-14T19:35:21 1763148921

That's less and less true

https://minimaxir.com/2025/11/nano-banana-prompts/

dang · 2025-11-14T19:36:13 1763148973

Related ongoing thread:

Nano Banana can be prompt engineered for nuanced AI image generation - https://news.ycombinator.com/item?id=45917875 - Nov 2025 (214 comments)

usui · 2025-11-14T20:36:51 1763152611

I've been trying for the longest time and across models to generate pictures or cartoons of people with six fingers and now they won't do it. They always say they accomplished it, but the result always has 5 fingers. I hate being gaslit.

coffeecoders · 2025-11-14T19:28:54 1763148534

LLMs are terrible for out-of-distribution (OOD) tasks. You should use chain of thought suppression and give constaints explictly.

My prompt to Grok:

---

Follow these rules exactly:

- There are 13 hours, labeled 1–13.

- There are 13 ticks.

- The center of each number is at angle: index * (360/13)

- Do not infer anything else.

- Do not apply knowledge of normal clocks.

Use the following variables:

HOUR_COUNT = 13

ANGLE_PER_HOUR = 360 / 13 // 27.692307°

Use index i ∈ [0..12] for hour marks:

angle_i = i * ANGLE_PER_HOUR

I want html/css (single file) of a 13-hour analog clock.

---

Output from grok.

https://jsfiddle.net/y9zukcnx/1/

chemotaxis · 2025-11-14T19:45:46 1763149546

> Follow these rules exactly:

"Here's the line-by-line specification of the program I need you to write. Write that program."

signatoremo · 2025-11-14T20:26:35 1763151995

Can you write this program in any language?

bigfishrunning · 2025-11-14T21:38:52 1763156332

chemotaxis · 2025-11-14T20:44:22 1763153062

No, do I need to?

serf · 2025-11-15T10:14:22 1763201662

it's lazy to dust off the major advantages of a pseudocode-to-anylanguage transpiler as if it's somehow easy or commonplace.

BrandoElFollito · 2025-11-14T19:39:43 1763149183

Well, that's cheating :) You asked it to generate code, which is ok because it does not represent a direct generated image of a clock.

Can grok generate images? What would the result be?

I will try your prompt on chatgpt and gemini

BrandoElFollito · 2025-11-14T19:45:21 1763149521

Gemini failed miserably - a standard 12 hours clock

Same for chatgpt

And perplexity replaced 12 with 13

dwringer · 2025-11-14T20:05:48 1763150748

> Please create a highly unusual 13-hour analog clock widget, synchronized to system time, with fully animated hands that move in real time, and not 12 but 13 hour markings - each will be spaced at not 5-minute intervals, but at 4-minute-37-second intervals. This makes room for all 13 hour markings. Please pay attention to the correct alignment of the 13 numbers and the 13 hour marks, as well as the alignment of the hands on the face.

This gave me a correct clock face on Gemini- after the model spent a lot of time thinking (and kind of thrashing in a loop for a while). The functionality isn't quite right, not that it entirely makes sense in the first place, but the face - at least in terms of the hour marks - looks OK to me.[0]

[0] https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

chiwilliams · 2025-11-14T19:52:56 1763149976

I'll also note that the output isn't quite right --- the top number should be 13 rather than 1!

layer8 · 2025-11-14T20:15:40 1763151340

I mean, the specification for the hour marks (angle_i) starts with a mark at angle 0. It just followed that spec. ;)

NooneAtAll3 · 2025-11-14T20:44:03 1763153043

close enough, but digit at the top should be the highest, not 1 :/