
I learned that lesson from GPT-5, where the preview was weeks long and the models kept changing during that period.

This Claude preview lasted from Friday to Monday so I was less worried about major model changes. I made sure to run the pelican benchmark against the model after 10am on Monday (the official release date) just to be safe.

The only thing I published that I ran against the preview model was the Claude code interpreter example.

I continue not to worry about models having been trained to ace my pelican benchmark, because the models still suck at it. You really think Anthropic deliberately cheated on my benchmark and still only managed to produce this? https://static.simonwillison.net/static/2025/claude-sonnet-4...





Testing this, it's way more aggressive about throttling back than the previous model, and about message token lengths. It constantly stops in the middle of an action if it's not a simple request. I presume you did not have resource limitations during the preview?

No, the preview was effectively unlimited usage (for two days).

Well, if they produced a really, really good image of pelicans on bicycles and nothing else, their cheating would be obvious, so it makes sense to cheat just a little bit across the board (if we want to assume they're cheating).

Yesterday someone posted an example of the same prompt but with the pelican changed to a human, and it was basically trash; the example you've posted actually looks good, all things considered. So yeah, I do think it's something they train on, the same way they train on things in the benchmarks.

The easy way to tell is to try it yourself - run "Generate an SVG of a pelican riding a bicycle" and then try "Generate an SVG of an otter riding a skateboard" and see if the quality of the images seems similar.
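If you'd rather script that comparison than paste prompts into the UI, here's a minimal sketch using the Anthropic Python SDK (the model ID is an assumption; swap in whichever Claude model you're testing):

    # Minimal sketch: run the same SVG prompt template for two subjects
    # and save each result so the outputs can be compared side by side.
    # Assumes ANTHROPIC_API_KEY is set in the environment.
    import anthropic

    client = anthropic.Anthropic()

    prompts = {
        "pelican-bicycle": "Generate an SVG of a pelican riding a bicycle",
        "otter-skateboard": "Generate an SVG of an otter riding a skateboard",
    }

    for name, prompt in prompts.items():
        message = client.messages.create(
            model="claude-sonnet-4-5",  # assumed model ID, replace as needed
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        # Save the raw response; the SVG can be extracted from it by hand.
        with open(f"{name}.txt", "w") as f:
            f.write(message.content[0].text)

If the pelican output comes back dramatically better than the otter one, that would point toward benchmark-specific training; comparable quality suggests it doesn't.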

How about a narwhal spacewalking from the ISS, with Earth visible below (specifically the Niger delta)?

https://claude.ai/public/artifacts/f3860a8a-2c7d-404f-978b-e...

Requesting an ‘extravagantly detailed’ version produces something quite impressive in effort, if not quite in execution:

https://claude.ai/public/artifacts/f969805a-2635-4e30-8278-4...



