Non-AI Summary: Both models have improved intelligence on Artificial Analysis in...

Mistletoe · 2025-09-25T17:56:37 1758822997

2.5 Flash is the first time I've felt AI has become truly useful to me. I was the #1 AI hater, but now I find myself going to the Gemini app instead of Google search. It's just better in every way, and there are no ads. The info it provides is usually right, and it feels like I have the whole generalized and accurate knowledge of the internet at my fingertips in the app. It's more intimate, with fewer distractions. Just me and the Gemini app alone talking about kale's ideal germination temperature, instead of a bunch of mommy bloggers, bots, and SEO spam.

Now, how long Google can keep this going while cannibalizing how they make money is another question...

yesco · 2025-09-25T18:58:06 1758826686

It's also excellent for subjective NLP-type analysis. For example, I use it for "scouting" chapters in my translation pipeline to compile coherent glossaries that I can feed into prompts for per-chapter translation.

This involves having it identify all potential keywords and distinct entities, determine their approximate gender (important for languages with ambiguous gender pronouns), and then perform a line-by-line analysis of each chapter. For each line, it identifies the speaking entity, determines whose POV the line represents, and identifies the subject entity. While I didn't need or expect perfection, Gemini Flash 2.5 was the only model I tested that could not only follow all these instructions, but follow them well. The cheap price was a bonus.

I was thoroughly impressed; it's now my go-to for any JSON-formatted analysis reports.
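
For anyone curious what such a report might look like, here is a rough sketch in Python. The field names (`entities`, `gender`, `speaker`, `pov`, `subject`) and the overall shape are my own assumptions for illustration, not the actual schema used in the pipeline above, and no model API is called here; the code only parses a JSON report that a model is assumed to have already returned.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    name: str
    gender: str  # hypothetical values: "female", "male", "unknown"


@dataclass
class LineAnalysis:
    line_no: int
    speaker: str  # entity speaking on this line
    pov: str      # entity whose point of view the line represents
    subject: str  # entity the line is about


@dataclass
class ChapterReport:
    chapter: int
    entities: List[Entity] = field(default_factory=list)
    lines: List[LineAnalysis] = field(default_factory=list)


def parse_report(raw: str) -> ChapterReport:
    """Parse a JSON scouting report (assumed shape) into typed objects."""
    data = json.loads(raw)
    return ChapterReport(
        chapter=data["chapter"],
        entities=[Entity(**e) for e in data["entities"]],
        lines=[LineAnalysis(**item) for item in data["lines"]],
    )


def glossary(report: ChapterReport) -> Dict[str, str]:
    """Build a name -> gender glossary to feed into translation prompts."""
    return {e.name: e.gender for e in report.entities}


# Made-up sample report, purely for illustration.
sample = """
{
  "chapter": 1,
  "entities": [
    {"name": "Aiko", "gender": "female"},
    {"name": "Ren", "gender": "unknown"}
  ],
  "lines": [
    {"line_no": 1, "speaker": "Aiko", "pov": "Aiko", "subject": "Ren"}
  ]
}
"""

report = parse_report(sample)
print(glossary(report))
```

Parsing into typed objects like this makes it easy to notice when a model drops a field, since the constructor call fails instead of silently passing along an incomplete report.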

indigodaddy · 2025-09-25T19:19:21 1758827961

Google AI Mode is excellent as well, which I'd imagine is just Gemini 2.5 Flash?

kridsdale1 · 2025-09-25T20:41:10 1758832870

If you have access, try AI Mode on Google.com. It’s a different product from Gemini that tries to solve “search engine data presented in LLM format”.

Disclaimer: I recently joined this team. But I like the product!

jonplackett · 2025-09-25T18:17:19 1758824239

I think “Non-AI summary” is going to become a thing. I already enjoyed reading it more because I knew someone had thought about the content.

paxys · 2025-09-25T21:09:56 1758834596

As soon as it becomes a thing LLMs will start putting "Non-AI summary" at the top of their responses.

nharada · 2025-09-25T20:28:40 1758832120

I'm stealing "Non-AI Summary"

crishoj · 2025-09-25T18:27:48 1758824868

Any idea what "output token efficiency" refers to? Gemini Flash is billed by number of input/output tokens, which I assume is fixed for the same output, so I'm struggling to understand how it could result in lower cost. Unless of course they have changed tokenization in the new version?

Romario77 · 2025-09-25T20:03:55 1758830635

They provide the answer in fewer words (while still conveying what needed to be said).

Which is a good thing in my book as the models now are way too verbose (and I suspect one of the reasons is the billing by tokens).

minimaxir · 2025-09-25T18:36:01 1758825361

The post implies that the new models are better at thinking, so less time/cost is spent overall.

The first chart implies the gains are minimal for nonthinking models.

kaspermarstal · 2025-09-25T20:03:23 1758830603

Models are less verbose, so they produce fewer output tokens, so answers cost less.
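
To put rough numbers on it: the per-token price stays the same, but the bill for a given task drops if the model needs fewer output tokens to answer it. A quick sketch, where the prices and token counts are made-up placeholders and not Gemini's actual rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request when billed per million input/output tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical prices in dollars per million tokens (placeholders only).
INPUT_PRICE = 1.0
OUTPUT_PRICE = 4.0

# Same prompt, same task; the newer model is assumed to answer more concisely.
old_cost = request_cost(1000, 800, INPUT_PRICE, OUTPUT_PRICE)
new_cost = request_cost(1000, 600, INPUT_PRICE, OUTPUT_PRICE)

print(f"old: ${old_cost:.4f}  new: ${new_cost:.4f}")
print(f"savings: {100 * (1 - new_cost / old_cost):.0f}%")
```

So nothing about tokenization has to change for the cost to go down; only the number of output tokens per answer does.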

jama211 · 2025-09-25T19:06:06 1758827166

Thank you for this, seems like an iterative improvement.