Both models score higher on the Artificial Analysis intelligence index, with lower end-to-end response time. Output token efficiency also improved by 24% to 50% (resulting in lower cost).
Gemini 2.5 Flash-Lite improvements include better instruction following, reduced verbosity, stronger multimodal & translation capabilities. Gemini 2.5 Flash improvements include better agentic tool use and more token-efficient reasoning.
Model strings: gemini-2.5-flash-lite-preview-09-2025 and gemini-2.5-flash-preview-09-2025
2.5 Flash is the first time I've felt AI has become truly useful to me. I was the #1 AI hater but now find myself going to the Gemini app instead of Google search. It's just better in every way, and there are no ads. The info it provides is usually right, and it feels like I have the whole generalized and accurate knowledge of the internet at my fingertips in the app. It's more intimate, with fewer distractions. Just me and the Gemini app alone talking about kale's ideal germination temperature, instead of a bunch of mommy bloggers, bots, and SEO spam.
How long Google can keep this going while cannibalizing how they make money is another question...
It's also excellent for subjective NLP-type analysis. For example, I use it for "scouting" chapters in my translation pipeline to compile coherent glossaries that I can feed into prompts for per-chapter translation.
This involves having it identify all potential keywords and distinct entities, determine their approximate gender (important for languages with ambiguous gender pronouns), and then perform a line-by-line analysis of each chapter. For each line, it identifies the speaking entity, determines whose POV the line represents, and identifies the subject entity. While I didn't need or expect perfection, Gemini Flash 2.5 was the only model I tested that could not only follow all these instructions, but follow them well. The cheap price was a bonus.
I was thoroughly impressed; it's now my go-to for any JSON-formatted analysis reports.
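For anyone curious what a per-line report like that might look like on the consuming side, here is a minimal sketch in Python. The field names (`line_no`, `speaker`, `pov`, `subject`), the `LineAnalysis` class, and `parse_report` are all hypothetical, made up for illustration; the actual schema used in the pipeline above isn't stated. It only shows the general idea of checking a model's JSON output against the glossary before feeding it into the translation step.

```python
import json
from dataclasses import dataclass


@dataclass
class LineAnalysis:
    # Hypothetical schema: one record per line of the chapter.
    line_no: int
    speaker: str  # entity speaking the line
    pov: str      # entity whose point of view the line represents
    subject: str  # entity the line is about


REQUIRED = ("line_no", "speaker", "pov", "subject")


def parse_report(raw: str, glossary: dict[str, str]) -> list[LineAnalysis]:
    """Parse a JSON array of line records and reject entities missing from the glossary."""
    rows = []
    for item in json.loads(raw):
        missing = [key for key in REQUIRED if key not in item]
        if missing:
            raise ValueError(f"line entry missing fields: {missing}")
        for role in ("speaker", "pov", "subject"):
            if item[role] not in glossary:
                raise ValueError(f"unknown entity {item[role]!r} in {role}")
        rows.append(LineAnalysis(**{key: item[key] for key in REQUIRED}))
    return rows


# Glossary maps entity name -> approximate gender (made-up sample data).
glossary = {"Mira": "female", "Ren": "male"}
sample = json.dumps([{"line_no": 1, "speaker": "Mira", "pov": "Ren", "subject": "Ren"}])
report = parse_report(sample, glossary)
```

Failing loudly on an unknown entity is a design choice for the sketch: it catches the case where the model invents a name that the scouting pass never put in the glossary.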
Any idea what "output token efficiency" refers to?
Gemini Flash is billed by number of input/output tokens, which I assume is fixed for the same output, so I'm struggling to understand how it could result in lower cost. Unless of course they have changed tokenization in the new version?
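One plausible reading, given that the announcement also mentions "reduced verbosity" and "more token-efficient reasoning", is that the model simply emits fewer output tokens (assuming reasoning tokens count as output) to complete the same task, not that the tokenizer changed. A rough sketch of the arithmetic in Python follows; the per-million-token prices and token counts are made-up placeholders, not actual Gemini pricing, and `request_cost` is just a helper name for this example.

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one request when billing is per million input and output tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


# Hypothetical prices in dollars per million tokens (placeholders, not real pricing).
IN_PRICE = 0.30
OUT_PRICE = 2.50

# Same prompt, same task; the newer model is assumed to need 24% fewer output tokens.
old_cost = request_cost(2_000, 1_000, IN_PRICE, OUT_PRICE)
new_cost = request_cost(2_000, 760, IN_PRICE, OUT_PRICE)

savings = 1 - new_cost / old_cost
print(f"old: ${old_cost:.4f}  new: ${new_cost:.4f}  saved: {savings:.1%}")
```

Under this reading the per-token price stays the same and only the output token count drops, so the bill for an equivalent answer goes down even with identical tokenization.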