
In my experience, Gemini 2.5 Pro is the star when it comes to complex codebases. Give it a single XML dump from repomix, and make sure to use the one at AI Studio.
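Concretely, something like this, wrapped in Python just to keep it in one snippet (a sketch; the repomix flags and output name are from memory, so check its docs):

    import subprocess

    # Pack the entire repo into a single XML file for Gemini to ingest.
    # --style and --output are repomix flags as I recall them; verify
    # against the repomix README before relying on this.
    subprocess.run(
        ["npx", "repomix", "--style", "xml", "--output", "repo.xml"],
        check=True,
    )

Then paste repo.xml into aistudio.google.com as your first message.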




In my experience, G2.5P can handle so much more context and produces an excellent execution plan; having CC implement that plan works out far better than letting G2.5P write the code itself. So I give G2.5P the relevant code and the data underneath, ask it to develop an execution plan, and then feed that result to CC to do the actual code writing.

This has been outstanding for the AI-assisted development I've been doing as of late.
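In code, the loop is roughly this (a hedged sketch: the file names, prompt, and the final claude invocation are illustrative, not a recipe):

    from pathlib import Path
    from google import genai  # pip install google-genai

    # Stage 1: Gemini 2.5 Pro reads the packed repo and writes the plan.
    client = genai.Client()  # picks up GEMINI_API_KEY from the environment
    context = Path("repo.xml").read_text()  # e.g. a repomix dump of the repo
    task = "...task description goes here..."

    plan = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=f"{context}\n\nTask: {task}\n"
                 "Develop a detailed, step-by-step execution plan. "
                 "Plan only, no code.",
    )
    Path("PLAN.md").write_text(plan.text)

    # Stage 2: hand the plan to Claude Code in the repo directory, e.g.
    #   claude -p "Implement PLAN.md exactly as written."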


I would believe this. In regular conversational use with the Gemini family of models, I've noticed they regularly have issues with context blending, i.e. confusing what you said with what they said, and mixing up causality.

I would think this would manifest as poor plan execution. I personally haven't used Gemini on coding tasks primarily based on my conversational experience with them.


+1, but I've recently been experimenting with gpt-5-high for the plan part and it's scary good sometimes.

Gemini 2.5 Pro = Long context king, image input king

GPT-5 = Overengineering/complexity/"enterprise" king

Claude = "Get straightforward shit done efficiently" king


On the plus side, GPT-5 is very malleable, so you CAN prompt it away from that, whereas it's very hard to prompt Claude into producing the hard code: even with a nearly file-by-file breakdown of a task, it'll occasionally run into an obstacle and just give up, drop in a mock or stub implementation, basically diverge from the entire plan, and then do its own version.

Absolutely, sometimes you want, or indeed need, such complexity. Some work in settings where they'd want it all of the time. IMHO, most people, most of the time, don't really want it, and don't want to have to prompt every time to avoid it. That's why I think it's still very useful to build up experience with all three frontier models, so you can choose according to the situation.

I think a lot of it has to do with the super long context that it has. With extended sessions and/or large codebases, that context can fill up surprisingly quickly.

That said, one thing I do dislike about Gemini is how fond it is of second-guessing the user. This usually manifests as small, unrelated "cleaner code" changes as part of a larger task, but I've seen cases where the model's reasoning literally contained something like: "the user very clearly told me to do X, but there's no way that's right; they must have meant Y and probably just mistakenly said X, so I'll do Y now".

One specific area where this happens a lot is, ironically, when you use Gemini to code an app that uses Gemini APIs. For Python, at least, there's the legacy google-generativeai package and the new google-genai package, which differ fairly significantly even though the core functionality is the same. The problem is that Gemini knows the former much better than the latter, and when confronted with such a codebase it will often try to use the old API (even if you pre-write the imports and some examples!). That of course breaks the type checker, and when Gemini sees the errors, 90% of the time it goes: "oh, it must be failing because the user made an error in that import; I know it's supposed to be 'generativeai', not 'genai', so let me correct that."
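To make the confusion concrete, here are the two shapes side by side (from the docs as I know them; model names are just examples):

    # Legacy package (google-generativeai) - the one Gemini "knows" best:
    import google.generativeai as legacy_genai

    legacy_genai.configure(api_key="...")
    model = legacy_genai.GenerativeModel("gemini-1.5-pro")
    print(model.generate_content("Hello").text)

    # New package (google-genai) - same core functionality, different
    # shape, and the one Gemini keeps "correcting" to the legacy form:
    from google import genai

    client = genai.Client(api_key="...")
    resp = client.models.generate_content(
        model="gemini-2.5-pro", contents="Hello"
    )
    print(resp.text)

Same "generate content" verb, completely different import and call structure: exactly the kind of near-miss that trips the model up.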


Yup. In fact, every deep research tool on the market is just a wrapper for Gemini; their "secret sauce" is just how they partition/pack the codebase to feed it into Gemini.

It's mostly because it is so damn good with long contexts. It can stay on the ball even at 150k tokens, whereas other models really wilt around 50-75k.


