Would love to see the system/user prompts involved, if possible.
Personally I get it to write the same code I'd produce myself, which I obviously consider OK code, but others' experience seems to differ a lot from mine, so I'm curious to understand why. I've iterated a lot on my system prompt, so it could be as simple as that.
The biggest reason I avoid Gemini (and all of Google's models I've tried) is that I cannot get them to produce the same code I'd produce myself, while with OpenAI's models it's fairly trivial.
There is something deeper in the model that, even when steered/programmed via the system/user prompts, still produces kind of shitty code for some reason. Or maybe I just haven't found the right way of prompting Google's stuff, that could also be it, but seemingly the same approach works for OpenAI, Anthropic and others, so I'm not sure what to make of it.
I'm having the same issue with Gemini as soon as the context length exceeds 50k-ish tokens. At that point it starts to blurt out random code of terrible quality, even with clear instructions, and it often mixes up various APIs. I've spent a lot of time instructing it not to write such code, with plenty of few-shot examples, but it doesn't seem to work. It's like it gets "confused".
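By few-shot examples I mean roughly this shape, a minimal sketch in the OpenAI-style chat format (the style rules and snippets below are made up for illustration):

    messages = [
        {"role": "system", "content": "You write code in my house style."},
        # One few-shot pair: code I don't want, and the corrected version.
        {"role": "user", "content":
            "Rewrite in my style:\n"
            "def read_lines(p):\n"
            "    return [l.strip() for l in open(p).readlines()]"},
        {"role": "assistant", "content":
            "def read_lines(path: str) -> list[str]:\n"
            "    with open(path) as f:\n"
            "        return f.read().splitlines()"},
        # ...more pairs like the above, then the real request last.
        {"role": "user", "content": "Now read a TOML config into a dict."},
    ]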
The large context length is a huge advantage, but the model doesn't seem to be able to use it effectively. Would you say that OpenAI models don't suffer from this problem?
Yes, definitely. For every model I've used and/or tested, the more context there is, the worse the output, even within the context limits.
When I use chat UIs (which admittedly is less and less), I never let the chat go beyond one message from me and one response from the LLM. If something is wrong with the response, I figure out what I need to change in my prompt, then start a new chat or edit the first message and retry until it works. Any time I've tried "No, what I meant was ..." or "Great, now change ...", the quality of the responses drops sharply.
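The same pattern through the API is a single-turn retry loop; a minimal sketch assuming the OpenAI Python SDK (the model name and prompts are placeholders):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    SYSTEM = "You write terse, idiomatic Python."  # placeholder rules

    def one_shot(user_prompt: str) -> str:
        """One user message, one response, no accumulated history."""
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": user_prompt},
            ],
        )
        return response.choices[0].message.content

    # Instead of replying "No, what I meant was ...", revise the original
    # prompt and retry from a clean context.
    first = one_shot("Write a CSV parser.")
    retry = one_shot("Write a CSV parser. Stdlib only, handle quoted fields.")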
Do you use the DeepSeek hosted R1, or a custom one?
The published model has a note strongly recommending against using system prompts at all, with all instructions sent as user messages instead, so I'm curious whether you use system prompts and what your experience with them has been.
Maybe the hosted service rewrites them into user ones transparently ...
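Something like this client-side shim would be enough to do that; purely hypothetical, assuming the OpenAI-style message format:

    def fold_system_into_user(messages: list[dict]) -> list[dict]:
        # Hypothetical rewrite a host could apply transparently:
        # prepend any system content to the first user message.
        system_parts = [m["content"] for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
        if system_parts and rest and rest[0]["role"] == "user":
            rest[0] = {
                "role": "user",
                "content": "\n\n".join(system_parts) + "\n\n" + rest[0]["content"],
            }
        return rest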
> Do you use the DeepSeek hosted R1, or a custom one?
Mainly the hosted one.
> The published model has a note strongly recommending that you should not use system prompts at all
I think that's outdated; the new release (deepseek-ai/DeepSeek-R1-0528) has the following in its README:
> Compared to previous versions of DeepSeek-R1, the usage recommendations for DeepSeek-R1-0528 have the following changes: System prompt is supported now.
The previous versions, even though the docs said to put everything in user prompts, still seemed steerable/programmable via the system prompt regardless; maybe it just wasn't as effective as it is for other models.
But yeah, outside of that, heavy use of system (and obviously user) prompts.
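For what it's worth, the hosted API is OpenAI-compatible, so a system prompt goes through the same way; a minimal sketch (the base URL and model name are from DeepSeek's docs as I recall them, so double-check):

    from openai import OpenAI

    # DeepSeek's hosted API speaks the OpenAI chat protocol.
    client = OpenAI(
        api_key="sk-...",  # your DeepSeek API key
        base_url="https://api.deepseek.com",
    )

    response = client.chat.completions.create(
        model="deepseek-reasoner",  # the hosted R1 model
        messages=[
            {"role": "system", "content": "You write terse, idiomatic Rust."},
            {"role": "user", "content": "Implement an LRU cache."},
        ],
    )
    print(response.choices[0].message.content)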