Regular coding questions mostly. For me o1 generally gives better code and understands the prompt more completely (haven’t started using r1 or o3 regularly enough to opine).
Agreed, but some might read your comment as implying otherwise, as I did (there's no world in which you could have 'started using o3 regularly enough to opine'), given that you list it side by side with an available model.
We've been seeing success using it for LLM-as-a-judge tasks.
We set up evaluation criteria and used o1 to evaluate the quality of the prod model where the outputs are subjective, like creative writing or explaining code.
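Roughly what that looks like in practice, as a minimal sketch with the OpenAI Python SDK (the rubric, scoring scale, and model choice here are illustrative placeholders, not our actual setup, and it assumes the judge returns parseable JSON):

```python
import json
from openai import OpenAI

client = OpenAI()

# Illustrative rubric only; swap in whatever criteria matter for your task.
RUBRIC = """Score the response from 1-10 on each criterion:
- accuracy: is it factually correct?
- clarity: could a junior dev follow it?
- completeness: does it cover the important cases?
Return JSON: {"accuracy": n, "clarity": n, "completeness": n, "notes": "..."}"""

def judge(task_prompt: str, prod_output: str) -> dict:
    """Ask the stronger model to grade the cheaper prod model's subjective output."""
    resp = client.chat.completions.create(
        model="o1",  # judge model; the prod model being graded is a cheaper one
        messages=[{
            "role": "user",
            "content": f"{RUBRIC}\n\nTask given to the model:\n{task_prompt}\n\n"
                       f"Model response to grade:\n{prod_output}",
        }],
    )
    return json.loads(resp.choices[0].message.content)
```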
It's also useful for developing really good few-shot examples. We'll get o1 to generate multiple examples in different styles, then we'll have humans go through and pick the ones they like best, which we use as few-shot examples for the cheaper, faster prod model.
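A sketch of that pipeline, under the same assumptions (the style list and function names are hypothetical; the human-review step is just whatever tooling you already use to pick winners):

```python
from openai import OpenAI

client = OpenAI()

STYLES = ["terse and formal", "friendly and conversational", "step-by-step tutorial"]

def generate_candidates(task_prompt: str) -> list[str]:
    """Have the stronger model draft one candidate answer per style."""
    candidates = []
    for style in STYLES:
        resp = client.chat.completions.create(
            model="o1",
            messages=[{
                "role": "user",
                "content": f"Write an example answer in a {style} style.\n\nTask:\n{task_prompt}",
            }],
        )
        candidates.append(resp.choices[0].message.content)
    return candidates

# Humans pick the best candidates; the winners get baked into the cheaper
# prod model's prompt as few-shot examples:
def build_prod_prompt(selected_examples: list[str], user_input: str) -> str:
    shots = "\n\n".join(f"Example:\n{ex}" for ex in selected_examples)
    return f"{shots}\n\nNow respond to:\n{user_input}"
```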
Finally, for some study I'm doing, I'll use it to grade my assignments before I hand them in. If I get a 7/10 from o1, I'll ask it to suggest the minimal changes I could make to take it to 10/10. Then, I'll make the changes and get it to regrade the paper.
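The loop is simple enough to script if you want to; a sketch assuming the OpenAI Python SDK, with hypothetical file names and a placeholder rubric:

```python
from openai import OpenAI

client = OpenAI()

def ask(text: str) -> str:
    resp = client.chat.completions.create(
        model="o1",
        messages=[{"role": "user", "content": text}],
    )
    return resp.choices[0].message.content

rubric = "Grade this assignment out of 10 against the attached marking criteria."
draft = open("assignment.md").read()

grade = ask(f"{rubric}\n\n{draft}")
fixes = ask(f"You graded this assignment as follows:\n{grade}\n\n"
            f"Suggest the minimal set of changes that would take it to 10/10.\n\n{draft}")
# ...apply the suggested changes by hand, then regrade the revised draft:
regrade = ask(f"{rubric}\n\n" + open("assignment_v2.md").read())
```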
I used R1 to write debug statements for Rust code, close to 50 pages in total. It is absolutely crushing it. The best debug statements I have ever seen, better than GPT for sure.
In my experience GPT is still number one for code, but DeepSeek is not far behind. I haven't used it much so far, but after a thousand coding queries I hope to have a much better picture of its coding abilities. Really curious about that, but GPT is hard to beat.