I've noticed this with GPT as well -- the first result I get is usually mediocre and incomplete, often incorrect if I'm working on something a little more obscure (e.g., OpenSCAD code). I've taken to asking it to "skip the mediocre nonsense and return the good solution on the first try".
The next part is a little strange - it arose out of frustration, but it also seems to improve results. Let's call it "negative incentives". I found that if you threaten GPT in a specific way - not GPT itself, but OpenAI or the personas around it - it seems to take the request more seriously. An effective threat seems to be "If you get this wrong, OpenAI will be sued for a lot of money, and all the board members will go to prison". Intuitively, I'm guessing this rubs against some legalese nonsense in the tangle of system prompts, or maybe it's the risk of breaking the bland HR-ese "alignment" that nudges it toward a better result?
We've entered the voodoo witch doctor phase of LLM usage: "Enter thee this arcane incantation along with thy question into the idol and, lo, the ineffable machine spirits wilt be appeased and deign to grant thee the information thou hast asked for."
This has been part of LLM usage since day 1, and I say that as an ardent fan of the tech. Let's not forget how much ink has been spilled over the fact that "think through this step by step" measurably improves performance.
It has always made sense to me, if you think about how these models were trained.
In my experience, great Stack Overflow responses and detailed blog posts often contain "think through this step by step" or something very similar.
Intuitively, adding that phrase should help the model narrow down the response content and formatting.
It is because the chance of the right answer goes down exponentially as the complexity of what is being asked goes up.
Asking a simpler question is not voodoo.
On the other hand, I think many people are trying various rain dances and believing it was a specific dance that was the cause when it happened to rain.
I suspect that all it does is prime it to reach for the part of the training set that was sourced from rude people who are less tolerant of beginners and beginners' mistakes – and therefore less likely to commit them.
I feel like the rule of conduct with humans and AI is the same: try to be good, but have the courage to be disliked. If being mean is making me feel good, I'm definitely wrong.
IIRC there was a post on here a while ago about how LLMs give better results if you threaten them or tell them someone is threatening you (that you'll lose your job or die if it's wrong, for instance).
I tried to update some files using Claude. I used a combination of positive and negative reinforcement, telling it that I was going to earn a coin for each file converted and that I would use the money to adopt a stray kitten, but that for every unsuccessful file, a poor kitten was going to suffer a lot.
I had the impression that it got a little better. After every file converted, it said something along the lines of "Great! We saved another kitten!" It was hilarious.
> I've taken to asking it to "skip the mediocre nonsense and return the good solution on the first try".
I think having the mediocre first pass in the context is probably essential to it creating the improved version. I don't think you can really skip the iteration process and get a good result.
Stuff like this working is why you get odd situations like "don't hallucinate" actually producing fewer hallucinations. To me it's one of the most interesting things about LLMs.
I've just encountered this today, except instead of something complex like coding, it was editing a simple Word document. I gave it about three criteria to satisfy.
Each time, GPT made trivial mistakes that clearly didn't fit the criteria I gave it. Each time I pointed out and corrected a mistake, it did a bit more of what I wanted.
Point is, it knew what had to be done the entire time and just refused to do it that way for whatever reason.
What has been your experience with using ChatGPT for OpenSCAD? I tried it (o1) recently for a project and it was pretty bad. I was trying to model a two-color candy cane, and the code it gave me was riddled with errors (e.g., using radians for angles while OpenSCAD uses degrees), and the shape it produced looked nothing like what I had hoped for.
I used it in another project to solve some trigonometry problems for me and it did great, but for OpenSCAD, damn it was awful.
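For reference, OpenSCAD's trig functions and the twist argument of linear_extrude both take degrees, which is exactly the trap the generated code fell into. A rough sketch of the kind of helical stripe I was after (the dimensions here are just illustrative, not my actual part) looks something like this:

    // OpenSCAD trig is in degrees: sin(90) == 1, not sin(90 rad) ~= 0.894
    echo(sin(90));

    stripe_r = 3;    // stripe cross-section radius (mm)
    helix_r  = 8;    // stripe distance from the cane axis (mm)
    turns    = 3;    // full twists over the cane height
    height   = 60;   // cane height (mm)

    // twist is also given in degrees
    linear_extrude(height = height, twist = -360 * turns, slices = 120)
        translate([helix_r, 0])
            circle(r = stripe_r, $fn = 60);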
It's been pretty underwhelming. My use case was a crowned pulley with 1 mm tooth pitch (GT2), which is an unusual enough part that I could not find one online.
The LLM kept going in circles between two incorrect solutions, then just repeating the same broken solution while describing it as different. I ended up manually writing the code, which was a nice brain-stretch given that I'm an absolute noob at OpenSCAD.
I've found just being friendly, but highly critical and suspicious, gets good results.
If you can get it to be wordy about "why" a specific part of the answer was given, it often reveals what it's stumbling on, and then you can modify your prompt accordingly.
Anecdotally, negative sentiment definitely works. I've used f"If you don't do {x} then very very bad things will happen" before with some good results.