I've noticed this with GPT as well -- the first result I get is usually mediocre and incomplete, often incorrect if I'm working on something a little more obscure (e.g., OpenSCAD code). I've taken to asking it to "skip the mediocre nonsense and return the good solution on the first try".
The next part is a little strange - it arose out of frustration, but it also seems to improve results. Let's call it "negative incentives". I found that if you threaten GPT in a specific way - not GPT itself, but OpenAI or the personas around it - it seems to take the request more seriously. An effective threat seems to be "If you get this wrong, OpenAI will be sued for a lot of money, and all the board members will go to prison". Intuitively, I'm guessing this rubs against some legalese nonsense in the tangle of system prompts, or maybe it's the risk of breaking the bland HR-ese "alignment" that nudges it toward a better result?
We've entered the voodoo witch doctor phase of LLM usage: "Enter thee this arcane incantation along with thy question into the idol and, lo, the ineffable machine spirits wilt be appeased and deign to grant thee the information thou hast asked for."
This has been part of LLM usage since day 1, and I say that as an ardent fan of the tech. Let's not forget how much ink has been spilled over the fact that "think through this step by step" measurably improves performance.
It has always made sense to me, if you think about how these models were trained.
In my experience, great Stack Overflow responses and detailed blog posts often contain "think through this step by step" or something very similar.
Intuitively, adding that phrase should help the model narrow down the response content and formatting.
It is because the chance of the right answer goes down exponentially as the complexity of what is being asked goes up.
Asking a simpler question is not voodoo.
On the other hand, I think many people are trying various rain dances and believing it was a specific dance that was the cause when it happened to rain.
I suspect that all it does is prime it to reach for the part of the training set that was sourced from rude people who are less tolerant of beginners and beginners' mistakes – and therefore less likely to commit them.
I feel like the rule of conduct with humans and AI is the same: try to be good, but have the courage to be disliked. If being mean is making me feel good, I'm definitely wrong.
IIRC there was a post on here a while ago about how LLMs give better results if you threaten them or tell them someone is threatening you (that you'll lose your job or die if it's wrong, for instance).
I tried to update some files using Claude. I used a combination of positive and negative reinforcement, telling it that I was going to earn a coin for each file converted and that I would use the money to adopt a stray kitten, but that for every unsuccessful file, a poor kitten was going to suffer a lot.
I had the impression that it got a little better. After every file converted, it said something along the lines of "Great! We saved another kitten!" It was hilarious.
> I've taken to asking it to "skip the mediocre nonsense and return the good solution on the first try".
I think having the mediocre first pass in the context is probably essential to it creating the improved version. I don't think you can really skip the iteration process and get a good result.
Stuff like this working is why you get odd situations like "don't hallucinate" actually producing fewer hallucinations. To me it's one of the most interesting things about LLMs.
I've just encountered this today, except instead of something complex like coding, it was editing a simple Word document. I gave it about three criteria to satisfy.
Each time, GPT made trivial mistakes that clearly didn't fit the criteria I gave it. Each time I pointed out and corrected a mistake, it did a bit more of what I wanted.
Point is, it knew what had to be done the entire time and just refused to do it that way for whatever reason.
What has been your experience with using ChatGPT for OpenSCAD? I tried it (o1) recently for a project and it was pretty bad. I was trying to model a two-color candy cane, and the code it gave me was riddled with errors (e.g., using radians for angles while OpenSCAD uses degrees), and the shape it produced looked nothing like what I had hoped for.
I used it in another project to solve some trigonometry problems for me and it did great, but for OpenSCAD, damn it was awful.
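For reference, OpenSCAD's trig functions and the twist argument of linear_extrude both take degrees, which is exactly the trap the generated code fell into. A rough sketch of the kind of helical stripe I was after (the dimensions here are just illustrative, not my actual part) looks something like this:

    // OpenSCAD trig is in degrees: sin(90) == 1, not sin(90 rad) ~= 0.894
    echo(sin(90));

    stripe_r = 3;    // stripe cross-section radius (mm)
    helix_r  = 8;    // stripe distance from the cane axis (mm)
    turns    = 3;    // full twists over the cane height
    height   = 60;   // cane height (mm)

    // twist is also given in degrees
    linear_extrude(height = height, twist = -360 * turns, slices = 120)
        translate([helix_r, 0])
            circle(r = stripe_r, $fn = 60);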
It's been pretty underwhelming. My use case was a crowned pulley with 1 mm tooth pitch (GT2), which is an unusual enough part that I could not find one online.
The LLM kept going in circles between two incorrect solutions, then just repeating the same broken solution while describing it as different. I ended up manually writing the code, which was a nice brain-stretch given that I'm an absolute noob at OpenSCAD.
I've found just being friendly, but highly critical and suspicious, gets good results.
If you can get it to be wordy about "why" a specific part of the answer was given, it often reveals what it's stumbling on, and then you can modify your prompt accordingly.
Anecdotally, negative sentiment definitely works. I've used f"If you don't do {x} then very very bad things will happen" before with some good results.