
If this were true, then engineered prompts would fail for held-out problem instances. But they don’t.


I don't understand what you mean by that. Which held-out problem instances?


Suppose you engineer a prompt to make GPT-3 do arithmetic. You design the prompt to work for a particular set of training examples like 1+1 and 2+3. If all the computation is in the prompt engineering, and GPT-3 is just Clever Hans, then the same engineered prompt should do no better than chance when you hand it new instances like 4+5.
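A rough sketch of that test, to make it concrete. Everything named here is hypothetical: `make_prompt` stands for the engineered prompt, and `clever_hans` is a toy stand-in that only answers prompts it has seen verbatim, not GPT-3. A real test would swap the stand-in for calls to the actual model.

```python
def make_prompt(a, b):
    # Hypothetical engineered prompt: the same template for every instance.
    return f"Q: What is {a} + {b}?\nA:"

def accuracy(model, problems):
    """Fraction of problems where the model's reply equals the true sum."""
    correct = sum(
        model(make_prompt(a, b)).strip() == str(a + b) for a, b in problems
    )
    return correct / len(problems)

# Instances the prompt was tuned on, and held-out instances it never saw.
tuning_set = [(1, 1), (2, 3)]
held_out = [(4, 5), (7, 2), (6, 6)]

def make_memoriser(seen):
    """Toy 'Clever Hans' stand-in: answers only prompts it has seen verbatim."""
    table = {make_prompt(a, b): str(a + b) for a, b in seen}
    return lambda prompt: table.get(prompt, "")

clever_hans = make_memoriser(tuning_set)

print(accuracy(clever_hans, tuning_set))  # 1.0
print(accuracy(clever_hans, held_out))    # 0.0
```

If the real model scores well above chance on `held_out` with the unchanged prompt, the work is not all being done by whoever tuned the prompt on `tuning_set`.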


>> Suppose you engineer a prompt to make GPT3 do arithmetic

Oh, I think I see what you mean. Thank you for clarifying. So, no, I didn't mean that the prompt is engineered to make it look like the model is performing a calculation. I meant that GPT-3 has memorised instances of arithmetic operations, and that to retrieve them from its memory the human user must figure out the right prompt. I wrote "that's you computing the prompt", not "that's you computing the result".

The prompt is like a SQL query, right? If you don't enter the right query, you don't get the right results. That's the point of all those people on the internets fiddling with their prompts: it's like they're trying to query a database, but they don't know the right syntax for their query, so they tweak it until it returns the results they want.

For example, the OP mentioned thousands separators being very helpful to the model. That's because it has memorised more arithmetic results with thousands separators than without. So you're more likely to get the right results out of it if you use thousands separators.

Also because, like the OP says, GPT-3 has one concept for a digit, a separate one for a string of digits, and yet another for a string of digits and other symbols. "9999" is, in its model, a different thing than "9,999".
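To illustrate the distinction I mean, here is a small sketch. It says nothing about GPT-3's actual internals; `memorised` is a made-up lookup table keyed on the surface string, which is the picture I'm arguing for.

```python
with_sep = f"{9999:,}"    # "9,999"
without_sep = str(9999)   # "9999"

# As strings they are different things...
print(with_sep == without_sep)  # False

# ...and they only become the same thing once you have a number representation.
print(int(with_sep.replace(",", "")) == int(without_sep))  # True

# Hypothetical memorised table keyed on the exact surface form of the query.
memorised = {"9,999 + 1": "10,000"}
print(memorised.get("9,999 + 1"))  # 10,000
print(memorised.get("9999 + 1"))   # None
```

A lookup keyed on strings returns nothing for "9999 + 1" even though it holds the answer for "9,999 + 1", which is the behaviour you'd expect from retrieval rather than calculation.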

Which, btw, is why it can't calculate. To calculate, a system must have a representation of the concept of a number. Otherwise, calculate with what?



