
Are you suggesting that it was commonplace for machine learning models to be able to extrapolate medical case reports into an actual diagnosis? Even being able to read and understand a case report is a minor miracle in extrapolation, and it’s interesting how far the goal posts move.



I have another example of ChatGPT not generalizing but just being a really good statistical model. I needed a solution to a problem you can't find on Google, itself a variation of a problem that also can't be found on Google. I tried to get around 15 lines of code out of ChatGPT that would solve it, but it consistently failed to produce a correct solution. I spent a few hours trying different prompts, pointing out its errors and receiving apologies, only for it to generate another incorrect solution while acknowledging its mistake. Solving out-of-distribution problems correctly seems almost impossible for it.


GPT 3.5 or 4? Surprisingly it makes a huge difference. I think a lot of people's impressions are from 3.5, but many startups couldn't have been built on it, whereas with 4 they can.

If it was 4 I’d be curious about the specific problem if you’d be willing to link to the chat.


I had a similar issue with GPT-4: I was looking for a library that, given a grammar and a string, would produce a list of the next valid symbols.

GPT-4 only ever suggested grammar validators, to the point that I had given up and was going to write a grammar generator myself. So I started looking for the equivalent of ANTLR in Python, and within three searches I found nltk.grammar, which actually solves the original problem.

It's not a new library either, so I'm dumbfounded as to why GPT-4 couldn't make sense of my request.
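
For what it's worth, a minimal sketch of the kind of tool being described here, using Lark's interactive LALR parser purely as an illustration (the commenter found nltk.grammar; the grammar, terminal names and input below are invented):

    # Sketch: given a grammar and a prefix string, list the terminals that could
    # come next. Assumes the Lark library; the grammar and input are made up.
    from lark import Lark

    grammar = r"""
    start: animal (SPACE animal)*
    animal: CAT | DOG | FISH
    CAT: "cat"
    DOG: "dog"
    FISH: "fish"
    SPACE: " "
    """

    parser = Lark(grammar, parser="lalr")

    def next_valid_symbols(prefix: str) -> set[str]:
        ip = parser.parse_interactive(prefix)
        ip.exhaust_lexer()      # feed every token of the prefix to the parser
        return ip.accepts()     # terminal names the parser would accept next

    print(next_valid_symbols("cat "))   # e.g. {'CAT', 'DOG', 'FISH'}

Note that this only handles prefixes ending on a whole token; a trailing fragment like "do" makes the lexer raise instead, which is the whole-symbol-level issue raised further down the thread.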


I tried this to see. It does suggest using validators, though in a way that mostly solves the problem. The failure mode is when the string is valid, whether complete or extended; it does have a bit of a workaround for that.

https://chat.openai.com/share/54cb8876-96ff-4434-a479-4d2dde...


Using the exception only works at the whole-symbol level, i.e. with "cat do" as input:

    No terminal matches 'd' in the current parser context, at line 1 col 5

    cat do
        ^
    Expected one of:
        * CAT
        * SPACE
        * FISH
        * DOG

    Next valid symbols: ['* CAT\n\t* SPACE\n\t* FISH\n\t* DOG']
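
That formatted message can also be read off the exception programmatically, assuming the library behind this output is Lark (whose error format it resembles); a rough sketch, with `parser` being e.g. the one built in the earlier sketch:

    # Sketch: recover the expected terminals from the parse error itself rather
    # than from the formatted message. Assumes Lark; `parser` is e.g. the
    # cat/dog/fish parser from the earlier sketch.
    from lark import Lark
    from lark.exceptions import UnexpectedCharacters, UnexpectedToken

    def expected_at_failure(parser: Lark, text: str) -> set[str]:
        try:
            parser.parse(text)
            return set()               # the whole string parsed; nothing left to expect
        except UnexpectedCharacters as e:
            return set(e.allowed)      # lexer failed mid-token, e.g. on the 'd' of "do"
        except UnexpectedToken as e:
            return set(e.expected)     # parser rejected a complete token

    # With the cat/dog/fish grammar, "cat do" should give {'CAT', 'SPACE', 'FISH', 'DOG'}.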


I'm not sure I understand: what that reports is enough to solve it, right? Was the problem that you couldn't get it to solve it from that point?


No, because if I add a space to the original string I don't get a valid string.

Or rather, given the context, space is a valid production according to the last error, but then I would need to roll back the incomplete "do" terminal and lose that context.


> No, because if I add a space to the original string I don't get a valid string.

Yeah, it's telling you what it was expecting when it hit an invalid token.

You take the invalid part and filter the list of suggestions with it (rough sketch below). If you tell ChatGPT about the problem, it solves it.

(Edit: when I was running this, any number of spaces was parsed as valid, btw, if that's the concern; space was a valid option.)
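
Roughly what "take the invalid part and filter the list of suggestions with it" might look like, continuing the invented cat/dog/fish grammar from the sketches above (the literal value per terminal is an assumption of that toy grammar):

    # Sketch: narrow the expected terminals using the trailing fragment the lexer
    # choked on ("do" -> only DOG still matches). Literals are from the toy grammar.
    LITERALS = {"CAT": "cat", "DOG": "dog", "FISH": "fish", "SPACE": " "}

    def completions(expected: set[str], fragment: str) -> set[str]:
        return {name for name in expected
                if LITERALS.get(name, "").startswith(fragment)}

    print(completions({"CAT", "SPACE", "FISH", "DOG"}, "do"))   # {'DOG'}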


If it hasn't been used or talked about much for this purpose, then you wouldn't expect it to, right?


True, but then it's neither interpolating between the public documentation nor extrapolating from possible similarities between concepts.



