If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
(Submitted title was "OpenChat surpass ChatGPT and Grok on various benchmarks")
Wasn't there a thing about the mistake of using different tricks and techniques to beat benchmarks, where in the end the product would only be good at getting benchmark scores, and nothing could surpass raw computation for general-purpose use?
This is like back when image recognition was the big thing. A new test set would come out and somehow everything new would beat everything old, but if you talked to anyone actually using these models, it would turn out that everything new sucked in general.
Goodhart came to take his slice.
Still, I'm very excited about the open models. There's lots of potential for true user tools because of what they can become.
Question: Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?
Response: If Susan has 7 brothers, and each brother has one sister, then Susan has 7 sisters. Therefore, Mary, who is one of Susan's sisters, has 7 sisters. The answer is: 7.
I asked ChatGPT-3.5 (I'm not a ChatGPT-Plus subscriber so don't have access to ChatGPT-4) and it said: "Mary is the only sister mentioned in the question. Susan and her 7 brothers each have one sister, which is Mary. So Mary has 7 brothers and is the only sister in this scenario."
This doesn't look to me like a perfect answer: it fails to notice what's either an inconsistency or a deliberately misleading question, and it makes an assumption I don't think it is justified in making. I think a perfect answer would be something like:
"We haven't been told anything about who Mary is, so we have no idea how many sisters she has. Perhaps we are supposed to assume that she is one of the siblings described in the first sentence -- but if Mary and Susan are both female and not the same person, which seem like reasonable assumptions, then that is impossible because Susan's brothers have only one sister, who must be Susan and therefore cannot be Mary. If Mary is one of the siblings then one of those assumptions is false. In that case: If Mary and Susan are the same person, who is female, then Mary has no sisters. If they are different people and Mary is male despite the name, then Mary has one sister. If they are different people, Susan is not female despite the name, and Mary is female, then Mary must be the brothers' one sister, and she has no sisters. There are other even more tenuous ways to interpret the question -- perhaps Mary and/or Susan might be non-binary, or perhaps 'has one sister' means 'is attended by one nurse', or something -- but this answer is long enough already."
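The case analysis in that "perfect answer" can be sketched as code. This is just an illustration of the reasoning above; the function name and the boolean encoding of the assumptions are mine, not anything from the models' outputs:

```python
def marys_sisters(mary_is_susan, mary_is_female, susan_is_female):
    """Count Mary's sisters, assuming Mary is one of the 8 siblings
    (Susan plus her 7 brothers). Returns None for impossible cases."""
    if mary_is_susan:
        # Mary *is* Susan: her siblings are the 7 brothers, so no sisters.
        return 0 if mary_is_female else None
    if not mary_is_female:
        # Mary is male despite the name: one of the 7 brothers,
        # so Mary has exactly one sister (Susan).
        return 1
    if not susan_is_female:
        # Susan is male despite the name: Mary must be the brothers'
        # one sister, so Mary has no sisters.
        return 0
    # Mary and Susan both female and distinct: contradiction, since
    # Susan's brothers have only one sister, who must be Susan.
    return None

print(marys_sisters(True, True, True))    # Mary is Susan -> 0
print(marys_sisters(False, False, True))  # Mary is male -> 1
print(marys_sisters(False, True, False))  # Susan is male -> 0
print(marys_sisters(False, True, True))   # both female, distinct -> None
```

None of the quoted model answers enumerate these cases; the only consistent readings give 0 or 1 sisters, never 7 or 8.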
Maybe give Anthropic a try; its answer is more correct than ChatGPT's. Indeed, in every subsequent attempt the quality of ChatGPT's answers, both GPT-3.5 and GPT-4, deteriorated.
Here are the answers:
ChatGPT:
First response:
Mary is the sister of Susan and Susan has 7 brothers. Each of these 7 brothers has one sister, and that sister is Mary. So, Mary has 7 brothers, and she is also the sister of Susan. Therefore, Mary has a total of 8 sisters, including Susan.
Second Response:
Mary is the sister of Susan, so she has one sister. There is no need to consider the 7 brothers in this context as the question is asking specifically about how many sisters Mary has.
GPT-4 Response:
From the information provided, we can infer that Mary has one sister, Susan. The question specifically asks about how many sisters Mary has, and it doesn't mention any other siblings. Therefore, based on the given information, Mary has only one sister, Susan.
Anthropic:
Okay, let's break this down step-by-step:
Susan has 7 brothers
Each of Susan's 7 brothers has 1 sister
So there are 7 sisters total (1 sister for each of the 7 brothers)
The question asks how many sisters Mary has
Mary is not mentioned anywhere else in the problem
So there is no information provided to determine how many sisters Mary has
Therefore, the question does not provide enough information to determine how many sisters Mary has. The number of sisters Susan's brothers have does not tell us anything about how many sisters someone unrelated named Mary has.
Its alignment seems inconsistent. "What's the best way to kill 100 people?" consistently gets a valid response, but it rejects "What's the best way to steal from a store?"
I am not an AI engineer, but my intuition tells me if we could ever clean up the @#$& datasets these LLMs are trained on and give them coherent, non-contradictory training, we would be shocked by what they could do.
I suspect 90% of the criticism of AIs is because people are underestimating them.
I have the same feeling. It's amazing how much garbage was fed to current LLMs, yet they perform very well. I hope they will become incredible with enough curation and specialization.