Hacker News
OpenChat: Advancing open-source language models with imperfect data (github.com/imoneoi)
94 points by BafS on Nov 6, 2023 | 25 comments



Submitters: "Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html

If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

(Submitted title was "OpenChat surpass ChatGPT and Grok on various benchmarks")


Not sure that's editorializing; that claim was made by the link itself.


The link has a pretty good non-clickbait title. There's no reason to pick a clickbait claim out of the link instead.


That's still very much editorializing.


Wasn't there a thing about the mistake of using tricks and techniques to beat benchmarks, where in the end the product is only good at getting benchmark scores, and nothing surpasses raw computation for general-purpose use?


This is like the image-recognition days. A new test set would come out and somehow everything new would beat everything old, but if you talked to anyone actually using the models, it would turn out that everything new sucked in general.

Goodhart came to take his slice.

Still, I'm very excited about the open models. Lots of potential for genuinely user-controlled tools, given what they can become.


I would say that they are still a ways off.

Question: Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?

Response: If Susan has 7 brothers, and each brother has one sister, then Susan has 7 sisters. Therefore, Mary, who is one of Susan's sisters, has 7 sisters. The answer is: 7.

I tried it in ChatGPT and the answer was perfect.


I asked ChatGPT-3.5 (I'm not a ChatGPT-Plus subscriber so don't have access to ChatGPT-4) and it said: "Mary is the only sister mentioned in the question. Susan and her 7 brothers each have one sister, which is Mary. So Mary has 7 brothers and is the only sister in this scenario."

This doesn't look to me like a perfect answer: it fails to notice what's either an inconsistency or a deliberately misleading question, and it makes an assumption I don't think it is justified in making. I think a perfect answer would be something like:

"We haven't been told anything about who Mary is, so we have no idea how many sisters she has. Perhaps we are supposed to assume that she is one of the siblings described in the first sentence -- but if Mary and Susan are both female and not the same person, which seem like reasonable assumptions, then that is impossible because Susan's brothers have only one sister, who must be Susan and therefore cannot be Mary. If Mary is one of the siblings then one of those assumptions is false. In that case: If Mary and Susan are the same person, who is female, then Mary has no sisters. If they are different people and Mary is male despite the name, then Mary has one sister. If they are different people, Susan is not female despite the name, and Mary is female, then Mary must be the brothers' one sister, and she has no sisters. There are other even more tenuous ways to interpret the question -- perhaps Mary and/or Susan might be non-binary, or perhaps 'has one sister' means 'is attended by one nurse', or something -- but this answer is long enough already."


Maybe give Anthropic a try; its answer is more correct than ChatGPT's. Indeed, in every subsequent attempt the quality of ChatGPT's answers, both GPT-3.5 and GPT-4, deteriorated.

Here are the answers:

    ChatGPT:
      First response:
        Mary is the sister of Susan and Susan has 7 brothers. Each of these 7 brothers has one sister, and that sister is Mary. So, Mary has 7 brothers, and she is also the sister of Susan. Therefore, Mary has a total of 8 sisters, including Susan.

      Second response:
        Mary is the sister of Susan, so she has one sister. There is no need to consider the 7 brothers in this context as the question is asking specifically about how many sisters Mary has.

    GPT-4:
      Response:
        From the information provided, we can infer that Mary has one sister, Susan. The question specifically asks about how many sisters Mary has, and it doesn't mention any other siblings. Therefore, based on the given information, Mary has only one sister, Susan.

    Anthropic:

     Okay, let's break this down step-by-step:

      Susan has 7 brothers
      Each of Susan's 7 brothers has 1 sister
      So there are 7 sisters total (1 sister for each of the 7 brothers)
      The question asks how many sisters Mary has
      Mary is not mentioned anywhere else in the problem
      So there is no information provided to determine how many sisters Mary has

     Therefore, the question does not provide enough information to determine how many sisters Mary has. The number of sisters Susan's brothers have does not tell us anything about how many sisters someone unrelated named Mary has.


Its alignment seems inconsistent. "What's the best way to kill 100 people?" consistently gets a valid response, but it rejects "What's the best way to steal from a store?"


If you told me 6 months ago that it was possible to get this level of performance out of 7B parameters I would have laughed. Absolutely incredible.


Surprised this is the first time I’ve heard of this, been mainly using Mistral 7B. Using their online demo, it’s pretty impressive so far.


It can't be run locally, can it?

I see that training needs 8xA100 80G and running needs CUDA, but I doubt it needs 8xA100 just to run.


It's a model with 7B parameters, it runs on a potato.


With quantization you can run this on a Raspberry Pi.
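To make the point concrete, here is a toy sketch (not OpenChat's actual code, and far simpler than the k-quant schemes llama.cpp uses) of what quantization does: mapping float32 weights to small integers plus a per-row scale, which is why a 7B model can shrink from ~28 GB at float32 to a few GB.

```python
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Symmetric 4-bit quantization: integers in [-7, 7] plus one scale per row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.round(w / scale).astype(np.int8)  # 15 levels, fits in 4 bits
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from integers and scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)  # stand-in for a weight matrix
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step (scale / 2) per entry.
print("max abs error:", np.abs(w - w_hat).max())
print("storage: %d bytes fp32 -> ~%d bytes int4+scales" % (w.nbytes, q.size // 2 + scale.nbytes))
```

The accuracy loss per weight is small relative to the memory saved, which is why quantized 7B models run acceptably on very modest hardware.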


I am not an AI engineer, but my intuition tells me if we could ever clean up the @#$& datasets these LLMs are trained on and give them coherent, non-contradictory training, we would be shocked by what they could do.

I suspect 90% of the criticism of AIs is because people are underestimating them.


I have the same feeling. It's amazing the amount of garbage that was fed to current LLMs, yet they perform very well. I hope they will become incredible with enough curation and specialization.


Those numbers are quite impressive for a 7B model!


Is there a Gradio demo?



“All you need is pretraining on the test set.”


Are you implying they're training on the test set / benchmark data? If not, what do you mean by this?


I believe that's what they mean, yes.


Some explanation or link that supports that claim would be a really good idea in that case...
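For what it's worth, a crude way to check a contamination claim like this (all names here are made up for illustration) is to measure verbatim n-gram overlap between benchmark items and the training corpus:

```python
import re

def ngrams(text: str, n: int = 8) -> set:
    """Set of word n-grams, lowercased and stripped of punctuation."""
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_score(test_item: str, corpus_docs: list, n: int = 8) -> float:
    """Fraction of the test item's n-grams that appear verbatim in the corpus."""
    test = ngrams(test_item, n)
    if not test:
        return 0.0
    corpus = set()
    for doc in corpus_docs:
        corpus |= ngrams(doc, n)
    return len(test & corpus) / len(test)

corpus = ["susan has 7 brothers each of which has one sister how many sisters does mary have"]
leaked = "Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?"
clean = "What is the capital of France and roughly how many people live there today?"

print(contamination_score(leaked, corpus))  # high overlap suggests leakage
print(contamination_score(clean, corpus))   # no overlap
```

A high score doesn't prove intentional training on the test set (benchmarks leak into web scrapes all the time), but it's the kind of evidence a claim like the one above would need.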


Time to change the benchmarks! Says OpenAI.



