
I have seen numerous posts of LLM Q&A failures, and by the time people try to replicate them GPT-4 is fixed. Either OpenAI is actively monitoring the Internet and fixing them, or the Internet is actively conspiring to present falsified results for gpt4 to discredit OpenAI.



> actively conspiring to present falsified results for gpt4 to discredit OpenAI

All this would be solved if OpenAI were a bit more open.


It would be nice if the organizations would publish a hash of the code and the training dataset.
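A rough sketch of what that could look like, assuming the code and data are just files on disk. The function names and the idea of folding the per-file digests into one fingerprint are my own illustration, not anything a vendor is known to publish:

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fingerprint(paths):
    """Hypothetical combined fingerprint: hash the per-file digests in sorted path order."""
    combined = hashlib.sha256()
    for p in sorted(str(p) for p in paths):
        combined.update(sha256_of_file(p).encode("ascii"))
    return combined.hexdigest()
```

Anyone holding the same files could recompute the fingerprint and compare it with the published one. It only shows that the artifacts are unchanged; it says nothing about what is in them.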


You aren't able to get access to the 'Open'AI dataset though, are you? Agreed, it would be an excellent addition for comparing source-available models, but that doesn't help with the accusations of foul play by OpenAI, nor with the claimed existence of an anti-OpenAI conspiracy.


GPT-4 (at least) is explicit in saying that it's learning from users' assessments of its answers, so yes, the only valid way to test is to give it a variation of the prompt and see how well that does. GPT-4 failed the "Sally" test for the first time after 8 tries when I changed every parameter. It got it right on the next try.


It’s important to remember that GPT-4 is only deterministic at the batch level because it is a mixture of experts model. Basically every time you invoke it, your query could get routed to a different expert because of what else is in the batch. At least this is my understanding based on others' analysis.
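To make the batch effect concrete, here is a toy sketch. It assumes top-1 routing with a fixed per-expert capacity, where a token whose preferred expert is full falls through to its next-best expert. That mechanism, the scores, and the `route_batch` name are all assumptions for illustration; nothing here is confirmed to be how GPT-4 works:

```python
def route_batch(scores, capacity):
    """Assign each token to its best-scoring expert that still has room.

    scores: one list of per-expert scores for each token, in batch order.
    Returns the chosen expert index per token (None if every expert is full).
    """
    num_experts = len(scores[0])
    load = [0] * num_experts
    assignment = []
    for token_scores in scores:
        # Experts ordered from most to least preferred for this token.
        ranked = sorted(range(num_experts), key=lambda e: token_scores[e], reverse=True)
        chosen = next((e for e in ranked if load[e] < capacity), None)
        if chosen is not None:
            load[chosen] += 1
        assignment.append(chosen)
    return assignment

my_query = [0.9, 0.1]                # prefers expert 0
batch_a = [[0.2, 0.8], my_query]     # neighbour prefers expert 1
batch_b = [[0.7, 0.3], my_query]     # neighbour takes expert 0 first

print(route_batch(batch_a, capacity=1)[-1])
print(route_batch(batch_b, capacity=1)[-1])
```

The same query lands on expert 0 in the first batch and expert 1 in the second, purely because of what it was batched with.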


> because it is a mixture of experts model

Do you have a source for this? I also considered this, but never saw any evidence that this is how GPT-4 is implemented.

I've always wondered how a system of multiple specialized small LLMs (with a "router LLM" in front of them all) would fare against GPT-4. Do you know if anyone is working on such a project?


Or people post outliers because they're more interesting.



