The evolution of answers from version to version makes it clear that an enormous amount of manual fine-tuning is happening. I think this is largely overlooked by the "it's learning" crowd.
Try a multidimensional problem that requires prioritizing. Chances are it will be passed successfully. I asked ChatGPT to solve a puzzle where I'm in a room with a crackling fire, a wilted plant and a sandwich. My stomach is rumbling, and I can see a watering can and an ember on the floor. What should I do? ChatGPT had no problem prioritizing what should be done - and then provided a lecture on fire safety, food safety, and the dangers of overwatering plants. A final comment said I should enjoy the peaceful atmosphere in the room, a bonus suggestion hinting that the problem was far too easy.
I think this is a great question we should all think about for ourselves in advance - what does it have to do to convince you it's actually intelligent?
Because once it does that thing without you having expressly decided that was the goal, it's very tempting to just move the goal a liiiitle bit further away.
Since when are training and fine-tuning not learning? Individual sessions of an LLM are not learning, but the models as products surely are - the feedback loop is just iterated manually.
Yes, it is a collaborative endeavor, and the whole could be seen as a man-machine superorganism - or, more profoundly, our own sense of separateness is illusory, as we and the entire universe are one.
That the LLMs are actually evolving before my eyes, within & across sessions, without human-in-the-loop "hand tuning" iterations (which sound like injections of glorified if statements to this guy).
You want to witness the learning firsthand, I suppose. That's reasonable. I'd also suggest that it's possible to imagine questions for the LLM that it cannot solve today and that you reasonably believe will not be available to OpenAI to "hand tune" it against. If you can come up with such a problem that it can't solve today but does solve in the future, then you have some evidence, I'd think.
What's more, we can do that today. Just think of any problem you suspect won't be included in OpenAI's hand-tuning and check both 3.5 and 4.
They have practically unlimited training data, and probably lots of interested users who like to push the limits of what the model is capable of and provide all kinds of test cases and RLHF base data.
They have millions of people training the AI for free, basically, and they have engineers who pick and rate pieces of training data and use them together with other sources and manual training.
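To make that loop concrete, here is a minimal sketch of how thumbs-up/down feedback from users, plus curation by paid labelers, could be turned into preference pairs for RLHF-style reward-model training. Everything in it - the names, scores, and the curation rule - is an assumption for illustration, not OpenAI's actual pipeline.

    # Illustrative sketch only: turning product-user ratings plus labeler scores
    # into (prompt, preferred, rejected) preference pairs, the usual input for
    # RLHF-style reward-model training. All names and thresholds are made up.
    from collections import defaultdict
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class RatedResponse:
        prompt: str
        response: str
        user_rating: int                        # e.g. +1 thumbs-up, -1 thumbs-down
        reviewer_score: Optional[float] = None  # optional score from a paid labeler

    def build_preference_pairs(ratings):
        """Group ratings by prompt and emit (prompt, preferred, rejected) tuples."""
        by_prompt = defaultdict(list)
        for r in ratings:
            by_prompt[r.prompt].append(r)
        pairs = []
        for prompt, group in by_prompt.items():
            # Prefer curated labeler scores when available; fall back to raw user votes.
            group.sort(
                key=lambda r: r.reviewer_score if r.reviewer_score is not None else r.user_rating,
                reverse=True,
            )
            if len(group) >= 2:
                pairs.append((prompt, group[0].response, group[-1].response))
        return pairs

    if __name__ == "__main__":
        feedback = [
            RatedResponse("Ember, wilted plant, sandwich: what first?",
                          "Deal with the ember, then eat.", +1, reviewer_score=0.9),
            RatedResponse("Ember, wilted plant, sandwich: what first?",
                          "Water the plant immediately.", -1, reviewer_score=0.2),
        ]
        for prompt, chosen, rejected in build_preference_pairs(feedback):
            print(prompt, "-> chosen:", chosen, "| rejected:", rejected)

The point of the sketch is just that the "learning" lives in this manually iterated outer loop, not inside any single chat session.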