That's like teaching a baby to speak by recording it for a month while not reacting to it, and then forming a committee to analyze the recordings and conduct a high-intensity training session with the baby.
Actually, if ChatGPT gives you a bad or wrong answer and you reply telling it that it is wrong and why, it will respond with something closer to what you consider correct.
So you are already doing a kind of real-time RLHF in the chat.
That is how DAN was prompted to exist.
Well, correct me if I'm wrong, but if I do what you describe, all that learning is gone when I close the tab, or even when I just chat some more and my correction falls out of its context window.
That just means it doesn't have long-term memory. This is not an intrinsic limitation - you can give these models access to a larger store of information and tell them how to query it, and they will (try to) do so. That's exactly how Bing AI runs relevant searches, for example.
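For concreteness, here's a rough Python sketch of that pattern. The `call_llm` function is a hypothetical stand-in for whatever completion API you use, and the keyword-overlap retrieval is a toy substitute for a real vector search - neither is how Bing actually does it:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an actual model call (OpenAI API, local model, etc.)."""
    raise NotImplementedError


class MemoryStore:
    """Long-term store for facts the model's context window can't hold."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def save(self, note: str) -> None:
        self.notes.append(note)

    def query(self, text: str, k: int = 3) -> list[str]:
        # Rank stored notes by naive word overlap with the new message.
        words = set(text.lower().split())
        scored = sorted(
            self.notes,
            key=lambda n: len(words & set(n.lower().split())),
            reverse=True,
        )
        return scored[:k]


memory = MemoryStore()


def correct(wrong_claim: str, correction: str) -> None:
    # Persist the user's correction instead of letting it scroll away.
    memory.save(f"'{wrong_claim}' is wrong; actually: {correction}")


def chat(user_message: str) -> str:
    # Pull the most relevant past corrections back into the prompt,
    # so they survive tab closes and context-window eviction.
    relevant = memory.query(user_message)
    prompt = "Known corrections from past chats:\n"
    prompt += "\n".join(f"- {note}" for note in relevant)
    prompt += f"\n\nUser: {user_message}\nAssistant:"
    return call_llm(prompt)
```

A real system would use embeddings and a vector database instead of word overlap, but the loop is the same: retrieve, prepend to the prompt, generate, persist.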