Andrej's video is great, but the explanation of the RL part is a bit vague to me. How exactly do we train on the right answers? Do we collect the reasoning traces and train on them like supervised learning, or do we compute some scores and use them as a loss function? Isn't the reward then very sparse? What if LLMs can't generate any right answers because the problems are too hard?
Also, how can the training of LLMs be parallelized when parameter updates are sequential? Sure, we can train on several samples simultaneously, but those parameter updates are all computed with respect to the parameters at the start of that step.
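To make the question concrete, here's a toy sketch of the kind of parallelism I mean (plain NumPy linear regression, nothing LLM-specific; the shapes, learning rate, and 4-way shard split are all made up for illustration): within one step, every shard's gradient is taken against the same current parameters, so those computations can run in parallel, and only the step-to-step updates themselves are sequential.

```python
import numpy as np

# Toy data-parallel SGD: within one step, each "worker" computes its gradient
# against the SAME current parameters, so those computations can run in
# parallel; only the step-to-step updates themselves are sequential.
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 8))
true_w = rng.normal(size=8)
y = X @ true_w

w = np.zeros(8)
lr = 0.1
for step in range(300):
    shards = np.array_split(np.arange(len(X)), 4)   # pretend these are 4 GPUs
    grads = []
    for idx in shards:
        err = X[idx] @ w - y[idx]                   # every shard uses the same w
        grads.append(X[idx].T @ err / len(idx))
    w -= lr * np.mean(grads, axis=0)                # one synchronized update per step

print(np.abs(w - true_w).max())                     # ends up close to the true weights
```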
As I understood that part, in RL for LLMs you take questions for which the model already sometimes emits correct answers, and then repeatedly run inference while reinforcing the behavior the model exhibited during correct responses, which lets it evolve its own ways of more reliably reaching the right answer (a toy sketch of this loop is below).
(Hence the analogy to training AlphaGo, wherein you take a model that sometimes wins games, and then play a bunch of games while reinforcing the cases where it won, so that it evolves its own ways of winning more often.)
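Here's a toy, runnable illustration of that loop, with the obvious caveat that it is not how a real LLM trainer works: the "model" is just a set of weights over a few candidate answers (a stand-in for a policy), and the QUESTIONS, CANDIDATES, and update rule are made up for illustration. The loop samples answers, checks them with an automatic verifier, and up-weights whatever the policy itself happened to get right.

```python
import random

# Toy version of "reinforce the correct answers": the "policy" is just a set
# of weights over a few candidate answers per question (a stand-in for an LLM).
# Sample, verify, and up-weight whatever the policy itself happened to get right.
QUESTIONS = {"2+2": "4", "3*3": "9"}
CANDIDATES = ["1", "4", "7", "9"]

def sample_answer(policy, question):
    answers, weights = zip(*policy[question].items())
    return random.choices(answers, weights=weights)[0]

def rl_round(policy, samples_per_question=8, reward=0.5):
    for question, gold in QUESTIONS.items():
        for _ in range(samples_per_question):
            answer = sample_answer(policy, question)
            if answer == gold:                      # automatic verifier
                policy[question][answer] += reward  # reinforce the model's own success
            # If no correct answer is ever sampled, nothing gets reinforced;
            # that's the sparse-reward / too-hard-problem concern from above.

policy = {q: {a: 1.0 for a in CANDIDATES} for q in QUESTIONS}
for _ in range(20):
    rl_round(policy)

print(policy["2+2"])  # the weight on "4" should now dominate
```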
AlphaGo seems more like an automated process to me because you can start from nothing except the algorithm and the rules. Since a Go game only has two outcomes most of the time, and the model can play against itself, it is guaranteed to learn something during self-play.
In the LLM case you have to start with an already capable model to do RL. Also, I feel like problem selection is important, to make sure the problems aren't too hard. So there's still a lot of labor involved.
Yes, IIUC those points are correct - you need relatively capable models, and well-crafted questions. The comparison with AlphaGo is that the processes are analogous, not identical - the key point being that in both cases the model is choosing its own path towards a goal, not just imitating the path that a human labeler took.
In my experience Linux can have some driver bugs on specific hardware that Windows doesn't, like not waking up after suspend on some Nvidia cards with some drivers, etc. But it handles hardware issues miles better.
Linux can detect 90% of the hard drives that Windows does not detect, and copy 99% of the data off them, with some I/O errors for the rest. It can also run for ages on unstable hardware, like bad RAM or too high an overclock, while Windows crashes very easily.
As consumer-hostile as Microsoft's practices are, and even though these things need to be optional, I still find linking your Windows install to a Microsoft account pretty useful. With it you can sync your Windows settings across different devices. You can use Find My Device. You can save your BitLocker encryption key to the cloud, which helps when you forget your login password. There's the Copilot AI stuff, and so on. Turning off telemetry and uninstalling some bloatware requires clicking through a few menus, but it's still a mostly just-works experience compared to the amount of time I need to set up a Linux box.
On Linux, configuring the fingerprint reader to work everywhere requires reading through different online threads and editing a config file. Enabling automatic TPM disk decryption involves a ten-step command-line procedure. Enabling proprietary video codecs requires adding a third-party repository. And there are many other small issues I have to troubleshoot (shutdown freezing, sddm crashing after waking from sleep, PackageKit unable to update packages, having to turn on a kernel flag for the touchpad to work...). I really want Linux to work for me, but I've still decided to use Windows on my personal laptop, and that's coming from someone who works with Linux every day at work.
Only if all you ever do is web browsing and some light office work. As soon as you have to troubleshoot something or install some less popular software, you can't avoid it.
I'm not an advocate for Linux use for other people. (Like I was about 20 years ago.) If they insist on Windows or a Mac, that's not my concern at all.
I often liken it to the choice of which brand of car to drive. I prefer to drive a Mercedes and have done so for about 20 years. If other people want to drive a BMW or a Ford or a Chev, that's no concern of mine either.
> At least on OKCupid, women rate 80% of men as below-average attractiveness, while men rate women at right about 50% as below-average and 50% as above-average
is very different from
> Women view about 80% of men as unattractive. It is not a normal distribution, as men rate women's attractiveness.
Pretty confounding to take that sort of logical leap in a thread that's about the dark patterns, gamification, and the heavily modified context that dating apps transform dating into.
I think a lot has happened between 1960 and 2020:
smartphones, the internet, machine learning, the Hubble telescope, the Internet of Things, e-commerce...
Going from the personal computer of the 1960s to what we have today in our pocket is an enormous achievement in itself, so it's a bit disingenuous to list it off as the single phrase "personal computer".
If we measure progress by metrics like the number of patents or the number of scientific publications, things are in fact speeding up even more.
Most of those things are just natural evolutions of existing technologies. None of them (barring the internet) is something that sci-fi from the 1950s didn't predict.
My point is that we don't have those mind-blowing inventions that were hard to predict or see coming. Smartphones are the natural evolution of computer chips. Machine learning was already an active discipline in the '60s, and back then they were already seeing computing power increase at very fast rates. The Internet of Things is, again, just an extension of general communication. And e-commerce is a very natural extension of the internet.
Compare that to a refrigerator. Before the refrigerator there was an entire industry of ice distribution. It was not a natural extension of really anything.
Same for the microwave. You MIGHT be able to argue that it was somewhat a natural extension of radio, just finally recognized, but it's a stretch to think anyone would leap from radio waves to microwave ovens.
As for patent numbers and scientific publications, those don't really seem to indicate actual innovation. Rather, they point to population growth and more participation in those programs. Patents, in particular, are filed more for defensive purposes than for actual innovation.
What we aren't seeing is massively disruptive technologies. We haven't seen the power generation technology that makes fossil fuels obsolete. We aren't seeing the breakthrough in rockets that allows for space vacations. We aren't seeing flying cars. Instead, we are seeing cars getting gradually more efficient, batteries getting gradually better, and computing power getting gradually better.
Actual, honest to goodness, life changing innovation has just stopped.
That's not really all bad, but it does point to "we probably won't be like the gods" any time soon. Just as life got better in the New World from the 1600s to the 1800s (for the most part), I expect life to get better from here on out, but not so mind-bogglingly better that 100 years from now it will be completely different from what we experience today. The only major difference in life in 100 years is going to be the impacts of climate change.