This is a good intro for anyone who already has at least some background in ML and wants to get up to speed on LLMs relatively quickly.
Props to the author for giving credit to Bahdanau et al. (2014), which I believe first proposed applying a softmax over token alignment scores to compute attention, setting the stage for the original transformer by Vaswani et al. (2017).
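For anyone who hasn't seen it written out, here's a minimal NumPy sketch of the softmax-over-scores idea. Note this is the scaled dot-product form from the 2017 paper rather than Bahdanau's original additive scoring, and the names and toy shapes are mine:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # raw alignment scores
    weights = softmax(scores, axis=-1)            # softmax turns scores into attention weights
    return weights @ V                            # weighted sum of the values

# toy example: 3 query tokens attending over 4 key/value tokens
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # -> (3, 8)
```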
The big question here is: what's next? So far my coding experiments with GPT-4 show it's a shallow thinker. It has ready answers for many questions, but one step further and it fails miserably. It's much better, but not a full replacement for Google search.
These lecture slides by Yann LeCun are from a month ago. Starting from slide 13, he presents his idea of a roadmap towards autonomous machine intelligence.
It will never completely replace Google search because no matter how large the model gets it will always have a training cut-off date in the past, and there will always be specific factual information that isn't a good fit for language models.
If you're looking for an LLM replacement for regular Google search, then Bing and Bard are much more likely to fit the bill.
I think incremental training should be possible: the next model could be much smaller and take the bigger model's latent state as input. Training would then be faster, and those smaller models could be chained.
Another option is to have small re-trainable areas within the big model.
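To make that a bit more concrete, here's a rough PyTorch sketch of the kind of thing I mean (all names are made up, and the frozen `big_model` is hypothetical): a small trainable head that consumes the big model's latent state, so only the small part ever needs retraining.

```python
import torch
import torch.nn as nn

class SmallAdapter(nn.Module):
    """Small re-trainable module that consumes the frozen big model's latent state."""
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.GELU(),
            nn.Linear(hidden_dim // 4, vocab_size),
        )

    def forward(self, latent):           # latent: (batch, seq, hidden_dim)
        return self.proj(latent)         # new predictions from the frozen features

# `big_model` is hypothetical: any frozen transformer that exposes its hidden states.
# for p in big_model.parameters():
#     p.requires_grad = False
# latent = big_model(input_ids, output_hidden_states=True).hidden_states[-1]

adapter = SmallAdapter(hidden_dim=4096, vocab_size=32000)
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)  # only the small module is trained
```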
Of course, ideally the model should be able to work with some sort of "knowledge base" that can be updated daily, either as the main data source or as an additional one.
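That last part already has a common pattern: keep the model frozen and pull fresh facts into the prompt at query time. A minimal sketch, where the keyword lookup and the `llm` call are placeholders rather than any real API:

```python
from datetime import date

# toy daily-updated knowledge base; a real system would use a search index or vector store
knowledge_base = {
    "gpt-4 release": "GPT-4 was announced by OpenAI on 2023-03-14.",
}

def retrieve(query: str, kb: dict) -> list[str]:
    # naive keyword match, purely for illustration
    return [v for k, v in kb.items() if any(w in k for w in query.lower().split())]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query, knowledge_base))
    prompt = (f"Today is {date.today()}.\n"
              f"Use only the context below to answer.\n\n{context}\n\nQ: {query}\nA:")
    return llm(prompt)   # `llm` is a placeholder for whatever model you actually call

# answer("When was GPT-4 released?")
```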
As for me, I'm looking for ways to increase productivity. I have ChatGPT Plus and am on the waiting lists for GPT-4 and Bing. Neither solves big problems, but there are usually many small things that can be done faster, and I'd rather offload them.
It won’t replace google search because ultimately it doesn’t have a point of view, let alone a dedicated page with a variety of functions (although plugins eat somewhat at this). For example, pick a random article written for The Atlantic — we are getting human prose from a specific author with a specific writing style about a specific subject that fits the zeitgeist of the day with a specific length suitable for what’s being communicated. That is something that is both enjoyable and informative, with style. LLMs are great at summarizing and synthesizing, but it’s not the same thing.
But perhaps the LLM can eventually point me towards this to then learn more; somewhat what Bing is already doing but it’s more of a footnote than something upfront (e.g. “you might want to read an excellent write up in The Atlantic about decriminalizing drugs”).
It depends what you mean by combine. They work in fundamentally different ways. It's a bit like combining an internal combustion engine with a ramjet.
You might be able to compose systems that use LLMs for some sub-components and symbolic AI for other functions. So, for example, if ChatGPT gets asked a question that LLMs are poor at but symbolic systems are good at, it could switch to using a symbolic system, but that's not really combining the actual technologies.
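A hand-wavy sketch of that kind of composition, with a crude router and stand-in backends (none of this is any real product's API):

```python
import re

def looks_like_arithmetic(question: str) -> bool:
    # crude heuristic router; a real system might use the LLM itself to classify the query
    return bool(re.fullmatch(r"[\d\s\+\-\*/\(\)\.]+", question.strip()))

def symbolic_backend(question: str) -> str:
    # stand-in for a CAS or rules engine; here just restricted arithmetic evaluation
    return str(eval(question, {"__builtins__": {}}, {}))

def llm_backend(question: str) -> str:
    return f"(LLM answer to: {question!r})"   # placeholder for a model call

def answer(question: str) -> str:
    if looks_like_arithmetic(question):
        return symbolic_backend(question)     # route to the symbolic component
    return llm_backend(question)              # otherwise let the LLM handle it

print(answer("12345 * 6789"))           # symbolic path
print(answer("Why is the sky blue?"))   # LLM path
```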
I’ve had similar thoughts, wondering why LLMs aren’t just a component of a larger system, perhaps used to generate “thoughts” that are then tested by some other subsystems, which in turn may re-fire the LLMs to generate more.
I guess I'm imagining that testing the LLM output might be a problem some other technique is better suited for, and that smells like what we need to mitigate LLM weaknesses.
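In pseudo-Python, the loop I'm imagining looks something like this; `llm_propose` and `verify` are hypothetical stand-ins for the LLM and for whatever subsystem does the testing.

```python
def solve(problem: str, max_rounds: int = 5):
    feedback = ""
    for _ in range(max_rounds):
        candidate = llm_propose(problem, feedback)   # LLM generates a candidate "thought"
        ok, feedback = verify(candidate)             # some other subsystem tests it
        if ok:
            return candidate                         # accepted by the verifier
        # otherwise re-fire the LLM with the verifier's feedback
    return None   # give up after max_rounds
```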
What I had in mind was making the output of a natural-language parser that produces symbolic semantic representations (the folks at CYC have such parsers) an additional part of the LLM's training data set.
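Something like this in spirit; the parser is a stand-in, and I'm not describing CYC's actual API:

```python
def augment_with_logic(sentences, parse_to_logic):
    # parse_to_logic stands in for a semantic parser; the point is just to interleave
    # each sentence with its symbolic form so both appear in the LLM's training text
    examples = []
    for s in sentences:
        logic = parse_to_logic(s)                      # e.g. "(isa Fido Dog)", purely illustrative
        examples.append(f"{s}\nLOGICAL FORM: {logic}")
    return examples

# augment_with_logic(corpus, semantic_parser)   # both arguments are hypothetical here
```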
I've been doing research with NLP for about 5 years now, and I couldn't have created a better intro list than this. Many of these resources are things I've sent people over the years.
So you think the logical patterns found in human language might also be similar enough to the logical patterns found in other systems that these LLMs have a jumpstart in figuring those out?
We already have such a poor idea of how these things seem to understand so much… it’ll be a wild day when cancer is cured and we have absolutely no idea why.
There is no capability in an LLM to do this and we don't know how to build an "AI" that is not an LLM.
(Just deciding that your LLM has magic powers because you've put it in a category called "AI" and you've decided that category has said magic powers is what that guy Wittgenstein was complaining about in philosophical problems. Besides, intelligence doesn't mean all your thoughts are automatically correct!)
You can't "cure cancer" because "cancer" is a label for different things with different causes. If your brain cells all have the wrong DNA there's no way to get different brain cells.
My experience with ChatGPT is that it cannot reason from the data it has, i.e., it cannot take abstract concepts it can write about and apply them to something it doesn't already have the solution for.
So I'm not sure how the logical patterns can really evolve into something closer to AGI there. Maybe with other LLMs? The inability to do math properly is really limiting, I think.
I haven't yet seen an example of an LLM failing at math that cannot be very easily solved by having the LLM use a calculator, much as all humans do with math of any significance. It needs to use a calculator more often, but that's a completely negligible compute cost.
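To be concrete, the calculator pattern is just tool use: let the model emit an expression, evaluate it outside the model, and feed the result back in. A rough sketch, where `llm` is a placeholder for whatever model you're calling:

```python
import re

def calculator(expression: str) -> str:
    # the tool the model gets to call; restricted eval is fine for a toy arithmetic-only sketch
    return str(eval(expression, {"__builtins__": {}}, {}))

def answer_with_calculator(question: str) -> str:
    # ask the model to reply with CALC(<expression>) whenever it needs arithmetic
    draft = llm(f"{question}\nIf you need arithmetic, reply with CALC(<expression>).")
    match = re.search(r"CALC\((.+?)\)", draft)
    if match:
        result = calculator(match.group(1))
        return llm(f"{question}\nThe calculator says {match.group(1)} = {result}. Final answer:")
    return draft   # no arithmetic needed
```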
I meant more abstract math, e.g., constructing something that needs math to construct it. Take a concept like risk-neutral pricing and get it to construct a replicating portfolio for something not totally trivial (i.e., without a lot of solved examples on the web). It fails for me.
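For reference, even the simplest textbook case, a one-period binomial replication of a call option, looks like this (toy numbers of my own choosing); what I'm asking for is the non-trivial version of this construction.

```python
# one-period binomial replication of a European call (textbook case, toy numbers)
S0, u, d, r, K = 100.0, 1.2, 0.8, 0.05, 100.0

Cu = max(S0 * u - K, 0.0)                  # option payoff if the stock goes up
Cd = max(S0 * d - K, 0.0)                  # option payoff if the stock goes down

delta = (Cu - Cd) / (S0 * (u - d))         # shares of stock in the replicating portfolio
bond  = (Cu - delta * S0 * u) / (1 + r)    # cash position (negative means borrowing)
price_replication = delta * S0 + bond      # cost of the replicating portfolio today

q = (1 + r - d) / (u - d)                  # risk-neutral up-probability
price_risk_neutral = (q * Cu + (1 - q) * Cd) / (1 + r)

print(delta, bond, price_replication, price_risk_neutral)
# 0.5  -38.095...  11.904...  11.904...  (the two prices agree, as they should)
```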
That's fair, I've seen that it can really struggle to come up with novel algorithms. I'm curious whether future models improve on that front, because even its current performance at algorithmic manipulation is far, far better than, e.g., GPT-3.