Understanding large language models: A cross-section of the relevant literature (sebastianraschka.com)
307 points by headalgorithm on April 16, 2023 | 31 comments



This is a good intro for anyone who already has at least some background in ML and wants to get up to speed on LLMs relatively quickly.

Props to the author for giving credit to Bahdanau et al. (2014), which I believe first proposed the concept of applying a softmax function over token scores to compute attention, setting the stage for the original transformer by Vaswani et al. (2017).
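
For anyone who hasn't read it, the core mechanism fits in a few lines. A toy NumPy sketch (using simplified bilinear scoring for brevity; the paper itself scores with a small additive MLP, and all names here are mine):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())   # shift for numerical stability
        return e / e.sum()

    def attention(decoder_state, encoder_states, W):
        # One scalar score per encoder token, softmaxed into weights
        # that sum to 1, then used for a weighted sum (the "context").
        scores = encoder_states @ W @ decoder_state
        weights = softmax(scores)
        return weights @ encoder_states

    encoder_states = np.random.randn(5, 8)   # 5 tokens, hidden size 8
    decoder_state = np.random.randn(8)
    W = np.random.randn(8, 8)
    context = attention(decoder_state, encoder_states, W)   # shape (8,)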


The big question here is what's next? So far my coding experiments with GPT-4 show it's a shallow thinker. It has ready answers for many questions, but one step further and it fails miserably. Much better than, but not a full replacement for, Google search.


> The big question here is what's next?

These lecture slides are from Yann LeCun, from a month ago. Starting at slide 13, he presents his idea of a roadmap towards autonomous machine intelligence.

https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMR...


It will never completely replace Google search because no matter how large the model gets, it will always have a training cut-off date in the past, and there will always be specific factual information that isn't a good fit for language models.

If you're looking for an LLM replacement for regular Google search, then Bing and Bard are much more likely to fit the bill.


I think incremental training should be possible, where the next model is much smaller and takes the latent state of the bigger model as its input. Training would then be faster, and those smaller models could be chained.

Another option is to have small re-trainable areas within the big model.

Of course, ideally the model should be able to work with some sort of "knowledge base" that can be updated daily, as a main or an additional data source.
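
Roughly what I mean, as a sketch (embed() here is a toy and generate() is a hypothetical LLM call, not a real API):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Toy stand-in for a real embedding model: character histogram.
        v = np.zeros(128)
        for ch in text.lower():
            v[ord(ch) % 128] += 1
        return v / (np.linalg.norm(v) or 1)

    def generate(prompt: str) -> str:
        return f"<LLM answer conditioned on>\n{prompt}"   # hypothetical

    # The "knowledge base": (text, embedding) pairs that can be
    # re-indexed daily without touching the model's weights.
    docs = ["GPT-4 was released in March 2023.", "Bard is made by Google."]
    kb = [(d, embed(d)) for d in docs]

    def answer(question: str, top_k: int = 1) -> str:
        q = embed(question)
        ranked = sorted(kb, key=lambda pair: -float(q @ pair[1]))
        context = "\n".join(d for d, _ in ranked[:top_k])
        return generate(f"Context:\n{context}\n\nQuestion: {question}")

    print(answer("When was GPT-4 released?"))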

As for me, I'm looking for ways to increase productivity. I have ChatGPT Plus and am on the waiting lists for GPT-4 and Bing. Neither solves big problems, but there are usually many small things that can be done faster, and I would rather offload them.


It won’t replace Google search because ultimately it doesn’t have a point of view, let alone a dedicated page with a variety of functions (although plugins eat into this somewhat). For example, pick a random article written for The Atlantic: we are getting human prose from a specific author with a specific writing style about a specific subject that fits the zeitgeist of the day, with a specific length suitable for what’s being communicated. That is something that is both enjoyable and informative, with style. LLMs are great at summarizing and synthesizing, but it’s not the same thing.

But perhaps the LLM can eventually point me towards this so I can then learn more; somewhat like what Bing is already doing, but there it’s more of a footnote than something upfront (e.g. “you might want to read an excellent write-up in The Atlantic about decriminalizing drugs”).


I was wondering why nobody combines LLMs with well-known symbolic AI like CYC[1]. Or, perhaps I'm wrong and plenty of companies are working on it.

[1] https://cyc.com/


It depends what you mean by combine. They work in fundamentally different ways. It's a bit like combining an internal combustion engine with a ramjet.

You might be able to compose systems that use LLMs for some sub-components and symbolic AI for other functions. So, for example, if ChatGPT gets asked a question that LLMs are poor at but symbolic systems are good at, it could switch to using a symbolic system; but that's not really combining the actual technologies.
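
Something in the spirit of a dispatcher, where the symbolic system and the LLM stay separate backends (all function names here are hypothetical):

    import re

    def symbolic_solver(query: str) -> str:
        # Stand-in for a real symbolic system; here, just arithmetic.
        # The dispatcher guarantees eval only sees digits and operators.
        return str(eval(query))

    def llm(query: str) -> str:
        return f"<LLM answer to: {query}>"   # hypothetical LLM call

    def dispatch(query: str) -> str:
        # Route each question to the backend that suits it; the two
        # technologies are composed, not combined.
        if re.fullmatch(r"[\d\s+\-*/().]+", query):
            return symbolic_solver(query)
        return llm(query)

    print(dispatch("2 * (3 + 4)"))           # handled symbolically: 14
    print(dispatch("Why is the sky blue?"))  # handled by the LLM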


I’ve had similar thoughts, wondering why LLMs aren’t just a component of a larger system, perhaps used to generate “thoughts” that are then tested by some other subsystems, which in turn may re-fire the LLMs to generate more.

I guess I’m imagining that testing the LLM output might be a problem that some other technique is suited for, and that smells like what we need to mitigate LLM weaknesses.
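
As a sketch of the loop I’m imagining (propose() and verify() are hypothetical stand-ins for an LLM call and whatever external checker fits the domain):

    def propose(problem: str, feedback: str = "") -> str:
        return f"<candidate solution for: {problem} | notes: {feedback}>"

    def verify(candidate: str) -> tuple[bool, str]:
        # Any external subsystem: run the unit tests, type-check,
        # query a knowledge base... stubbed out here.
        return False, "failed test case 3"

    def solve(problem: str, max_rounds: int = 5):
        feedback = ""
        for _ in range(max_rounds):
            candidate = propose(problem, feedback)   # LLM "thought"
            ok, feedback = verify(candidate)         # external test
            if ok:
                return candidate
        return None   # give up; the verifier never passed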


What I had in mind was to take the output of a natural language parser that produces symbolic semantic representations (the folks at CYC have such parsers) and make it an additional part of the LLM's training data set.


Try https://www.phind.com/. I find it's much better optimized for coding purposes than regular GPT-4.


Just tried it. I like that it lists the information sources in the right-hand panel and provides the digested information in the main panel.


I've found that it is reasonably competent at most coding tasks, especially on Expert mode.

I rarely have to ask it to alter the code it gives me.


The next thing is how we model quantitative thinking, since every facet of what makes us human is involved in quantitative thinking.


Two other recent literature reviews worth reading:

"Transformer Taxonomy" - https://kipp.ly/blog/transformer-taxonomy/

"Five years of progress in GPTs" - https://finbarrtimbers.substack.com/p/five-years-of-progress...


I've been doing research with NLP for about 5 years now, and I couldn't have created a better intro list than this. Many of these resources are things I've sent people over the years.


it IS a great list

Do BERT models learn faster than GPTs?

Assuming the task is predictive, BERT seems to have a stronger/richer training signal (obviously, in practice you would use existing pre-trained models).
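
A toy illustration of the difference in signal (not how either model actually tokenizes; just the shape of the two objectives):

    tokens = ["the", "cat", "sat", "on", "the", "mat"]

    # GPT-style (causal): every position is a training example, but
    # each prediction sees only the left context.
    causal = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
    # e.g. (['the'], 'cat'), (['the', 'cat'], 'sat'), ...

    # BERT-style (masked): only ~15% of positions are predicted per
    # pass, but each prediction sees context from *both* sides.
    import random
    random.seed(0)
    i = random.randrange(len(tokens))
    context = [t if j != i else "[MASK]" for j, t in enumerate(tokens)]
    masked = (context, tokens[i])

So per prediction the masked objective is richer (bidirectional context), while the causal objective gets many more predictions per sequence; which one "learns faster" depends on how you count.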


I think LLMs are gonna be the "jumpstart" of more general AGI prototypes over the next couple years.

The large corpus of text gives them a general basis of logical patterns, which can then be pruned iteratively in simulated environments.


So you think the logical patterns found in human language might also be similar enough to the logical patterns found in other systems that these LLMs have a jumpstart in figuring those out?

We already have such a poor idea of how these things seem to understand so much… it’ll be a wild day when cancer is cured and we have absolutely no idea why.


It feels like some type of Heisenbergian deal. You can solve what you want, but you cannot know how at the same time.


IMO a sufficiently advanced AI should be able to do a full analysis of its neural architecture and explain it and break down its functionality.


There is no capability in an LLM to do this and we don't know how to build an "AI" that is not an LLM.

(Just deciding that your LLM has magic powers because you've put it in a category called "AI", and deciding that category has said magic powers, is what that guy Wittgenstein was complaining about in the Philosophical Investigations. Besides, intelligence doesn't mean all your thoughts are automatically correct!)


> Besides, intelligence doesn't mean all your thoughts are automatically correct!

This is the true bitter lesson for HN.


That would be even more intelligent than the majority (or even all, YMMV) of humans.


You can't "cure cancer" because "cancer" is a label for different things with different causes. If your brain cells all have the wrong DNA there's no way to get different brain cells.


I was using the phrase sort of idiomatically, but fair enough!


My experience with ChatGPT is that it cannot reason from the data it has, i.e., it cannot take abstract concepts it can write about and apply them to something it doesn't have the solution for. So I'm not sure how the logical patterns can really evolve into something closer to AGI there. Maybe other LLMs? The inability to do math properly is really limiting, I think.


I haven't yet seen an example of an LLM failing at math that cannot be very easily solved by having the LLM use a calculator, much as all humans do with math of any significance. It needs to use a calculator more often, but that's a completely negligible compute cost.
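
The plumbing is trivial, too; something like this, where llm() is a hypothetical model call that has been instructed to emit CALC(...) whenever it needs arithmetic:

    import re

    def llm(prompt: str) -> str:
        # Hypothetical model call; pretend it was prompted to write
        # CALC(expression) instead of doing arithmetic itself.
        return "That works out to CALC(1337 * 42) exactly."

    def run_with_calculator(prompt: str) -> str:
        def calc(match: re.Match) -> str:
            expr = match.group(1)
            # Only evaluate plain arithmetic: digits, operators, parens.
            if re.fullmatch(r"[\d\s+\-*/().]+", expr):
                return str(eval(expr))
            return match.group(0)   # leave anything else untouched
        return re.sub(r"CALC\(([^)]*)\)", calc, llm(prompt))

    print(run_with_calculator("What is 1337 * 42?"))
    # -> That works out to 56154 exactly.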


I meant more abstract math, e.g., constructing something that needs math to construct. Take a concept like risk-neutral pricing and get it to construct a replicating portfolio for something not totally trivial (i.e., without a lot of solved examples on the web). It fails for me.


That's fair; I've seen that it can really struggle to come up with novel algorithms. I am curious whether there will be more improvement on that front in future models, because even its current performance at algorithmic manipulation is far, far better than e.g. GPT-3's.


Give computers the alphabet. It contains all the texts. Imagine the possibilities.



