Oh wow I only just uploaded this and it's on HN already!
I'm actually quite excited about this video because I tried my hardest to pack all the key info I could think of into a 90 minute talk -- the goal is to be the one place I point coders at when they ask "hey tell me everything I need to know about LLMs".
Having said that, I'm sure I missed things or there are bits that are unclear -- this is my first attempt at doing this, and I plan to expand this out into a full course at some point. So please tell me any questions you still have after watching the video, or let me know of any concepts you think I should have covered but didn't.
I'm actually heading to bed shortly (it's getting late here in Australia!) so not sure I'll be able to answer many questions until morning, sorry. But I'll definitely take a look at this page when I get up. I'll also add links to relevant papers and stuff in the YouTube description tomorrow.
(Oh I should mention -- I didn't cover any ethical or policy issues; not because they're not important, but because I decided to focus entirely on technical issues for this talk.)
I thought the selection of projects was great - some OpenAI API hacking including a Code Interpreter imitation built with OpenAI functions, then local LLM execution with Hugging Face models, and then a fine-tuning example to build a text-to-SQL model somehow crammed into just 10 minutes at the end!
Thanks for the inspiring video!
Can I have some questions / play devil's advocate a bit?
I hope you don't mind if I start with constructive criticism: in the first half of the video you talk about the capabilities of LLM-based _applications_ (what ChatGPT, the OpenAI API, or Bard can and can't do) rather than the capabilities of the large language models themselves. I would have loved to see meaningful comparisons with open-source models (e.g. Llama 2) much earlier, not just in the last third.
In the part "What GPT-4 can do?" (around 17:20 in the video) you show that it answers correctly to the questions in the study (which claimed GPT4 can't answer correctly).
Are you sure ChatGPT does this with the model itself, rather than with clever tricks?
I mean, if I were OpenAI and found this critical study about GPT-4, I would of course assign a small team to fix it. But I guess the fix is not retraining / fine-tuning the model, just adding a wrapper that identifies logical / puzzle questions and applies a guided prompt system to get correct results. In other words: "hardcoding" the solution (path) for several types of questions in an application layer (not the model itself).
Of course it's difficult to test this theory, because one would need to invent new kinds of logical puzzles to test GPT-4; I assume OpenAI has the resources to create hardcoded solutions for all the existing types.
Abstracting the question a bit:
Should we explain GPT-4's (and similar systems') success (or failure) _only with the LLM_ or also consider a huge expert system around it? ( https://en.wikipedia.org/wiki/Expert_system )
Should we care? (I personally don't. I think it's the utility that matters.)
How much "quality improvement" will be based on LLM training vs. better expert systems around the LLMs in the future?
----
I like the part where ChatGPT repeats the error in the modified wolf, goat, cabbage problem (around 29:30 in the video) and you say "once GPT4 starts to be wrong, it tends to be more and more wrong". I guess the reason is partly that the "usual popular solution" has high probability, and partly that the previous question-answer pairs are fed into the current generation, which reinforces the wrong solution. Your clever prompt fixes this, which validates the idea that "prompt engineering" is a relevant skill.
Can we assume that LLMs will be able to "work" alone in the future, or will they always need guidance from humans to reach human-level reasoning?
----
The coding example shows that while the model generates the code, the runtime/testing is handled by a wrapper / agent-like workflow. The OCR and charting examples are also likely prebuilt workflows (in Bard too). But it's still great that it works.
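For what it's worth, such a wrapper can be quite thin. Here's a minimal sketch of the pattern using the OpenAI functions API (the run_python schema and the two-step loop are my own guesses, not necessarily how the video's code interpreter is built):

    import json, subprocess
    import openai

    # Declare a function the model may "call"; the wrapper does the running.
    functions = [{
        "name": "run_python",
        "description": "Execute Python code and return stdout",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    }]

    messages = [{"role": "user", "content": "What is 12345 * 6789?"}]
    resp = openai.ChatCompletion.create(
        model="gpt-4", messages=messages, functions=functions)
    msg = resp["choices"][0]["message"]

    if msg.get("function_call"):
        # The model wrote the code; the wrapper executes it and feeds
        # the result back so the model can phrase the final answer.
        code = json.loads(msg["function_call"]["arguments"])["code"]
        out = subprocess.run(["python", "-c", code],
                             capture_output=True, text=True).stdout
        messages += [msg, {"role": "function",
                           "name": "run_python", "content": out}]
        final = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        print(final["choices"][0]["message"]["content"])

The model never runs anything itself; all it does is emit a JSON argument blob, and the application layer decides whether and how to execute it.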
----
In the part "What can you do on your own computer?" (around 53:30) you say that "you're gonna need to use a GPU" and also the ending implies you must use GPU. I get OK results and 4+ tokens/s on my cheap laptop with Llama.cpp. It's not great; around half the speed of a Mac; but on a machine that is way cheaper than half of a Mac. Some comments on HN mentioned 12-15 tokens/s with 2 x 3090. For the same price one could buy many GPU-less machines with a combined speed that might be greater than this. I'm not saying that it's practical, but it's good to know that GPU-less solutions are not orders of magnitude worse, especially not in price/performance.
Thanks again and I hope you make more videos like this!
Thanks a lot for this video, best LLM usage tutorial I've seen so far.
At https://youtu.be/jkrNMKz9pWU?si=Dvz-Hs4InJXNozhi&t=3278, talking about valid use cases for a local model vs GPT-4, you say: "You might want to create your own model that's particularly good at solving the kinds of problems that you need to solve using fine tuning, and these are all things that you absolutely can get better than GPT4 performance".
In regards to this, there's an idea I've been thinking about for some time: imagine a chatbot backed by multiple "small" models (such as 7B parameters), where each model is fine-tuned for a specific task. Could such a system outperform GPT-4?
Here's a high-level overview of how I imagine this working (a rough sketch in code follows below):
- Context/prompt is sent to a "router model", which is trained to determine what kind of expert model can best answer/complete the prompt.
- The system then passes the context/prompt to the expert model and returns that answer.
- If no suitable expert model is found, fall back to a generic instruction-tuned general-purpose LLM to answer.
If a small model fine-tuned for a task can theoretically beat GPT-4 on that task, maybe a cluster of such small models could collectively outperform GPT-4.
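To make the idea concrete, here's a rough sketch of the routing layer using Hugging Face pipelines. The expert model names are hypothetical placeholders, and I'm standing in a zero-shot classifier for the trained router just to keep it self-contained:

    from transformers import pipeline

    # Hypothetical expert registry: each model is fine-tuned for one task.
    EXPERTS = {
        "sql":  "my-org/llama2-7b-text-to-sql",   # placeholder names
        "code": "my-org/llama2-7b-code",
        "chat": "meta-llama/Llama-2-7b-chat-hf",  # generic fallback
    }

    # A real router would be a small classifier trained on routing data;
    # a zero-shot pipeline stands in for it here.
    router = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

    def route(prompt):
        labels = [k for k in EXPERTS if k != "chat"]
        result = router(prompt, candidate_labels=labels)
        top, score = result["labels"][0], result["scores"][0]
        return top if score > 0.7 else "chat"  # low confidence -> generalist

    def answer(prompt):
        expert = pipeline("text-generation", model=EXPERTS[route(prompt)])
        return expert(prompt, max_new_tokens=128)[0]["generated_text"]

In production you'd keep the experts loaded (or hot-swap them), but the shape of the system is just this: classify, dispatch, fall back.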
Sambanova just launched something similar to what you're describing. It's a demo of their new chip running a 1T param MoE model made of 150 7B Llama 2s, each retrained to be an expert in a different topic. So one of them is a "law" expert, another a "physics" expert, etc.
They've got a video here [1] (scroll down slightly) that compares it against a 180B Falcon model running on GPUs on Hugging Face. The MoE results are not only just as good quality-wise, but also ridiculously fast. Like, nearly instant. A big benefit is that the experts can be swapped out and retrained with new data, which is obviously not as easy with the more monolithic 180B model.
It makes a lot of sense! In fact there's a number of open source projects working on just such a model right now. Here's a great example: https://github.com/XueFuzhao/OpenMoE/
Excellent video. I shared it in my workplace. Probably the most comprehensive introduction to the topic from a practical standpoint that I'm aware of. In particular, I loved the "those viral articles about GPT can't do X don't reproduce" section. Hoping it helps folks I know think critically when considering the tech.
Excellent video! Learnt a few new tricks that I'll use in future.
I find that just by trying things out I discover new uses.
A good example from the other day: I needed to convert a spreadsheet of addresses into GeoJSON to use as a map layer. Being in a particularly lazy mood, I decided to see how well ChatGPT would handle it.
As a first step I gave it one pair of lat/long and asked it to convert the deg/min to decimal. No problem, showed all the workings.
I then gave it the whole lat/long column, said not to show the workings, and it output that fine.
I then created a sample JSON structure with placeholders and said I would provide a data set to populate the structure, and that it should use the column names to replace the placeholders.
Dropped in the data and it generated the JSON perfectly.
What was interesting is that, without prompting, it redid the lat/long conversion and also incremented an id property I hadn't mentioned. I was quite impressed with that.
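For comparison, here's roughly the deterministic version of what it worked out, with placeholder column names (including the auto-incrementing id it added on its own):

    def dms_to_decimal(deg, minutes, seconds=0.0):
        # Degrees/minutes(/seconds) to decimal degrees, preserving sign.
        sign = -1 if deg < 0 else 1
        return sign * (abs(deg) + minutes / 60 + seconds / 3600)

    def to_geojson(rows):
        # rows: dicts with name, lat_deg, lat_min, lon_deg, lon_min keys.
        return {
            "type": "FeatureCollection",
            "features": [{
                "type": "Feature",
                "properties": {"id": i, "name": r["name"]},
                "geometry": {
                    "type": "Point",
                    # GeoJSON coordinate order is [longitude, latitude]
                    "coordinates": [
                        dms_to_decimal(r["lon_deg"], r["lon_min"]),
                        dms_to_decimal(r["lat_deg"], r["lat_min"]),
                    ],
                },
            } for i, r in enumerate(rows, start=1)],
        }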
What a gem! I've been waiting so long for an LLM course by Jeremy. As one of the people who helped start all of this with ULMFiT, his takes and tips are as good as I expected. Looking forward to more detailed, lower-level courses once open source catches up with the proprietary world.
Not a lot of love given to the RAG method, considering that for most applications a fine-tuned model in the truest sense won't be the best or most efficient solution to the problem.
As the original author, can you rate this AI-generated summary of your video:
Video tutorial on language models by Jeremy Howard from fast.ai. In the tutorial, Howard explains the basics of language models and how to use them in practice. He starts by defining a language model as something that can predict the next word of a sentence or fill in missing words. He demonstrates this using an OpenAI language model called text-davinci-003.
Howard explains that language models work by predicting the probability of various possible next words based on the given context. He shows how to use language models for creative brainstorming and playing with different word predictions.
He then discusses language model training and fine-tuning processes, using the ULMFiT approach as an example. He explains the three steps of language model training: pre-training, language model fine-tuning, and classifier fine-tuning. He mentions the importance of fine-tuning language models for specific tasks to make them more useful.
Howard also demonstrates how to use the OpenAI API to access language models programmatically. He shows examples of using the API to generate text, ask questions, perform code interpretation, and even extract text from images using OCR.
Additionally, he discusses the options for running language models on your own computer, such as using GPUs, renting GPU servers, or utilizing cloud platforms like Kaggle and Colab.
He mentions the Transformers library from Hugging Face, which provides pre-trained models and data sets for language processing tasks. He highlights the benefits of fine-tuning models and using retrieval augmented generation to combine document retrieval with language generation.
The tutorial concludes with a discussion of other options for running language models, including private GPT models, Mac-based solutions like H2O GPT and llama.cpp, and the possibility of fine-tuning models with custom data sets.
Overall, the tutorial provides a comprehensive overview of language models, their applications, and different ways to use them, both with OpenAI models and on your own computer.
I think that summary is pretty good, although it doesn't really highlight the most interesting bits, such as the fact that we implement a code interpreter from scratch, and that we cover fine-tuning to create a model that successfully converts prose questions into SQL queries.
The most exciting thing about LLMs is how they become easier for intermediate programmers every day. It really makes your imagination run wild when you can grasp the concepts.
This is what excites me the most. It’s such a simple interface (prompting) with unlimited capabilities once you look at it as a logic and pattern engine rather than magic.