As someone who missed the boat on this, is learning about this just for historical purposes now, or is there still relevance to future employment? I just imagine OpenAI eating everyone's lunch on anything AI-related; am I way off base?
The most important thing to learn for most practical purposes is what the thing can actually do. There's a lot of fuzzy thinking around ML - "throw AI at it and it'll magically get better!" Sources like Karpathy's recent video on what LLMs actually do are good anti-hype for the lay audience, but getting good practical working knowledge that's a level deeper is tough without working through it. You don't have to memorize all the math, but it's good to get a feel for the "interface" of the components. What is it that each model technique actually does - especially at inference time, where it needs to be well-integrated with the rest of the stack?
In terms of continued relevance - "deep learning", meaning dense neural nets trained to optimize a particular function, hasn't fundamentally changed in practice in ~15 years (and much longer than that in theory), and is still way more important and broadly used than the OpenAI stuff for most purposes. Anything that involves numerical estimation (e.g., ad optimization, financial modeling) is not going to use LLMs; it's going to use a purpose-built model as part of a larger system. The interface of "put numbers in, get number[s] out" is more explainable, easier to integrate with the rest of your software stack, and more measurable. It has error bars that are understandable and occasionally even consistent. It has a controllable interface that won't suddenly decide to blurt corporate secrets or forget how to serialize JSON. And it has much, much lower latency and cost - any time you're trying to render a web page in under 100ms or run an optimization over millions of options, generative AI just isn't a practical option (and is unlikely to become one, IMO).
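To make the "numbers in, numbers out" point concrete, here's a minimal sketch of a purpose-built numeric estimator (the feature setup and weights are made up for illustration): a least-squares linear model whose inference is a single matrix product, with error you can actually measure.

```python
import numpy as np

# Hypothetical numeric-estimation component: three engineered features in,
# one number out. Trained in closed form, no LLM anywhere in sight.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([0.5, -1.2, 2.0])       # the "ground truth" we hope to recover
y = X @ true_w + rng.normal(scale=0.1, size=1000)

# Fit: ordinary least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Inference: one matrix product -- microseconds, easily profiled and monitored.
pred = X @ w
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(rmse)  # close to the 0.1 noise level we injected
```

The point isn't the model (any regression library does this); it's that the error bar (`rmse`) is a number you can put in a dashboard, and the latency fits comfortably inside a 100ms page render.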
I don't have a significant math or theoretical ML background, but I've spent most of the last 10 years working side by side with ML experts on infra, data pipelines, and monitoring. I'm not sure I could integrate the sigmoid off the top of my head, but that's not what's important - I've done it once, enough to have some idea how the function behaves, and I know how to reason about it as a black box component.
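That "black box with a known interface" mindset can itself be tested in code. A small sketch: rather than re-deriving the sigmoid's calculus, check its known behavior numerically (its derivative is sigmoid(x) * (1 - sigmoid(x))).

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

# Treat the function as a black box and verify the property we rely on,
# instead of memorizing the derivation.
x = np.linspace(-4, 4, 9)
h = 1e-5
numeric_grad = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
analytic_grad = sigmoid(x) * (1 - sigmoid(x))
print(np.max(np.abs(numeric_grad - analytic_grad)))  # tiny discrepancy
```

This is exactly the kind of "feel for the interface" knowledge that matters in practice: the gradient vanishes in the saturated tails, which is why deep sigmoid stacks train poorly.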
Terrific explanation, and it matches my experience running a data science team. I encourage my team to start with the simplest possible approach to every problem, which requires understanding how different algorithms work. Does this project require a t-test, XGBoost, a convolutional neural network, or something else? What if we recode the dependent variable from numeric to binary?
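As a tiny illustration of that last point (the variable names and data here are made up): recoding a numeric outcome into a binary one can turn a regression problem into a simple proportion.

```python
import numpy as np

# Hypothetical numeric dependent variable: minutes of engagement per user.
minutes = np.array([0.0, 0.0, 3.5, 12.0, 0.0, 47.2, 1.1, 0.0])

# Recode to binary: did the user engage at all?
engaged = (minutes > 0).astype(int)
print(engaged.tolist())  # [0, 0, 1, 1, 0, 1, 1, 0]

# Now the simplest possible estimator -- a proportion -- may answer the
# actual business question, no model fitting required.
print(engaged.mean())    # 0.5
```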
Yep, that's the one I meant - sorry, should have linked.
His series on making a GPT from scratch is also great for building intuition specifically about text-based generative AI, with an audience of software developers.
This is about deep learning, of which LLMs are a subset. If you are interested in machine learning, then you should learn deep learning. It is incredibly useful for a lot of reasons.
Unlike other areas of ML, the nature of deep learning is such that its parts are interoperable. You could use a transformer with a CNN if you wish. Also, deep learning enables you to do machine learning on any type of data: text, images, video, audio. Finally, it scales naturally with compute.
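That interoperability follows from every block being a function from arrays to arrays. A toy numpy sketch (toy dimensions and random weights, purely illustrative): a 1D conv layer producing a feature sequence that feeds straight into a single-head self-attention layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernel):
    """Valid-mode 1D convolution: a toy 'CNN block' mapping a signal to features."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def self_attention(seq):
    """Single-head self-attention over a (length, dim) sequence: a toy 'transformer block'."""
    d = seq.shape[1]
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

# Because both blocks just map arrays to arrays, they compose freely:
signal = rng.normal(size=32)
features = np.stack([conv1d(signal, rng.normal(size=5)) for _ in range(8)], axis=1)
out = self_attention(features)
print(out.shape)  # (28, 8)
```

Swap either block for any other array-to-array block (an RNN cell, a pooling layer) and the composition still type-checks; that's the whole trick.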
As someone pretty involved in the field, I lament that LLMs are turning people away from ML and deep learning, and feeding the misconception that there’s no reason to do it anymore. Large models are expensive to run, have slow throughput, and still generally perform worse than purpose-built models. They’re not even that easy to use for a lot of tasks, compared to encoder networks.
I’m biased, but I think it’s one of the most fun things to learn in computing. And if you have a good idea, you can still build state-of-the-art things with a regular GPU at home. You just have to find a niche that isn’t getting the attention that LLMs are ;)
I started off being really excited to learn, but as time went on I actually lost interest in the field.
The whole thing is essentially curve fitting. ML is more an art than a science; it's all about tricks and intuitions for different ways of getting that best-fit curve.
From this angle the whole field got way less interesting to me; it has nothing deeper or more insightful to offer beyond this concept of curve fitting.
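For what it's worth, the "curve fitting" view can be stated in ten lines. A minimal sketch (toy function and hand-derived gradients, chosen for illustration): fit y = a·sin(x) + b by gradient descent on squared error, the same loop that becomes deep learning when scaled to millions of parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
y = 3.0 * np.sin(x) + 0.5 + rng.normal(scale=0.05, size=x.size)  # noisy target curve

a, b = 0.0, 0.0   # parameters to fit
lr = 0.1          # learning rate
for _ in range(500):
    err = a * np.sin(x) + b - y
    a -= lr * 2 * np.mean(err * np.sin(x))  # d(mean squared error)/da
    b -= lr * 2 * np.mean(err)              # d(mean squared error)/db

print(a, b)  # should recover roughly a=3.0, b=0.5
```

Whether you find that deflating or elegant is a matter of taste; everything else in the field is about making this loop work when the "curve" has a billion dimensions.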
I've found this fun way to think of it: the goal is to invent a faster form of evolution for pattern recognition, learning, and autonomous task completion. I think one needs to consider it more like biology and a science than pure logic and math. We can discover things that work, and then after that we can study them to learn why they work, just like we don't fully understand the brain yet.
I think there are some really cool problems, such as:
1. Is synthetic data viable for training?
2. How do you make deep learning agents that can do task planning and introspection in complex environments?
3. How do we efficiently build memory and data lookup into AI agents? And is this better/worse than making longer context windows?
Although it fundamentally is curve fitting, I'd venture to say that at some point, having to handle millions of parameters makes the curve fitting problem unrecognizable... A change in quantity is a change in nature, if you will.
IOW: to me, fitting a generalized linear model is very different from fitting a convolutional network.
You could argue all the building blocks are forms of curve fits, but that isn't a terribly useful statement even if true. If you can fit a curve to the desired behavior of any function, or composition of functions (which is itself a function), then you can solve any problem whose desired behavior you can express - including expressing the desired behavior of some other class of problems. Saying it is just curve fitting is like saying something is just math. The entirety of reality is just math.
By that logic, anything that is predictive is curve fitting, including entire academic fields like physics and climatology. You could say that all automation is curve fitting. I don’t think there’s much to be gained by being that reductive.
From a technical standpoint, it’s not a correct analogy either, because it assumes you have a curve to fit. What curve is language? What curve are images? No answer, because there isn’t one. Deep learning is about modeling complex behaviors, not curve fitting. Images and language, for instance, are rooted in social and cultural patterns, not intrinsic curves to be fit.
At best, it’s an imprecise statement. But I’d disagree entirely.
Highly relevant if you want to work on ML systems. Despite how much OpenAI dominates the press there are actually many, many teams building useful and interesting things.
From an application perspective, it's more important to understand how the overall ML process works, the key concepts, and how things fit together. Deep learning is a part of that. Lots of these are already wrapped in libraries and APIs, so it's a matter of preparing the correct data, calling the right APIs, and utilizing the result.
Someone will dominate the AI-as-a-service market, but there are so many applications for tiny edge AI that no single player can dominate all of them.
OpenAI, for example, is not interested in developing small embedded neural networks that run on a sensor chip and detect specific molecules in the air in real time.
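To give a flavor of what "tiny edge AI" means in practice, here's a minimal sketch (the weights, scales, and sensor values are invented for illustration) of int8-quantized inference, the kind of arithmetic a microcontroller next to a sensor can do without a float unit or a network connection.

```python
import numpy as np

# Hypothetical quantized dense layer: 3 sensor inputs -> 2 classes.
scale_w, scale_x = 0.02, 0.1                  # per-tensor quantization scales
w_int8 = np.array([[12, -40],
                   [7, 25],
                   [-30, 3]], dtype=np.int8)  # weights, stored as int8
x_int8 = np.array([50, -20, 10], dtype=np.int8)  # quantized sensor reading

# Integer multiply-accumulate, as an MCU would do it, then one dequantize step.
acc = x_int8.astype(np.int32) @ w_int8.astype(np.int32)
logits = acc * (scale_w * scale_x)
print(int(np.argmax(logits)))  # predicted class: 0
```

The whole model fits in a few hundred bytes; that's the regime OpenAI-scale players have no reason to compete in.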
It's like calculus: nothing new in years, but is it still important? The answer is still "Yes".
At a glance, it looks like too much for one book. It was probably compressed with the assumption that the reader already knows quite a lot. In other words, it's not easy reading.
Maybe last week's drama should have been a left-pad moment. For many things you can train your own NN and be just as good without being dependent on internet access, third parties, etc. Knowing how things work should give you insight into using them better.
I came here with the same question. After reading and learning these materials, will I have new job skills or AI knowledge that I can do something with?