Oh boy, don't get me started.... First off, I should say that by no means do I think any of these people (at least those publishing) are dumb. You can be a genius in one direction and a fucking idiot in another, and that's okay. That certainly describes me, haha (well, less on the genius side and more on the functioning-idiot side, so take everything I say with a grain of salt).

Don't get me wrong, scale is incredibly important and is certainly the reason for our recent advancements. But the idea that scale takes us to AGI seems fairly naive to me. It rests on a few clear assumptions. The first is that data can accurately explain all phenomena if the machine is capable of sufficient imputation. I don't even know how to tackle this one, because it is so well established as false in the statistics literature. Another is that RLHF is enough for alignment. I like to say that RLHF is like Justice Stewart's definition of porn: I know it when I see it. It is certainly a useful tool, but we shouldn't be naive about its limitations. Just go on any reddit discussion about what constitutes NSFW and you'll find tons of disagreement, or see the HN discussion of "Is This A Vehicle"[0]. Those comments are just beautiful, and crazygringo (top comment) demonstrates it all perfectly. There's a powerful inference and imputation game going hand in hand, and that is the issue.

There needs to be more time spent thinking about one's own brain and questioning the assumptions we've made. As you advance, details become more and more important. We get tricked because you can often get away without nuance at the beginning of studying something, but with sufficient expertise nuance ends up dominating the discussion, and you may find that naivety doesn't just fail to take a step in the right direction but can take you a step in the wrong one (though often moving at all is more important). I'll reference Judea Pearl and Ilya on this one[1]. Pearl is absolutely correct, even if not conveyed well (it is Twitter, after all). His book will give a good understanding of this, though.
> What math would you use to describe the limitations of deep learning?
This is hard, because there isn't as much research into it as there is into demonstrations. I wouldn't go as far as saying there's no work, but it is just far less popular and advancements are slower. Some optimal transport people really get into this stuff, as do people who work on Normalizing Flows. Aapo Hyvärinen is a really good person to read, and you'll find foundations for many things, like diffusion, in works of his that far predate the boom. I'd also really suggest looking at Max Welling and any/all of his students. If you go down that path you'll find many more people, but this is a good place to enter that network.
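To give a flavor of the kind of math that shows up in that line of work (my sketch, not part of the original comment; the notation is the standard textbook form), here are Hyvärinen's score matching objective and the normalizing-flow likelihood, both exact statements about densities rather than heuristics:

```latex
% Hyvarinen-style score matching: fit the score (the gradient of the
% log-density) without ever needing the normalizing constant.
J(\theta) = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[
  \operatorname{tr}\!\big(\nabla_x \psi_\theta(x)\big)
  + \tfrac{1}{2}\,\lVert \psi_\theta(x) \rVert^2 \right],
\qquad \psi_\theta(x) = \nabla_x \log p_\theta(x)

% Normalizing flow: exact log-likelihood via change of variables through an
% invertible map f with Jacobian J_f.
\log p_X(x) = \log p_Z\!\big(f(x)\big) + \log \left|\det J_f(x)\right|
```

Roughly, diffusion models learn that same score function at multiple noise levels, which is part of why work that far predates the boom keeps coming up.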
But honestly, the best math to get started on to learn this stuff isn't "ML math". It's statistics, probability, metric theory, topology, linear algebra, and many specialized domains within these. I'd even go as far as to say that category theory and set theory are very useful. It's all the math you learn for a lot of other things; you just need the correct lens. There is a problem in math education where we're often either far too application-focused or far too abstraction-focused, and we forget to be generalists and build that deeper understanding[2]. But this is a lot, and I'm not sure of a single resource that pulls it all together in a way that's good for introductions (this paper certainly has many of the things I'd mention, but it is not introductory). After all, things are simpler after they are understood.
I've written a lot and feel like I may not have given a sufficient answer. There's a lot to say, and it is hard to convey in general language to general audiences. But I think I've given enough to find the path you're asking about; just don't expect to get a complete answer in a comment, unfortunately (maybe someone is a better communicator than me).
[0] https://news.ycombinator.com/item?id=36453856
[1] https://twitter.com/yudapearl/status/1735211875191910550
[2] I think the theory-focused people do often understand this more, but that's usually after going through the gauntlet; it likely isn't even seen by them along that journey, especially prior to the point where many people stop. Certainly Terry Tao understands that math is just models and that something like "the wave equation" isn't specifically about waves but is far more general. You'll also find a lot of breakthroughs where the key ingredient is taking something from one domain and shoving it into another. Patchwork is often needed, but sometimes it gets more generalized (or they derive a generalization, then show that the two are specific instances of that general form).
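A concrete instance of the point in [2] (my addition, not the commenter's): the wave equation itself.

```latex
% The wave equation: a second-order PDE in time and space.
\frac{\partial^2 u}{\partial t^2} = c^2 \, \nabla^2 u
% The same equation models vibrating strings, sound, light, and water waves,
% which is the kind of generality footnote [2] is pointing at.
```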
ML researchers saying they need "category theory" sounds like a way to try to convince mathematicians that their work is cool.
You absolutely do not need category theory.
The parent didn't say category theory is necessary for conducting ML research, just that it could be useful. This point isn't particularly controversial. If you're interested in this niche of the field, I find Tai-Danae Bradley's work to be pretty cool! She has a site: https://www.math3ma.com/
Thanks for the reply. I'm glad my comment is no longer flagged.
What do you mean that "this point isn't particularly controversial?" If you just mean that "X may be useful", then of course. But the particular X matters, and "could be useful" is much different than "is useful".
People who like category theory want it everywhere. I don't know your mathematical background, but spend any time in a math department, or even classes, and you'll find people ready to explain any topic in the language of CT.
It may be useful, but it has to be justified. It's clear in some mathematical contexts, but definitely not in ML (let alone analysis).
ML has a problem in that no one knows why certain methods work. Just look at something like batch normalization: I can think of at least 3 different "explanations" on why it works.
ML people want explanations, and mathematicians need work. Category theorists therefore have work. But I don't think you should mistake this for an explanation. You just get a "cleaner way" to present concepts.
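For readers following along, here is a minimal sketch of what batch normalization actually computes (my own illustration in NumPy; the function name, variable names, and eps value are just illustrative). Part of why the competing explanations are hard to adjudicate is that the operation itself is only a few lines.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass (training mode) for an (N, D) batch.

    Normalizes each feature to zero mean / unit variance over the batch,
    then applies a learned scale (gamma) and shift (beta).
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # learned affine transform

# Tiny usage example with random data
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))  # roughly zero means and unit stds per feature
```

The commonly cited explanations (reducing internal covariate shift, smoothing the optimization landscape, an implicit regularization effect) are all accounts of why this small reparameterization helps, and it is not obvious they amount to the same claim.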
FYI, I flagged you because the comment does not live up to the HN community standards[0]. A new account whose only comment is a sarcastic reply to me, made shortly after my comment, does not contribute to the conversation. I decided to flag instead of commenting and continuing an unproductive exchange.
> People who like category theory want it everywhere.
This isn't surprising. It is an attempt at further generalization of mathematics. Although it can get annoying, it isn't wrong, because category theory is about looking from a high level of abstraction and making connections between differing branches of mathematics. If you don't see it everywhere you either don't have an understanding or have discovered something those people would really like to know. From personal experience, it can be a quite useful tool to describe things because of this.
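For readers who haven't seen the vocabulary, here is roughly what "making connections between branches" looks like formally (my gloss, not the commenter's; the free vector space example is a standard one):

```latex
% A functor F : C -> D sends objects to objects and morphisms to morphisms,
% preserving identities and composition:
F(\mathrm{id}_A) = \mathrm{id}_{F(A)}, \qquad F(g \circ f) = F(g) \circ F(f)

% Example: the free vector space functor Set -> Vect_k sends a set S to the
% vector space with basis S and a function to the induced linear map; functors
% like this are the bridges between branches that category theorists mean.
```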
> It may be useful, but it has to be justified.
The former begets the latter.
> Just look at something like batch normalization: I can think of at least 3 different "explanations" on why it works.
Are those the same thing? What are those?
> But I don't think you should mistake this for an explanation. You just get a "cleaner way" to present concepts.
The latter is de facto the former.
And yes, math is just models. Or as Poincaré said, math is the study of relationships between numbers. One might also say "the map is not the territory", and you can find several math theorems making this point explicitly about math. You may even find one by reading my username with a little care. More than one if you take more care.
> If you don't see it everywhere you either don't have an understanding or have discovered something those people would really like to know. From personal experience, it can be a quite useful tool to describe things because of this.
Get off your high horse. I've had my share of Mac Lane. If you can describe something in terms of CT, you can talk to mathematicians who care about CT. I don't see why this helps ML.
> It may be useful, but it has to be justified.
"May be useful" does not beget "justified." CT may be useful in all areas if you ask a CT theorist. I fail to see how CT helps me build a car.
> The latter is de facto the former.
No it's not. You can take your favorite analysis topic and find a suitable category from which to view it, but this won't tell you how to prove anything. If you did the CT correctly you can now make some analogies, but it won't tell you anything specific.
> And yes, math is just models. Or as Poincaré said, math is the study of relationships between numbers. One might also say "the map is not the territory", and you can find several math theorems making this point explicitly about math.
How do you square "math is the study of relationships between numbers" with CT? You can diagram chase without seeing a single number. I have no idea what mathematical theorem you are referring to, but if you're extrapolating philosophical points from a mathematical theorem, you're doing it wrong.
> You may even find one by reading my username with a little care. More than one if you take more care.
Ok I'll bite. You seem to be into Normalizing Flows. How does CT explain it being useful?
I'm trying not to dox myself so I can be more open on HN (though there are more concerns about that in the modern era...). You can find some harsh words against some ML community practices in my history, and I think it is easy to be misinterpreted as calling people dumb, or to have criticism of academic practice confused with criticism of the tools' utility (I criticize LLMs and diffusion a lot because I like them, not the other way around). So yes and no. But the lectures I give aren't recorded and public (Zoom, for my university; I'm ABD in my PhD). My lecture slides and programs should be publicly visible though, but I don't go into this with them because I've been specifically asked not to teach this way :/

In all fairness, our ML course only has Calc 1 as a pre-req, and CS students aren't required to take Lin Alg (most do, though first courses are never really that great ime) or differential equations. TBH, to get into this stuff you kinda need some metric theory. If you actually poke through this paper you'll find that coming up very quickly, and this is common in the optimal transport community. But I think if you get into metric theory a lot of this will make sense pretty quickly. So if you can, maybe start with Shao's Mathematical Statistics?
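As a rough illustration of why metric theory keeps coming up in the optimal transport setting (my addition; the formula is the standard p-Wasserstein definition, not something specific to this thread):

```latex
% A metric d on a set X satisfies, for all x, y, z in X:
d(x, y) \ge 0, \qquad d(x, y) = 0 \iff x = y, \qquad
d(x, y) = d(y, x), \qquad d(x, z) \le d(x, y) + d(y, z)

% Optimal transport lifts this to a metric on probability measures: the
% p-Wasserstein distance, an infimum over couplings \Pi(\mu, \nu) whose
% marginals are \mu and \nu.
W_p(\mu, \nu) = \left( \inf_{\gamma \in \Pi(\mu, \nu)}
  \int_{X \times X} d(x, y)^p \, \mathrm{d}\gamma(x, y) \right)^{1/p}
```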