
Just curious, how "deep" have you gone into the theory? What resources have you used? How strong is your math background?

Unfortunately a lot of the theory does require some heavy mathematics, the type you won't see in a typical undergraduate degree, even in more math-heavy subjects like physics: topics such as differential geometry, measure theory, set theory, abstract algebra, and high-dimensional statistics. But I do promise that the theory helps and can build some very strong intuition. It is also extremely important that you have a deep understanding of what these mathematical operations are doing. It does look like this exercise book is trying to build that intuition, though I haven't read it in depth. I can say it is a good start, but it is only the very beginning of the theory journey. There is a long road ahead beyond this.

  > how it makes me choose the correct number of neurons in a layer, how many layers,
Take a look at the Whitney embedding theorem. While it doesn't give a precise answer, it'll help you build intuition about the minimal number of dimensions (and hence parameters) you need (and the VGG paper will help you understand width vs. depth). In a transformer, the MLP block after attention scales the hidden dimension up 4x before projecting back down, which gives room for untangling any knots in the data. While 2x is the minimum the theorem guarantees, 4x creates a smoother landscape, so the problem can be solved more easily. Some of this is discussed in the paper by Schaeffer, Miranda, and Koyejo that counters the famous "Emergent Abilities" paper by Wei et al. This should be discussed early in ML courses alongside problems like XOR or the concentric circles. These problems are hard because in their natural dimension you cannot draw a hyperplane separating the classes, but by lifting the data into a higher dimension you can (see the sketch below). That fact is usually mentioned in intro ML courses, but I'm not aware of one that goes into more detail, such as a discussion of the Whitney embedding theorem, which would let you generalize the concept.
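To make the lifting concrete, here's a minimal sketch (my own illustration; all names and thresholds are made up for the demo): two concentric circles can't be split by any line in 2D, but adding the squared radius as a third coordinate makes a single plane suffice.

  # Concentric circles: not linearly separable in 2D, but lifting
  # (x1, x2) -> (x1, x2, x1^2 + x2^2) makes a plane separate them.
  import numpy as np

  rng = np.random.default_rng(0)
  n = 200
  theta = rng.uniform(0, 2 * np.pi, n)
  r = np.where(np.arange(n) < n // 2, 1.0, 3.0)  # inner and outer circle
  X = np.c_[r * np.cos(theta), r * np.sin(theta)]
  y = (r > 2).astype(int)

  Z = np.c_[X, (X ** 2).sum(axis=1)]             # lift to 3D

  # The plane z = 5 (between r^2 = 1 and r^2 = 9) now separates the classes.
  pred = (Z[:, 2] > 5).astype(int)
  print((pred == y).all())                       # True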

  > the activation functions
There's a very short video I like that visualizes GELU [0], even using the concentric circles! The channel has a lot of other visualizations that will really benefit your intuition, and you may see where the differential geometry background provides benefits: understanding how to manipulate manifolds is critical to understanding what these networks are doing to the data. Unfortunately these visualizations stop helping once you scale beyond 3D, as weird things happen in high dimensions, even as low as 10 [1]. A lot of visual intuition goes out the window, and this often leads people either to abandon it completely or to make erroneous assumptions (no, your friend cannot visualize 4D objects [2,3], and that image you see of a tesseract is quite misleading).
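If symbols help alongside the animation: GELU is x * Phi(x), with Phi the standard normal CDF. A quick sketch (function names are mine; the tanh form is the common approximation from the GELU paper):

  import math

  def gelu_exact(x: float) -> float:
      # x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
      return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

  def gelu_tanh(x: float) -> float:
      # widely used tanh approximation
      return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

  print(gelu_exact(1.0), gelu_tanh(1.0))  # both ~0.841

Unlike ReLU, GELU is smooth everywhere, which is exactly what makes the manifold-bending picture in the video so clean.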

The activation functions provide the non-linearity in these networks, a key ingredient missing from the perceptron model. Remember that the universal approximation theorem says you can approximate any continuous function on a compact domain. In simple cases you can relate this to Riemann summation, but with smooth "bump functions" instead of rectangles. I'm being hand-wavy here on purpose because this is not precise, but there are relationships to be found; this is a HN comment, I have to oversimplify. Also remember that a linear layer without an activation can only perform affine transformations. That is, after all, what a matrix multiplication (plus a bias) is capable of (another oversimplification).
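You can convince yourself of that last point in a few lines (a self-contained sketch, not from any particular library):

  # Without an activation, stacked linear layers collapse to one affine map.
  import numpy as np

  rng = np.random.default_rng(0)
  W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
  W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
  x = rng.normal(size=4)

  two_layers = W2 @ (W1 @ x + b1) + b2        # "deep" linear network
  collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)  # single equivalent affine map
  print(np.allclose(two_layers, collapsed))   # True

  relu = lambda v: np.maximum(v, 0)           # insert a nonlinearity...
  nonlinear = W2 @ relu(W1 @ x + b1) + b2     # ...and the collapse no longer holds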

The learning curve is quite steep, and there's a big jump from the common "it's just GMMs" or "it's just linear algebra" claims [4]. There is a lot of depth here, and unfortunately, due to the hype, there is a lot of material labeled "deep" or "advanced mathematics"; it is important to remember that these terms are extremely relative. What is deep to one person is shallow to another. But if your material isn't going beyond calculus, you are going to struggle, and I am extremely empathetic to that. Again, though, I do promise that there is a lot of insight to be gained by digging into the mathematics. There is benefit to doing things the hard way. I won't try to convince you that it is easy or that there isn't a lot of noise surrounding the topic, because that'd be a lie. If it were easy, ML systems wouldn't be "black boxes"! [5]

I would also encourage you to learn some metaphysics. Something like Ian Hacking's Representing and Intervening is a good start. There are limitations to what can be understood through experimentation alone, famously illustrated in Dyson's recounting of when Fermi rejected his paper [6]. There is a common misunderstanding of the saying "with four parameters I can fit an elephant, and with five I can make him wiggle his trunk." [6] can help provide a better understanding of this, but we truly do need to understand the limitations of empirical studies. Science relies on the combination of empirical studies and theory; neither is any good without the other. Science is about building causal models, so one must be quite careful and extremely nuanced when doing any form of evaluation. The subtle details can easily trick you.
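To see why the elephant quip is really a warning about parameters versus data (a toy sketch of my own, not the actual elephant construction):

  # With as many parameters as data points you can "fit" pure noise
  # exactly; a perfect fit says nothing about the underlying law.
  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(0.0, 1.0, 5)
  y = rng.normal(size=5)                        # noise, no structure at all

  coeffs = np.polyfit(x, y, deg=4)              # 5 coefficients, 5 points
  print(np.allclose(np.polyval(coeffs, x), y))  # True: exact interpolation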

[0] https://www.youtube.com/watch?v=uiB97cPEVxM

[1] https://www.penzba.co.uk/cgi-bin/PvsNP.py?SpikeySpheres

[2] https://www.youtube.com/shorts/_n7TMDnYdVY

[3] https://www.youtube.com/watch?v=FfiQBvcdFG0

[4] https://news.ycombinator.com/item?id=43418334

[5] I actually dislike this term. It is better to say that they are opaque: "black box" would imply that we have zero insight, but in reality we can see everything going on inside; it is just extremely difficult to interpret. We also do have some understanding, so the interpretation isn't impenetrable.

[6] https://www.youtube.com/watch?v=hV41QEKiMlM




I wonder what kind of contributions you can make with a strong math background versus just an undergrad math background (engineer)? I know it's a vague question and it's not so cut and dry, but I've lately been thinking about theory vs. practice, and I feel a bit ambivalent towards theory (even though I started with theory and loved it) and also a bit lost, mostly due to the steep learning curve, i.e., having to go beyond undergrad math (I'm a CS student with an undergrad math background).

I guess it depends on what you want to do in your career and what problems you work on, but what changed my view on theory was seeing people with little math background, or with undergrad math at most, who were still productive in creating useful applications and/or producing research papers in DL. That showed me that what matters more is having a strong analytical mind, being a good engineer, and being pragmatic. With those qualities you can take a top-down approach when filling gaps in your knowledge, which I guess is possible because DL is such an empirical field at the moment.

So to me it feels like formally "going beyond undergrad math" matters more if you want to tackle the theoretical problems of DL, in which case you need all the help you can get from theory (perhaps not just math; physics and other fields might help as well, letting you view a problem through more than one lens). IMO it's like casting a wide net: the more you know, the bigger the net, and you hope that something sticks. Going the math-education route is a safe way to expand this net.


I also wonder about that. E.g., considering the team behind DeepSeek, was it more important for them to have great engineering skills or strong math backgrounds to achieve this success?


It's the combination that creates the magic. I'm a big believer that you need to spend time learning math as well as programming and computer architecture. The algorithms are affected by all of these things (this is why teams work best, but you need the right composition).

I'm a researcher and still early in my career. I'm no rockstar, but I'm definitely above average if you consider things like citations or h-index. Most of my work has been making models more efficient, using fewer resources, mostly because of a lack of GPU access lol. My focus is more on density estimation though (generative modeling).

And to be clear, I'm not saying you need to sit and do calculations all day. But learning this math is necessary for building the intuition and being able to apply it to real-world problems.

I'll give a real-world example though. I was interning at a big company last year, and while learning their framework I was playing around with their smaller model (the big one wasn't released yet). While training, I noticed it was saturating early on, and looking at the data I immediately recognized there were generalization issues. I asked for a week to retrain the model (I only had a single V100 available despite company resources). By the end of the week I had something really promising, but I was still behind on accuracy on the internal test set. I was convinced though, because I understood what causes generalization issues and the baked-in biases of the data acquisition. My boss was not convinced, and I kept asking for other test sets and customer data. Begrudgingly, it was given to me. I ran the tests and I 3x'd the performance, neck and neck with their giant model that had tons of pretraining (a few percent behind). A dinky little ResNet beating a few-hundred-million-parameter transformer; a few hours to train vs. weeks.

My boss was shocked. His boss was shocked (and he was very anti-theory). I even got emails from top people asking how I did it. I said that everything I did would only work better on transformers and that we should implement it there (I have experience with similar models at similar scales). And that's the end of the story. Nothing happened. My version wasn't released to customers, nor were the additions I made to the training algorithms merged (everything was optional too, so no harm).

That's been pretty representative of my experience so far, though. I can smash some metric at a small scale, and mostly people say "but does it scale?" and then do not give me the requisite compute to attempt it. I've seen this pattern with a number of people doing work like mine; I'm far from alone, and I've heard the same story at least a dozen times. The truth is that to compete with these giant models you still need a lot of compute. You can definitely get the same performance with 10x, maybe even 100x, fewer parameters or lower cost, but 1000x is a lot harder. I'm more concerned that we aren't really providing good pathways to grow. Science has always worked by starting small and then scaling. Sure, a lot fails along the way, but you have to try. The problem of the GPU-poor not being able to contribute to research is more gatekeeping than science. I don't think that should be controversial when you look at other comments in this thread: people say "no one knows" as if the answer is "no one can know, so don't try." That's very short-sighted. But hey, it's not like there's another post today with the exact same sentiment (you can find my comment there too): https://news.ycombinator.com/item?id=43447616


Thank you for that insightful comment! "Start small, do the research, then scale" is a pattern that's really overlooked these days. I wish you all the best in your future endeavours.


Haha, well, it's pretty hard to start big if you don't have the money lol. And thanks! I just want to see our machines get smarter and to get people to be open to trying more ideas. Until we actually have AGI, I think it's too early to say which method is definitely going to lead us there.


Thanks for your time! Just added your commentary to my favorites! :-)





