
> you don't often know if your data or model selection can produce the results you want.

Like, not knowing if your data set actually contains anything predictive of what you're trying to predict?




Here’s an example of something similar. Say you have a baseline model with an AUC of 0.8. There’s a cool feature you’d like to add. After a week or two of software engineering to add it, you get it into your pipeline.

AUC doesn’t budge. Is it because you added it in the wrong place? Is the feature too noisy? Is it because the feature is just a function of your existing features? Is it because your model isn’t big enough to learn the new feature? Is there a logical bug in your implementation?

All of these hypotheses will take on the order of days to check.
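For the redundancy and signal hypotheses in particular, a cheap offline check on a data sample can sometimes save a full pipeline run. A minimal sketch, assuming tabular data in a pandas DataFrame with hypothetical column names ("new_feat" for the new feature, "y" for the label); this is illustrative, not anyone's actual pipeline:

    # Two quick checks before committing to a multi-day run.
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score

    df = pd.read_parquet("training_sample.parquet")   # small sample, hypothetical path
    existing = [c for c in df.columns if c not in ("new_feat", "y")]

    # 1) Is the new feature largely a function of the existing ones?
    #    A high R^2 here suggests it adds little new information.
    r2 = cross_val_score(GradientBoostingRegressor(), df[existing], df["new_feat"],
                         cv=5, scoring="r2").mean()

    # 2) Does it move AUC at all with a small model on the sample?
    auc_base = cross_val_score(GradientBoostingClassifier(), df[existing], df["y"],
                               cv=5, scoring="roc_auc").mean()
    auc_plus = cross_val_score(GradientBoostingClassifier(), df[existing + ["new_feat"]],
                               df["y"], cv=5, scoring="roc_auc").mean()
    print(f"R^2(new|existing)={r2:.3f}  AUC base={auc_base:.3f}  AUC +feat={auc_plus:.3f}")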


>AUC doesn’t budge. Is it because you added it in the wrong place? Is the feature too noisy? Is it because the feature is just a function of your existing features? Is it because your model isn’t big enough to learn the new feature? Is there a logical bug in your implementation?

Or is it because of a lack of expertise and experience, because someone tries stuff blindly, without understanding much, in the hope that they'll nail it with enough fiddling?


Isn't that all of AI? I get the impression that not even the "experts" really understand what new techniques will get good results - they're guided by past successes, and have loose ideas about why past successes were successful, but can't really predict what else will work.

It seemed like the tremendous success of transformer architectures was a surprise to everyone, who had previously been throwing stuff at the wall and watching it not stick for multiple decades. And when you look at what a transformer is, you can see why QKV attention blocks might be useful to incorporate into machine learning models... but not why all the other things that seem like they might be useful weren't, or why a model made of only QKV attention blocks does so much better than, say, a model made of GRU blocks.


No, it was not a surprise. The transformer architecture resulted from systematic exploration at scale with seq2seq models, and it was quite clear when it came out that it was very promising.

The issue was not the technology, it was the lack of investment. In 2017, with a giant sucking sound, autonomous vehicle research took all the investment money and nearly all the talent. I'm a good example: I was working on training code generation models for my startup Articoder, using around 8TB of code scraped from GitHub. I had some early successes, with automatic pull requests generated and accepted by human users, and got past the YC application stage into the interview. The amount of VC funding for that was exactly zero. I filed a patent, put everything on hold and went to work on AVs.

As for watching things not stick for multiple decades: there were simply too few people working on this, and no general availability of compute. It was a few tiny labs with a few grad students and little to no compute available. Very few people had a supercomputer in their hands. In 2010, for example, a GPU rig like 2x GTX 470 (which could yield some 2 TFLOPS) was an exception. And in the same year, the top conference, NeurIPS, had an attendance of around 600.


You could say that. No one has even a decade of experience with transformers. Most of this stuff is pretty new.

More broadly though, it's because there aren't great first-principles reasons for why things do or don't work.


> Or is it because lack of expertise and experience and because someone tries stuff blindly without understanding a bit in the hope they will nail it with enough fiddling?

So, 99% of software development?


> All of these hypotheses will take on the order of days to check.

OK, but you can check them, right? How is that different from a regular software bug?


In software engineering you can test things on the order of seconds to minutes. Functions have fixed contracts which can be unit tested.

In ML your turnaround time is days. That alone makes things harder.

Further, some of the problems I listed are open-ended, which makes them very difficult to debug.


I've been an ML researcher for the last 11 years. Last week I spent 3 days debugging an issue in my code which had nothing to do with ML. It was my search algorithm not properly modifying the state of an object in my quantization algorithm. Individually, both algorithms worked correctly but the overall result was incorrect.

Looking back at my career, the hardest bugs to spot were not in ML, but in distributed systems, parallel data-processing algorithms, task scheduling, network routing, and dynamic programming. Of course I've also had a ton of ML-related bugs, but over the years I developed tools and intuition to deal with them, so usually I can narrow down an issue (like impacted accuracy, or training not converging) fairly quickly. I don't think these kinds of bugs are fundamentally different from any other bugs in software engineering. You typically try to isolate an issue by simplifying the environment, breaking it down into parts, testing on toy problems, tracing program execution, and printing out or visualizing values.
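One concrete form of the "test on toy problems" step that comes up a lot in ML debugging is checking that the model can overfit a tiny fixed batch: if it can't drive the loss to near zero, the bug is usually in the training loop or data handling rather than in the model. A minimal PyTorch sketch with a throwaway model and random data (purely illustrative, not the setup described above):

    import torch
    import torch.nn as nn

    # Tiny model and a fixed batch of 8 random examples.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(500):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

    # If this is not close to zero, suspect the loop, the data handling,
    # or the loss wiring before suspecting the model or the data.
    print(f"loss on the tiny batch after 500 steps: {loss.item():.4f}")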


> In software engineering you can test things in something on the order of seconds to minutes. Functions have fixed contracts which can be unit tested.

I think this only applies to a certain subset of software engineering, the one that rhymes with "tine of christmas".

Implementing bitstream formats is an area I'm very familiar with, and I dance when an issue takes seconds to resolve. Sometimes you need to physically haul a vendor's equipment to the lab. In broadcast we have this thing called "Interops", where tons of software and hardware vendors do just this, but in a more convention-esque style (it's often actually held at conventions).


> rhymes with "tine of christmas".

What?


line of business


What makes some bugs in ML algorithms hard to spot is that many of them hinder, but do not prevent, the model from learning. They can be really hard to spot because you do see the model learning and getting better, and yet without that bug the predictions could be even more accurate. Only with domain experience can you tell that something might be wrong.

Moreover, these kinds of issues are usually related to the mathematical aspect of the model, meaning that you need to understand the theoretical motivation of things and check all operations one by one. Just this week, for example, I was dealing with a bug where we were normalizing over the wrong dimension of a tensor.
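To illustrate why that class of bug goes unnoticed (a generic sketch, not the actual code in question): normalizing over the wrong dimension still produces finite, plausible-looking numbers, so nothing crashes and training still makes some progress.

    import torch

    x = torch.randn(32, 8)              # hypothetical (batch=32, features=8) tensor
    good = torch.softmax(x, dim=-1)     # intended: normalize across features
    bad = torch.softmax(x, dim=0)       # bug: normalize across the batch instead

    print(good.sum(dim=-1)[:3])         # rows sum to 1, as intended
    print(bad.sum(dim=-1)[:3])          # rows don't sum to 1, yet nothing errors out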


> Only with domain experience you can tell that something might be wrong.

Obviously. How is this different from any other field of science or engineering?

> you need to understand the theoretical motivation of things and check all operations one by one.

Again, this is true when debugging any complex system. How else would you debug it?

> a bug there where we were normalizing on the wrong dimension of a tensor

If you describe the methodology you used to debug it, it will probably be applicable to debugging a complicated issue in any other SWE domain.


Because the difference is that statistical models are by definition somewhat stochastic. Some incorrect answers are to be expected, even if you do everything right.

In software engineering you have test code. 100% of your tests should pass. If one doesn’t you can debug it until it does.
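To make the contrast concrete (a sketch with made-up data and an arbitrary threshold, not anyone's real test suite): a deterministic unit test asserts exact behavior, while a model test can usually only assert a statistical floor, because some wrong predictions are expected even from a correct implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    def test_exact_arithmetic():
        # Deterministic contract: either it holds every time or there is a bug.
        assert round(0.1 + 0.2, 10) == 0.3

    def test_model_quality_floor():
        # Statistical contract: assert a threshold, not exact outputs.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 5))
        y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
        model = LogisticRegression().fit(X[:400], y[:400])
        auc = roc_auc_score(y[400:], model.predict_proba(X[400:])[:, 1])
        assert auc > 0.8                # individual wrong predictions are still fine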


> How is this different from any other field of science or engineering?

The difference is that in most cases it is not clear how well any given approach will work in a given scenario. Often the only option is to try, and if performance is not satisfying it is not easy to find the reason. Besides bugs or the wrong model choice, it could be the wrong training parameters, the quality or quantity of the data, or any number of other things.

It's not necessarily different from SWE; problem solving is a general skill. The difficulty comes from the fact that there is no clear definition of "it works", and that there are no guidelines or templates to follow to find out what is wrong, if anything is wrong at all. In particular, many issues are not about the code.


Why do you spend weeks adding something instead of directly testing all the later hypotheses?


In some cases you can directly test hypotheses like that, but more often than not, there isn’t a way to test without just trying.


The Farmer is Turkey's best friend. Turkey believes so because every day Farmer gives Turkey food, lots of food. Farmer also keeps the Turkey warm and safe from predators. Turkey predicts that the Farmer will keep on being his best friend also tomorrow. Until one day, around Thanksgiving, the prediction goes wrong, awfully wrong.


Three Body Problem reference, yeah?


A bit older than that. This joke predates Bell Labs.



