No. For the easy 99.999% of driving they keep very little of the training data.
Basically you want to minimize manual interventions (aka disengagements). When the driver intervenes, they keep a short window before the intervention (30 seconds?) and after it, and add that to the training data.
So their training data is basically just the exceptional cases.
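As a rough sketch of what that filtering might look like (the function name and the exact window sizes are my assumptions, not anything Tesla has published), you can think of it as masking out everything except a window around each disengagement timestamp:

```python
import numpy as np

def select_training_windows(frame_times, disengage_times, before=30.0, after=10.0):
    """Return a boolean mask over frames: True = keep for training.

    frame_times: 1-D array of per-frame timestamps (seconds)
    disengage_times: timestamps where the driver intervened
    before/after: illustrative window sizes around each intervention
    """
    keep = np.zeros(len(frame_times), dtype=bool)
    for t in disengage_times:
        keep |= (frame_times >= t - before) & (frame_times <= t + after)
    return keep

# One disengagement in an hour of 1 Hz frames keeps ~1% of the data.
frames = np.arange(0.0, 3600.0, 1.0)
mask = select_training_windows(frames, disengage_times=[1000.0])
```

The point is just that the kept fraction scales with the number of interventions, not with miles driven, so the dataset ends up dominated by the hard cases.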
They just need to make sure they don't overfit, so that the learned model actually has some "understanding" of why decisions are made and can generalize.
It's not clear that a bunch of cascaded rectified linear functions will ever generalize to near 100%. The error floor is at a dangerous level regardless of training. AGI is needed to tackle the final 1%.
The universal approximation theorem disagrees. The question is how large the network should be and how much training data it needs. And for now it can only be tested experimentally.
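To make the existence claim concrete: a single hidden layer of ReLUs can represent any piecewise-linear interpolant exactly, so approximation error is bounded by how finely you place the "knots." Here's a toy NumPy construction (the weights are hand-derived, not learned; function names are mine):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_net(x, knots, target):
    """Evaluate a one-hidden-layer ReLU network that exactly matches the
    piecewise-linear interpolant of `target` at the given knot points.

    The interpolant equals target(knots[0]) + sum_i c_i * relu(x - knots[i]),
    where c_i is the change in slope at knot i.
    """
    y = target(knots)
    slopes = np.diff(y) / np.diff(knots)
    coeffs = np.diff(slopes, prepend=0.0)  # slope change entering each segment
    return y[0] + (coeffs[None, :] * relu(x[:, None] - knots[:-1][None, :])).sum(axis=1)

# More hidden units (knots) -> smaller worst-case error on sin over [0, pi].
x = np.linspace(0.0, np.pi, 1001)
err_coarse = np.max(np.abs(relu_net(x, np.linspace(0.0, np.pi, 5), np.sin) - np.sin(x)))
err_fine = np.max(np.abs(relu_net(x, np.linspace(0.0, np.pi, 33), np.sin) - np.sin(x)))
```

Note the weights here are written down analytically, which is exactly what the theorem guarantees: such weights *exist*. Whether gradient descent *finds* them from data is a separate question.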
The universal approximation theorem does not apply once you include any realistic training algorithms / stochastic gradient descent. There isn't a learnability guarantee.
You said it only depends on network size; I'm saying it is more likely impossible regardless of network size, due to fundamental limits in the training methods.