No. For the easy 99.999% of driving they keep very little of the training data.
Basically you want to minimize manual interventions (aka disengagements). When the driver intervenes, they keep a short window before the intervention (30 seconds?) and after it, and add that to the training data.
So their training data is basically just the exceptional cases.
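As a rough sketch of what that filtering might look like (the function name and the exact window sizes are my assumptions, not anything Tesla has published), you can think of it as masking out everything except a window around each disengagement timestamp:

```python
import numpy as np

def select_training_windows(frame_times, disengage_times, before=30.0, after=10.0):
    """Return a boolean mask over frames: True = keep for training.

    frame_times: 1-D array of per-frame timestamps (seconds)
    disengage_times: timestamps where the driver intervened
    before/after: illustrative window sizes around each intervention
    """
    keep = np.zeros(len(frame_times), dtype=bool)
    for t in disengage_times:
        keep |= (frame_times >= t - before) & (frame_times <= t + after)
    return keep

# One disengagement in an hour of 1 Hz frames keeps ~1% of the data.
frames = np.arange(0.0, 3600.0, 1.0)
mask = select_training_windows(frames, disengage_times=[1000.0])
```

The point is just that the kept fraction scales with the number of interventions, not with miles driven, so the dataset ends up dominated by the hard cases.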
They just need to make sure they don't overfit, so that the learned model actually has some "understanding" of why decisions are made and can generalize.
It's not clear that a bunch of cascaded rectified linear functions will ever generalize to near 100%. The error floor is at a dangerous level regardless of training. AGI is needed to tackle the final 1%.
The universal approximation theorem disagrees. The question is how large the network should be and how much training data it needs. And for now it can only be tested experimentally.
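To make the existence claim concrete: a single hidden layer of ReLUs can represent any piecewise-linear interpolant exactly, so approximation error is bounded by how finely you place the "knots." Here's a toy NumPy construction (the weights are hand-derived, not learned; function names are mine):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_net(x, knots, target):
    """Evaluate a one-hidden-layer ReLU network that exactly matches the
    piecewise-linear interpolant of `target` at the given knot points.

    The interpolant equals target(knots[0]) + sum_i c_i * relu(x - knots[i]),
    where c_i is the change in slope at knot i.
    """
    y = target(knots)
    slopes = np.diff(y) / np.diff(knots)
    coeffs = np.diff(slopes, prepend=0.0)  # slope change entering each segment
    return y[0] + (coeffs[None, :] * relu(x[:, None] - knots[:-1][None, :])).sum(axis=1)

# More hidden units (knots) -> smaller worst-case error on sin over [0, pi].
x = np.linspace(0.0, np.pi, 1001)
err_coarse = np.max(np.abs(relu_net(x, np.linspace(0.0, np.pi, 5), np.sin) - np.sin(x)))
err_fine = np.max(np.abs(relu_net(x, np.linspace(0.0, np.pi, 33), np.sin) - np.sin(x)))
```

Note the weights here are written down analytically, which is exactly what the theorem guarantees: such weights *exist*. Whether gradient descent *finds* them from data is a separate question.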
The universal approximation theorem does not apply once you include any realistic training algorithms / stochastic gradient descent. There isn't a learnability guarantee.
You said it only depends on network size; I'm saying it is more likely impossible regardless of network size, due to fundamental limits in the training methods.