Your brain is _really really_ good at surmounting challenges, including many that you did not mention. We don't know how to get anywhere close to that reliability using cameras and ML alone. Cameras and ML can go very far, but every roboticist understands the problem of compounding errors and catastrophic failure. Every ML person understands how slow our learning loops are.
Consider that ML models deployed in the field have to get by with a fixed power and RAM budget. If you want to process, say, 5 seconds of temporal context at 10 Hz and 1080p resolution, how much data bandwidth are you looking at? Which carries more information: what you see with your eyes, or a series of 1080p photos? Bump it to 4K: how long does it take to even run object detection and tracking with that limited temporal context?
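To make that concrete, here's a back-of-envelope sketch. It assumes uncompressed 8-bit RGB frames from a single camera; real pipelines compress and downsample, but the model still has to extract its signal from whatever survives:

```python
# Back-of-envelope: raw size of a sliding window of temporal context.
# Assumes uncompressed 8-bit RGB frames, single camera (an assumption
# for illustration; production stacks run several cameras at once).

def context_bytes(width, height, fps, seconds, channels=3):
    """Raw bytes in a window of `seconds` of video at `fps`."""
    return width * height * channels * fps * seconds

for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    gb = context_bytes(w, h, fps=10, seconds=5) / 1e9
    print(f"{name}: ~{gb:.1f} GB per 5 s window, per camera")
# 1080p: ~0.3 GB per 5 s window, per camera
# 4K:    ~1.2 GB per 5 s window, per camera
```

And that's per camera, before you've run a single layer of inference on it.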
Your brain works with more temporal context, more world context, and a much more robust active learning loop than the artificial systems we're composing today. What we can achieve is really impressive, but to those who've worked on the problem, it feels laughable to say you can solve it with just cameras and compute.
There are plenty of well-respected researchers who think data and active learning loops are the only bottlenecks. In my experience, they're treating self-driving as a canned research problem rather than a robotics problem. There are at least as many respected researchers who've worked on the self-driving problem and see deeper-seated issues -- ones that cannot be surmounted without technologies like high-fidelity, physics-grounded sensors and HD maps.
Even if breadth of data is the bottleneck and Tesla's approach really does yield more of it, there's still the question of that data's fidelity (e.g. distances and velocities from camera-only systems are estimates, with noisier Gaussians than those produced by LiDAR). You make what you measure, and if your measurements are noisy, how can you convince yourself -- or your loss function, for that matter -- that it's doing a good job of learning?
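Here's a toy illustration of that loss-function point (the noise levels are made up for the sketch; real error distributions depend on range, lighting, and scene). If your labels are themselves noisy estimates, even a model with perfect predictions shows a loss floor equal to the label-noise variance, so the loss can't tell you whether you're learning or just fitting noise:

```python
# Toy sketch: noisy pseudo-labels put a floor under your loss.
# Sigmas below are illustrative assumptions, not measured figures.

import numpy as np

rng = np.random.default_rng(0)
true_dist = rng.uniform(5, 80, size=10_000)              # true ranges (m)

lidar_labels = true_dist + rng.normal(0, 0.1, 10_000)    # ~10 cm label noise
camera_labels = true_dist + rng.normal(0, 2.0, 10_000)   # ~2 m label noise

perfect_pred = true_dist  # suppose the model were exactly right
for name, labels in [("LiDAR-grade labels", lidar_labels),
                     ("camera-grade labels", camera_labels)]:
    mse = np.mean((perfect_pred - labels) ** 2)
    print(f"{name}: MSE of a *perfect* model = {mse:.2f} m^2")
# LiDAR-grade labels: MSE of a *perfect* model ~ 0.01 m^2
# camera-grade labels: MSE of a *perfect* model ~ 4.0 m^2
```

The perfect model looks hundreds of times worse under the noisier labels, and no amount of training will push it below that floor.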
It's relatively straightforward to build toy systems where subsystems have something on the order of 95% reliability. But robotics requires you to cut the tail much further. https://wheretheroadmapends.com/game-of-9s.html
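A quick sketch of how those failures compound (assuming independent subsystems, which is itself a generous assumption; correlated failures are worse):

```python
# The "game of 9s": chained subsystems multiply their reliabilities.
# 95% per stage sounds fine until you compose a realistic pipeline.

def system_reliability(per_stage: float, n_stages: int) -> float:
    return per_stage ** n_stages

for n in (1, 5, 10, 20):
    r = system_reliability(0.95, n)
    print(f"{n:>2} stages at 95% each -> system works {r:.1%} of the time")
#  1 stages at 95% each -> system works 95.0% of the time
#  5 stages at 95% each -> system works 77.4% of the time
# 10 stages at 95% each -> system works 59.9% of the time
# 20 stages at 95% each -> system works 35.8% of the time
```

Ten 95%-reliable stages gets you a coin flip. Safety-critical driving needs many more 9s than that end to end, not per subsystem.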