Don't anthropomorphise design. Computers are better at processing huge quantities of data in structured form. Humans are better at pattern recognition and adaptation. Trying to design an autonomous robot by emulating the way humans or animals do it is a recipe for bad design.
They are on the right track today by using a kitchen sink of radar, lidar, GPS, and visual sensors. This is the fastest and safest way to deliver a self-driving vehicle.
But a self-driving car that uses only visual sensors is clearly possible in the long run. And having that technology would only benefit multi-sensor cars. What if one or more sensors break when you're doing 85 mph with the whole family asleep? I'd certainly welcome the resilience to operate on less input.
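To make that concrete, here's a toy sketch of the kind of graceful degradation I mean (Python; every name and number is illustrative and assumed, not taken from any real autonomy stack): fuse whatever modalities are still healthy, weighted by confidence, and drop the broken one rather than the whole system.

    # Toy sketch only: confidence-weighted fusion that keeps working
    # when a modality is degraded or lost. All values are made up.
    from dataclasses import dataclass

    @dataclass
    class Reading:
        modality: str      # "lidar", "radar", "camera"
        range_m: float     # estimated distance to the obstacle ahead
        confidence: float  # 0.0 (useless) .. 1.0 (fully trusted)

    def fuse(readings, degraded=frozenset()):
        """Confidence-weighted average over modalities that are still healthy."""
        usable = [r for r in readings if r.modality not in degraded and r.confidence > 0]
        if not usable:
            raise RuntimeError("no usable sensors: slow down and stop safely")
        total = sum(r.confidence for r in usable)
        return sum(r.range_m * r.confidence for r in usable) / total

    readings = [
        Reading("lidar", 42.0, 0.9),
        Reading("radar", 43.5, 0.7),
        Reading("camera", 40.0, 0.6),
    ]

    print(fuse(readings))                      # all three healthy
    print(fuse(readings, degraded={"lidar"}))  # heavy snow: lidar dropped, still estimating

Obviously nothing like production code, but it's the shape of the resiliency I'd want: losing a sensor costs you some accuracy, not the whole car.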
Aren't you conflating two issues? Getting visual sensing to the same level as radar/lidar is a great aim. Having redundant multi-modal sensing is a great aim. Switching over to visual-only isn't.
There are too many situations where one type of sensing isn't good enough (e.g. lasers scatter off snow and can't penetrate fog or dust, radar can get saturated by multiple corner reflectors, visual sucks at night, IR sucks in bright sun, etc.). To reduce cost, visual-only might be a good way to go, but it won't be versatile enough to cover all the necessary scenarios.
I'm not advocating switching over to visual-only, unless the other sensors are broken or unavailable as you describe.
I'm just advocating that we do the research, build some visual-only cars as a proof of concept, and solve those thorny AI problems. It's an artificial constraint, one which will produce engineering innovations that can then be applied back to real-world products.