
I think it is.

The current generation of neural networks already seems to do pattern recognition better than humans. And that is all that is needed to press the brake in a dangerous situation. All you need is to train them on enough examples of the different situations that occur on the road.
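To make the claim concrete, here is a toy sketch of "braking as pattern recognition": a pure nearest-neighbor matcher that does whatever was correct in the most similar situation it has seen. The two features, the synthetic labeling rule, and all numbers are invented for illustration; they are not from any real driving dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def label(situations):
    """Synthetic ground truth: brake when an obstacle is near and closing fast."""
    return (situations[:, 0] < 20) & (situations[:, 1] > 5)

# Features per situation: [distance_to_obstacle_m, closing_speed_mps]
train = rng.uniform([0, 0], [100, 30], size=(1000, 2))
test = rng.uniform([0, 0], [100, 30], size=(200, 2))

def nearest_pattern_decision(seen, seen_labels, situation):
    """Pure pattern matching: repeat the decision that was correct in
    the most similar previously seen situation."""
    i = np.argmin(np.linalg.norm(seen - situation, axis=1))
    return seen_labels[i]

train_labels = label(train)
preds = np.array([nearest_pattern_decision(train, train_labels, s) for s in test])
accuracy = np.mean(preds == label(test))
```

With enough examples the matcher approximates the underlying rule well, which is the optimistic case; the rest of the thread is about situations that fall outside the training distribution.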




I think, as Cruise found out, you can't have priors for everything (e.g. don't drag a pedestrian you ran over down the road).


You only need more priors than humans have in their minds.

As I said, neural nets are already good at pattern recognition. That is all that is needed to recognise you ran over someone.

Humans are still better at long reasoning sessions. Aka "thinking something through". But that is not involved in recognizing you ran over someone.


You think a human wouldn't drag another human under their car because they've pattern recognized that from a prior example?


Not just one example. They learned from many examples that dragging someone or something along is not a good thing.


I've seen examples of human beings shot out of cannons where they were perfectly fine afterwards and the crowd applauded. I haven't seen any examples of people being shot out of cannons where they were not fine afterwards. Should I shoot people out of cannons?


What's your point?


That understanding is not the same as pattern matching, and if that's all you rely on to drive your car, you're going to end up dead or seriously injured.


Even when pattern matching is your only tool and you have seen only positive outcomes of human beings shot out of cannons, you will still not want to shoot people out of cannons, because it matches so many negative patterns. You learned that explosions are bad, cannons are bad, weapons in general are bad, loud bangs are bad, people flying through the air are bad, etc.

So this is not an example of pattern matching being insufficient to avoid danger.


Even if this were a given (and it isn’t), I think it’s unlikely Tesla robotaxis capture the transport market like that.

There’s a lot of cars to replace. Capturing 10% of yearly car sales, worldwide, is already a very tall order.

Capturing 10% of all cars on the road would require that level of gargantuan sales for many years. And that only means 10% of the world's cars are your brand, not that 10% of rides happen in cars owned by the fraction of buyers who decided to let their car drive strangers while they're not using it.

Moreover, beyond the safety concerns there are policy concerns: knowing how to safely drive in Indonesia doesn't mean being allowed to drive in Indonesia. And resource constraints: where is all the lithium going to come from, and how will people around the world feel about that?

And there already are alternatives beyond the car industry. Public transit, for example. Demand for not driving, or not owning a car, can be safely fulfilled by more than just robotaxis.


I said 10% of rides will be robotaxis and 10% of that will be in Teslas.

That is 1% of all rides taking place in Tesla robotaxis, not 10%.

Also it is about rides, not about cars. Less than 1% of cars being Tesla robotaxis is enough for 1% of rides to be Tesla robotaxi rides, because a robotaxi will do multiple times more rides per day than the average car.
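The arithmetic can be sketched directly. The utilization numbers below (rides per day) are illustrative assumptions, not figures anyone in the thread gave:

```python
# Back-of-envelope: what fraction of the car fleet is needed for 1% of rides,
# assuming a robotaxi does ~10x the rides per day of an average car.
rides_per_day_average_car = 3       # assumed
rides_per_day_robotaxi = 30         # assumed ~10x utilization

# 10% of rides are robotaxi rides, 10% of those are Tesla = 1% of all rides
target_share_of_rides = 0.10 * 0.10

# Fleet share needed scales down by the utilization ratio
fleet_share_needed = target_share_of_rides * (
    rides_per_day_average_car / rides_per_day_robotaxi
)
print(f"{fleet_share_needed:.1%} of all cars")  # prints "0.1% of all cars"
```

Under those assumptions, a tenth of a percent of the world's cars would suffice for 1% of rides.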


>All that is needed is to train them on enough examples of different situations that occur on the road.

The only real way this would work is if you build a pipeline where you have a simulated world with realistic physics, rendered into camera images (in relatively high fidelity) and/or lidar outputs, and teach the network to predict the evolution of these scenes, and do this with a shitload of random data.

This is probably orders of magnitude more expensive than training GPT models, since for vision you would have to render it with raytracing.
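A toy version of that pipeline might look like the following. Everything here is a stand-in: straight-line motion for "realistic physics", a coarse rasterizer for the expensive rendering stage, and random scenes for the data; a real system would swap in a physics engine, a raytracer, and a learned predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_step(positions, velocities, dt=0.1):
    """Toy 'physics': objects move in straight lines for one timestep."""
    return positions + velocities * dt

def render(positions, size=32):
    """Stand-in for the expensive rendering stage: rasterize object
    positions into a coarse 'camera image'."""
    img = np.zeros((size, size))
    for x, y in positions:
        img[int(x) % size, int(y) % size] = 1.0
    return img

# Generate random scenes and (frame, next_frame) training pairs;
# a network would be trained to predict next_frame from frame.
dataset = []
for _ in range(100):
    pos = rng.uniform(0, 32, size=(5, 2))
    vel = rng.uniform(-2, 2, size=(5, 2))
    dataset.append((render(pos), render(simulate_step(pos, vel))))
```

The cost argument in the comment is about the `render` stage: in a real pipeline that line is a full raytrace per frame, repeated over an enormous number of random scenes.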


Have you not seen the results that have been shown recently with Gaussian splatting? Photorealistic rendering looks like it'll become a hell of a lot cheaper than raytracing soon enough, and this might actually be a realistic idea.


I dunno if statistical methods can be applied to generating photorealistic images from a simulation, where you don't have enough information in the scene, since you are starting with 3D models.


Human drivers are proof that this is not needed.


I hate that companies in the vision-based self-driving space like Tesla and Comma AI have made the act of driving seem like it's trivial.

Human drivers drive with EXCEPTIONALLY more "software" under the hood. We simulate the world around us using learned rules across multiple domains. For example, if someone has never driven at night but understands that cars have red tail lights, they can deduce that pairs of red lights are cars on the road without ever having seen the visual of driving at night. It's this kind of processing that lets us be safe.

In their current form, ML models are way subpar compared to this. A superhuman driving agent should be able to drive through a construction site with cones and debris, or a busy parking lot, or a grass field full of logs. No model out there can do this. This is why you absolutely need simulations if you want to do it the traditional way. Comma AI has had quite a bit of success training in a simulated environment that simulates deviations from straight-line driving on real-world video.


    we can deduce that pairs of red lights are cars on the road
Generalization, abstraction, understanding. That is what neural networks are about. Current NNs are already very good at that. FSD constantly has to understand lighting situations it has not seen before.


You still need training data.

Even if you had a simulation set up for training, and enough compute on the vehicle to run something like MuZero inference in real time for self-driving, you would still run into the problem of humans setting up all the possible driving scenarios. That will likely leave some out, and the model will never learn the right actions for those; meanwhile, to any human, the right action would be very obvious. Over time you could probably get down to a very, very small error rate, but you would still be hesitant to trust the system.

A true superhuman driving agent would most likely be able to take a picture of a scene, predict the evolution of object positions in three dimensions over a given time window, and give a confidence score on that prediction. For example, if the single image is of cars on a highway, it should be able to predict that the cars are in fact moving, because the chance of cars standing still on a highway is very low.
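That interface, a single observation in, predicted positions plus a confidence score out, can be sketched as a toy. The constant-velocity prior, the 25 m/s highway speed, and the confidence values are all invented placeholders for whatever a trained model would output:

```python
import numpy as np

def predict_evolution(positions, velocities=None, horizon=10, dt=0.1):
    """Toy stand-in for the described model: given object positions (and
    optionally velocities), predict positions over a time window and
    return a confidence score.

    With no velocity information (a single image), fall back on a prior:
    highway traffic is probably moving, and report low confidence.
    """
    n = len(positions)
    if velocities is None:
        velocities = np.full((n, 2), [25.0, 0.0])  # assumed prior: ~25 m/s forward
        confidence = 0.3   # single frame: low confidence
    else:
        confidence = 0.9   # motion actually observed: high confidence
    steps = np.arange(1, horizon + 1)[:, None, None] * dt
    trajectory = positions[None, :, :] + velocities[None, :, :] * steps
    return trajectory, confidence

cars = np.array([[0.0, 0.0], [10.0, 3.5]])
traj, conf = predict_evolution(cars)   # one "image": prior only, low confidence
traj2, conf2 = predict_evolution(cars, np.array([[24.0, 0.0], [26.0, 0.0]]))
```

The point of the second call is the one the comment makes next: extra frames turn the prior into an observation and raise the self-assessed confidence.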

And to train a model like that, you most likely need a base model that can "understand" physics.

Additional images from the past or the future would generate more accurate predictions and also improve the model's self-assessed confidence. And then you would have some heuristic algorithm like MCTS, based on the confidence levels, to pick the best course of action.
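A minimal sketch of that last step: score candidate actions by predicted risk, weighted by how confident the model is in each prediction. This is a greedy one-step heuristic standing in for real MCTS, and the maneuvers and numbers are hypothetical:

```python
def choose_action(actions, predict):
    """predict(action) -> (risk, confidence). Penalize outcomes the model
    is unsure about, then pick the action with the best adjusted score."""
    def score(action):
        risk, confidence = predict(action)
        # Low confidence inflates the effective risk of an action.
        return risk / max(confidence, 1e-6)
    return min(actions, key=score)

# Hypothetical model outputs for three maneuvers: (risk, confidence).
predictions = {
    "brake":       (0.1, 0.9),
    "swerve_left": (0.3, 0.5),
    "continue":    (0.2, 0.4),
}
best = choose_action(list(predictions), predictions.__getitem__)
print(best)  # prints "brake": lowest confidence-adjusted risk
```

A real search would expand each action into further predicted states rather than stopping after one step, but the role of the confidence signal is the same.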


It's not just objects: it also has to predict what people are doing, are going to do, or are trying to communicate. For example, police redirecting traffic, a passenger having a medical emergency, other drivers driving erratically, etc.


And current generative AI has jack all to do with humans.



