Problem: Improve the classification component of an ADAS system to be able to discriminate between the obstacle type (pedestrian, cyclist, child, old person) in order to adapt the car driver system behavior according to the estimated level of risk.
Approach: Train a CNN-based pedestrian classifier using a combination of intensity, depth and flow modalities on the Daimler stereo vision data set. The resulting CNN classifies objects in images as "Pedestrian" or "Non-Pedestrian" (rather than the more granular goal above).
Result: Outcomes are measured by the False Positive Rate (FPR) at a 90% True Positive (detection) rate. Lower FPR = better.
The CNN created in this experiment ("Late Fusion") achieves an FPR of 0.0125%, a 37.5% relative reduction from the 2010 "Early Fusion" CNN's 0.02% FPR over the same dataset.
Still, a 2011 model ("HOG+LBP/MLP, Monolithic HOG classifier") that uses "explicit features" vastly outperforms both, achieving an FPR of 0.00026% over the same dataset.
Conclusion: "We showed that the late-fusion classifier outperforms the early-fusion one but does not go beyond the state-of-the-art."
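For anyone unfamiliar with the metric, here is a minimal sketch of how "FPR at 90% detection rate" can be computed from classifier scores. The function name `fpr_at_tpr` and the toy scores are my own illustration, not anything from the paper, and the paper's exact evaluation protocol may differ.

```python
import math


def fpr_at_tpr(scores, labels, target_tpr=0.90):
    """False positive rate at the strictest threshold that still reaches target_tpr.

    scores: classifier confidences, higher means "more likely a pedestrian".
    labels: 1 for pedestrian, 0 for non-pedestrian.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    # Number of positives that must be accepted to reach the target detection rate.
    # The small epsilon guards against float round-up in target_tpr * len(...).
    needed = max(1, math.ceil(target_tpr * len(positives) - 1e-9))
    threshold = positives[needed - 1]  # accept every sample scoring >= threshold

    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / len(negatives)


# Toy data (made up): 10 pedestrians followed by 4 non-pedestrians.
scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.95, 0.25, 0.15, 0.05]
labels = [1] * 10 + [0] * 4
example_fpr = fpr_at_tpr(scores, labels)
```

In the toy data, accepting 9 of the 10 pedestrians puts the threshold at 0.2, which also lets through two of the four non-pedestrians.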
A Pedestrian Recognition system with the ability to retrain for Gait Recognition [0] [1] would allow systems to track any individual uniquely. Add to this sentiment analysis [2] and we end up in a very sci-fi realm. This makes me curious about how an adversarial system [3] would defeat a system like this. Maybe early on, putting a quarter dollar in my shoes and thinking happy thoughts is all it will take.
On a side note, I am reminded of an anime called Psycho Pass [4] which uses psychological rating given by a super intelligence to decide who needs to be taken off the streets to be treated.
Then add in some of the face2vec algorithms coming out (takes in an image of a person, outputs a vector that stays roughly stable for the same person in different images).
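To make the "roughly stable vector" idea concrete, here is a rough sketch of how two such vectors might be compared. It assumes the embeddings already exist as plain lists of floats; `cosine_similarity`, `same_person`, and the 0.8 threshold are hypothetical names and values for illustration, not part of any particular face2vec library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_person(a, b, threshold=0.8):
    """Guess whether two embeddings belong to the same person.

    The threshold is an assumed value; a real system would tune it on data.
    """
    return cosine_similarity(a, b) >= threshold


# Made-up 3-dimensional embeddings (real ones have far more dimensions).
photo_1 = [1.0, 0.0, 0.0]
photo_2 = [0.9, 0.1, 0.0]   # same person, slightly different image
stranger = [0.0, 1.0, 0.0]
```

With these toy vectors, `same_person(photo_1, photo_2)` is true and `same_person(photo_1, stranger)` is false.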
Imagine the airport of the future where they won't bother with "Mr Smith, please report to Gate 6 for final boarding" while Mr Smith sits oblivious at the bar. Instead it will be "Mr Smith? We brought you your bill early because your flight is leaving in 10 minutes and you should get going" "But I didn't tell you my name or flight..."
Or more realistically 'The probability of Mr Smith making his flight has dropped below x%, based on his body fat estimate, number of drinks, and distance from gate. Sell his seat.'
"Would you like another drink sir?"
Why alert him when they can sell his seat, sell him a few more drinks, a massage, and some entertainment?
And there won't even be any "whistleblowers", since everyone will be able to honestly say they had no idea. The programmers will just plug all the information they have into a generic machine learning system, and ask it to predict the probability of a missed flight for each flyer, based on opaque correlations.
This is a legitimate reason to worry about AIs. The "world domination" angle seems overblown, but machine learning has the potential to evolve some immoral behaviors like this without anybody even knowing.
With strong(ish) AI in your pocket (phone), your personal digital assistant would be pinged by the airport AI, then your digital assistant would remind you to get going.
I imagine this sort of transaction going on a lot in the future. It kinda happens now via email and in browsers. My browser sort of organises and filters all sorts of messages from websites. Some primitive geo-location also happens... this can only get better as the services mature.
It's great that this can recognize pedestrians, but I've still seen no theory around decision-making. This last week, I happened upon a tree that fell into the road after a bad wind storm. I _know_ a tree when I see it, and I know it is large and heavy and difficult to move and would take a great deal of time for that to happen, so I immediately cut to a route around it. But what if it was a bus or truck turning around? Well I know that a bus turning around is large and heavy and difficult to move, but it will be moving out of the way in a moment, so I can wait for that to happen.
When are we going to start seeing methods for measuring intent or theory around guessing the future nature of an object based on contextual understanding of that object? How does a car recognizing a pedestrian help if the pedestrian starts running toward the car?
It just feels like there's a lot of patting of backs that happens around this stuff, when we really aren't even close until we have a system that has a learning and understanding approach that is as abstract as a human's.
As we move from relying on humans to relying on machines, we'll need to develop better infrastructure to guide those machines.
Our current roads are OK for human drivers: they are designed heavily around visual signals, and we can process and synthesize a lot of visual/audio/learned data well.
Machines can process visual data faster than us, but can't synthesize it with other data nearly as well as we can. Adding sensors that passively communicate information, such as the location of lanes, other cars, and humans, will reduce a lot of the issues and edge cases of visual-based ADAS systems.
I could imagine a future where cars are always listening for smartphones (or other wearable/implanted sensors) in "pedestrian" mode, which could be used to override regular function and stop the vehicle if detected within a certain distance, or intersecting at a certain trajectory.
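As a rough sketch of the "within a certain distance, or intersecting at a certain trajectory" check, here is a closest-point-of-approach calculation. It assumes the car somehow knows the phone's 2D position and velocity and that both keep a constant velocity; the function names, the 2 m safety radius, and the 3 s horizon are all hypothetical.

```python
import math


def closest_approach(p_car, v_car, p_ped, v_ped):
    """Return (time_s, distance_m) of closest approach under constant velocities.

    Positions are (x, y) in metres, velocities are (vx, vy) in metres/second.
    """
    rx, ry = p_ped[0] - p_car[0], p_ped[1] - p_car[1]   # relative position
    vx, vy = v_ped[0] - v_car[0], v_ped[1] - v_car[1]   # relative velocity
    speed_sq = vx * vx + vy * vy
    # If there is no relative motion, or the two are already moving apart,
    # the closest point is right now (t = 0).
    t = 0.0 if speed_sq == 0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)


def should_brake(p_car, v_car, p_ped, v_ped, safe_distance_m=2.0, horizon_s=3.0):
    """True if the pedestrian comes within safe_distance_m during the next horizon_s."""
    t, distance = closest_approach(p_car, v_car, p_ped, v_ped)
    return t <= horizon_s and distance < safe_distance_m


# Car heading east at 10 m/s; pedestrian 20 m ahead, stepping into its path at 1 m/s.
crossing = should_brake((0, 0), (10, 0), (20, -2), (0, 1))
# Same car; pedestrian standing still 50 m off to the side of the road.
bystander = should_brake((0, 0), (10, 0), (20, 50), (0, 0))
```

In the first case the paths meet after 2 seconds, so the check fires; in the second the pedestrian never gets closer than 50 m.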
And when you see a bus driver come across a tree or some other impassable object in the road, look at the level of decision making as the driver leans out the window and barks orders to other drivers to back up. Does a 3-point turn, bumping up the pavement, reversing down someone's driveway, scooting over the grass verge while giving a cheery wave to everyone he pissed off while doing it. AI is not there yet.
Frankly, I'm very terrified at the prospect of being tracked in my every move and it seems entirely unavoidable. License plate scanners are already in wide deployment. There was an article on HN last week about the FBI's 100 million strong facial recognition database using people's driver's license photos without their consent or knowledge. Not to mention the wholesale tracking of our online activities by the NSA which is already a done deal. I'm seriously thinking about leaving the US to buy myself some time. I'm not hopeful that anywhere will be safe in the end as other countries catch up with this technology. What can be done if this isn't the world you want to live in?
Even if you manage to get your government to never do this, it is (or soon will be) easily within reach for private hobbyists.
My window faces a major freeway. I could probably rig up a license plate reader that registers all cars going by fairly cheaply. If not now, then surely in 5 years.
Everyone has a quite nice camera in their pocket. The car I bought last year comes with 3 cameras. And so on. I think it's an unavoidable fact that in the very near future, you have to assume you are being filmed whenever you're in public.
So I think the real question is: How do we adapt to this new world? How can we limit the bad aspects and enjoy the good aspects? Are we really sure what is bad and good?
1) It's a lot easier to recognize 100M distinct plates than faces. I don't see anyone having the resources and patience to do the latter any time soon. Even the FBI isn't claiming they can recognize 1 face/100M with low error today. At best, they have such a database of pictures. I have big doubts that usefully matching against so many faces is even technically possible.
2) I think you're right, given the rapid advancement of tech, our emphasis shouldn't be whether we should outlaw surveillance practices and methods. Instead we should focus on controlling access to such databases. Who gets to gather or use such data and when should it be authorized? Should we require that each ID match request be explicitly granted (like a warrant) or should we at least require that all such requests be logged, and perhaps publicly reported?
Once access to sensitive data is regulated, it no longer matters how the data is gathered. Abuse of such a law should poison any fruits of the data, at least on the part of lawful agencies.
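To put rough numbers on the doubt in point 1: even a matcher that is wrong only once per million comparisons produces many false hits when one face is searched against 100M records. This is a back-of-the-envelope sketch; the one-in-a-million false match rate is an assumed figure for illustration, not a measured property of any real system, and comparisons are treated as independent.

```python
import math


def expected_false_matches(false_match_rate, database_size):
    """Expected number of wrong matches when one probe face is compared to every record."""
    return false_match_rate * database_size


def prob_at_least_one_false_match(false_match_rate, database_size):
    """Chance of one or more wrong matches, assuming independent comparisons."""
    if not 0.0 <= false_match_rate <= 1.0:
        raise ValueError("false_match_rate must be between 0 and 1")
    if false_match_rate == 1.0:
        return 1.0
    # 1 - (1 - f)^N, written with log1p/expm1 to stay accurate for tiny f.
    return -math.expm1(database_size * math.log1p(-false_match_rate))


ASSUMED_RATE = 1e-6          # hypothetical: one false match per million comparisons
DATABASE_SIZE = 100_000_000  # the 100M figure from the thread

false_hits = expected_false_matches(ASSUMED_RATE, DATABASE_SIZE)
any_false_hit = prob_at_least_one_false_match(ASSUMED_RATE, DATABASE_SIZE)
```

Under that assumed rate, a single search returns about 100 wrong candidates on average, and at least one false hit is practically guaranteed.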
Regarding 1, there are systems at Facebook and inside Google that can do this reasonably well, and I know the technology intimately. I would consider this a nearly solved problem; I give it 1 to 2 years at most to be more generally available.
If it truly is entirely unavoidable, then I suppose one way to adapt is to democratize access to these tools and data. The power to abuse is too great if it is concentrated in a small number of hands. Still this seems scary to me. The one solace I take in this is that the gov't has the most dirty laundry to hide and this loss of privacy should help to expose a lot of the corruption going on.
> "Not to mention the wholesale tracking of our online activities by the NSA which is already a done deal. I'm seriously thinking about leaving the US to buy myself some time"
Leaving the USA will actually ensure the NSA collects a lot more about you. Staying will actually ensure they collect very little, mainly as a byproduct of the methods they use and not because of any nefarious plot.
I was more meaning to buy myself some time from the physical real-world tracking. We have no way of knowing what the NSA is actually storing on citizens. I have to assume it is everything and that doesn't seem outlandish in light of the Snowden leaks.
Within the next 3-8 years every new passenger car will be fitted with camera sensors (mandated by public safety standards). Most new cars will also be networked. The ECUs on the vehicles may not be powerful enough for real-time biometrics, and the available bandwidth may not be sufficient for this particular application either... but it wouldn't take much to make it technically feasible. Now, I feel uneasy about the societal risks involved with us going down this route, but I know that there are others who would jump at the chance to exploit this opportunity.
No pedestrian recognition or accident avoidance system is going to be 100% effective. But now we have the opportunity to apply to motoring what we have done to aviation: every rare fatality and the situation that led up to it can be analysed in depth to prevent recurrence.
Stuff like this is why I believe that level 4 autonomy within the next decade is quite the lofty goal and level 5 is completely out of the question.
For example, in WA State, you can cross at any unmarked crosswalk which is defined as any intersection (plus a few non-intersections), unless there is a sign explicitly disallowing crossing. I wouldn't know how to accurately estimate this, but there are probably over 100k unmarked crosswalks in Seattle alone (most of which are ignored completely by humans). The pedestrian's responsibility is to not enter the roadway without enough time for a car to stop. The car's responsibility is to stop once the pedestrian signals intent to cross. Signaling intent does not require waiting at the edge of the roadway...merely walking towards the roadway while on the sidewalk is considered signaling intent to cross.
What this effectively means is that the car must do far more than understand pedestrian behavior in crosswalks. It must understand the periphery of the road...not just the pedestrians it can see, but the pedestrians it can't. Most speed limits are actually too high to take into account legal pedestrian behavior. The speed limit is merely an upper limit; driving too fast for conditions can always override the speed limit as a factor in an accident. This means that trees, buildings, etc., that obscure the view of pedestrians on the sidewalk and their behavior implicitly lower the speed at which cars can drive on that road.
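The link between obscured sight lines and safe speed can be sketched with the usual stopping-distance model: distance covered during the reaction time plus braking distance, d = v*t + v^2/(2a). The 1.5 s reaction time and 7 m/s^2 deceleration below are assumed round numbers, not values from any regulation, and the function names are my own.

```python
import math


def stopping_distance_m(speed_mps, reaction_s=1.5, decel_mps2=7.0):
    """Distance travelled while reacting plus distance travelled while braking."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)


def max_safe_speed_mps(sight_m, reaction_s=1.5, decel_mps2=7.0):
    """Highest speed at which the car can still stop within sight_m.

    Solves sight_m = v * reaction_s + v^2 / (2 * decel_mps2) for v >= 0.
    """
    if sight_m < 0:
        raise ValueError("sight distance cannot be negative")
    a_t = decel_mps2 * reaction_s
    return -a_t + math.sqrt(a_t ** 2 + 2 * decel_mps2 * sight_m)


# How fast can a car go if a hedge hides the sidewalk until 15 m away?
blind_corner_speed_mps = max_safe_speed_mps(15.0)
blind_corner_speed_kmh = blind_corner_speed_mps * 3.6
```

Under these assumptions a 15 m sight line works out to roughly 27 km/h, well below a typical urban limit, which is the point being made above.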
Now today we essentially get away with favoring the driver. The driver can assume that they were acting reasonably by driving the speed limit, and we can say that the pedestrian was out of line by walking too fast out into the street, and most situations are resolved blindly based on testimony with no evidence. But the risk model of autonomous cars is different. The manufacturer is gonna be on the hook, not the driver, and the car needs to be able to obey the letter of the law with a higher burden of proof and with dozens of sensors recording the situation.
As we can see from this paper, pedestrian recognition is still a hard problem. It's hard in the engineering sense, and the mathematical sense. But cars don't need to just recognize pedestrians...they need to recognize and understand the context in which a currently invisible pedestrian could appear in the near future. And perhaps more importantly, they still need to solve the problem of instantaneous path planning in a 2d space with a dynamically changing 3d obstruction model consisting of far more than just pedestrians. It's easy to look at the past and say we've made rapid progress in the past so we'll make rapid progress in the future...but in the process of maturing a technology we regularly see exponential progress in the beginning and asymptotic progress near the end. This is going to be very difficult.
Or better, stopping vehicle-ramming terrorist attacks like the one in Stockholm last week. Just imagine a system set up on the vehicle: when it detects above-average acceleration in a downtown area (GPS supplied) in addition to abnormal swerving, coupled with pedestrian data from the sensor network, the auto-pilot system kicks in and takes over the vehicle, bringing it to a grinding halt immediately.
Right, it's an arms race between us and the terrorists, but at this point in time we need to turn this weapon from low-tech to high-tech to operate, thus raising the barrier and making it prohibitively difficult for losers to hijack or drive trucks to kill pedestrians on the street.
It's not an arms race. No amount of user-unfriendliness will ever make it "that" difficult to purposefully use a vehicle maliciously. Terrorist organizations are well funded. As long as vehicles are powered by engines or even motors, everything else in the vehicle is gravy. Rip out all the electronics, put in new ones that just fire the cylinders or run the engine. It's not a solvable problem.
https://www.youtube.com/watch?v=HnOPD2MOngU Here is a small engine whose timing is controlled by an Arduino. You vastly overestimate what can be done, and underestimate the intelligence and willpower of terrorists.
All you're achieving is making it harder for everyday people to work on their vehicles.
If you want to make every aspect of life feel like commercial air travel does at present, then sure, play your ever-escalating game of whack-a-mole, but please don't speak as if the conclusion is so foregone. And don't pretend the benefits will naturally materialize (when history has shown that they won't). The cost of such a system is the only part of it that's guaranteed.
"The trouble with you Americans is you think every problem has a solution," it has been observed.