Fusion of Stereo Vision for Pedestrian Recognition using CNNs [pdf] (inria.fr)
79 points by Katydid on April 10, 2017 | 42 comments



Problem: Improve the classification component of an ADAS so that it can discriminate between obstacle types (pedestrian, cyclist, child, old person) and adapt the behavior of the car's driver system to the estimated level of risk.

Approach: Train a CNN-based pedestrian classifier using a combination of intensity, depth and flow modalities on the Daimler stereo vision data set. The resulting CNN classifies objects in images as "Pedestrian" or "Non-Pedestrian" (rather than the more granular goal above).

Result: Outcomes are measured by the False Positive Rate (FPR) at a 90% True Positive (detection) rate. Lower FPR is better.
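For anyone unfamiliar with the metric, here is a minimal sketch of how "FPR at 90% TPR" can be computed from classifier scores. The function name `fpr_at_tpr` and the convention that a higher score means "more likely pedestrian" are assumptions for illustration; this is not the paper's evaluation code.

```python
import math

def fpr_at_tpr(scores, labels, target_tpr=0.90):
    """Return the false positive rate at the strictest score threshold
    that still detects at least `target_tpr` of the positives.

    scores: classifier outputs, higher = more likely pedestrian (assumed)
    labels: 1 for pedestrian, 0 for non-pedestrian
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    # How many positives must score at or above the threshold.
    # The small epsilon guards against float noise such as 0.9 * 10 = 9.000000001.
    needed = max(1, math.ceil(target_tpr * len(positives) - 1e-9))
    threshold = positives[needed - 1]

    # Negatives that would be (wrongly) accepted at that threshold.
    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / len(negatives)
```

Fixing the detection rate first and then reading off the FPR is what makes the numbers from different classifiers comparable: each one is forced to find the same share of pedestrians before its false alarms are counted.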

The CNN created in this experiment ("Late Fusion") achieves an FPR of 0.0125%, a 37.5% relative reduction from the 2010 "Early Fusion" CNN's 0.02% FPR on the same dataset.

Still, a 2011 model ("HOG+LBP/MLP, Monolithic HOG classifier") that uses "explicit features" vastly outperforms both, achieving an FPR of 0.00026% over the same dataset.

Conclusion: "We showed that the late-fusion classifier outperforms the early-fusion one but does not go beyond the state-of-the-art."
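To make the early versus late fusion distinction concrete, here is a toy sketch. Everything in it is an assumption for illustration: `toy_classifier` is a stand-in for a trained CNN, and averaging the per-modality outputs is only one possible way to combine them. It does not reproduce the paper's architecture.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def toy_classifier(weights, bias):
    """Hypothetical stand-in for a trained network:
    maps a feature vector to P(pedestrian)."""
    def predict(features):
        return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
    return predict

def early_fusion(classifier, intensity, depth, flow):
    # Early fusion: concatenate the modalities and run ONE classifier on the result.
    return classifier(intensity + depth + flow)

def late_fusion(classifiers, intensity, depth, flow):
    # Late fusion: one classifier per modality, then combine their outputs.
    # Plain averaging is an assumption; a learned combiner could be used instead.
    probs = [clf(x) for clf, x in zip(classifiers, (intensity, depth, flow))]
    return sum(probs) / len(probs)

# Made-up features for a single image crop, two values per modality.
intensity, depth, flow = [0.2, 0.4], [0.1, 0.3], [0.0, 0.5]

single = toy_classifier([0.5, -0.2, 0.8, 0.1, 0.3, -0.4], bias=0.0)
per_modality = [
    toy_classifier([0.5, -0.2], bias=0.0),
    toy_classifier([0.8, 0.1], bias=0.0),
    toy_classifier([0.3, -0.4], bias=0.0),
]

print(early_fusion(single, intensity, depth, flow))
print(late_fusion(per_modality, intensity, depth, flow))
```

The practical difference is where the modalities meet: in early fusion the network sees them together from the first layer, while in late fusion each modality gets its own model and only the decisions are merged.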


Wow, thanks for doing this summarization. I wish more papers were written in this clear manner.


This should be the abstract.


A pedestrian recognition system that could be retrained for gait recognition [0] [1] would allow systems to track any individual uniquely. Add sentiment analysis [2] to this and we end up in a very sci-fi realm. This makes me curious about how an adversarial system [3] would defeat a system like this. Maybe early on, putting a quarter dollar in my shoes and thinking happy thoughts is all it will take.

On a side note, I am reminded of an anime called Psycho Pass [4], which uses a psychological rating given by a super intelligence to decide who needs to be taken off the streets to be treated.

-------------------------

[0] https://www.newscientist.com/article/mg21528835-600-cameras-...

[1] https://en.wikipedia.org/wiki/Gait_analysis

[2] https://en.wikipedia.org/wiki/Sentiment_analysis

[3] http://www.popsci.com/byzantine-science-deceiving-artificial...

[4] https://en.wikipedia.org/wiki/Psycho-Pass


Then add in some of the face2vec algorithms coming out (takes in an image of a person, outputs a vector that stays roughly stable for the same person in different images).
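A rough sketch of how such vectors are typically compared, assuming the embedding model already exists: two images count as the same person when their vectors point in nearly the same direction. The embeddings and the 0.8 threshold below are made up for illustration and do not come from any particular face2vec implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.8):
    # The threshold is a hypothetical value; a real system would tune it
    # on labelled pairs to trade false matches against missed matches.
    return cosine_similarity(emb_a, emb_b) >= threshold

# Made-up 3-dimensional embeddings (real ones have far more dimensions).
smith_at_checkin = [1.0, 0.0, 0.0]
smith_at_bar = [0.9, 0.1, 0.0]
someone_else = [0.0, 1.0, 0.0]

print(same_person(smith_at_checkin, smith_at_bar))
print(same_person(smith_at_checkin, someone_else))
```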

Imagine the airport of the future where they won't bother with "Mr Smith, please report to Gate 6 for final boarding" while Mr Smith sits oblivious at the bar. Instead it will be "Mr Smith? We brought you your bill early because your flight is leaving in 10 minutes and you should get going" "But I didn't tell you my name or flight..."


Or more realistically 'The probability of Mr Smith making his flight has dropped below x%, based on his body fat estimate, number of drinks, and distance from gate. Sell his seat.'

"Would you like another drink sir?"

Why alert him when they can sell his seat, sell him a few more drinks, a massage, and some entertainment?


And there won't even be any "whistleblowers", since everyone will be able to honestly say they had no idea. The programmers will just plug all the information they have into a generic machine learning system, and ask it to predict the probability of a missed flight for each flyer, based on opaque correlations.


Okay this is freaky.

This is a legitimate reason to worry about AIs. The "world domination" angle seems overblown, but machine learning has the potential to evolve some immoral behaviors like this without anybody even knowing.


There are people talking about those issues: http://www.econtalk.org/archives/2016/10/cathy_oneil_on_1.ht...


Ever since they cancelled Person of Interest, I find myself waiting for this future, because I really miss good dystopian AI plotlines.


With strong(ish) AI in your pocket (phone), your personal digital assistant would be pinged by the airport AI, and then your digital assistant would remind you to get going.

I imagine this sort of transaction going on a lot in the future. It kinda happens now via email and in browsers. My browser sort of organises and filters all sorts of messages from websites. Some primitive geo-location also happens... this can only get better as the services mature.


It's great that this can recognize pedestrians, but I've still seen no theory around decision-making. This last week, I happened upon a tree that fell into the road after a bad wind storm. I _know_ a tree when I see it, and I know it is large and heavy and difficult to move and would take a great deal of time for that to happen, so I immediately cut to a route around it. But what if it was a bus or truck turning around? Well I know that a bus turning around is large and heavy and difficult to move, but it will be moving out of the way in a moment, so I can wait for that to happen.

When are we going to start seeing methods for measuring intent or theory around guessing the future nature of an object based on contextual understanding of that object? How does a car recognizing a pedestrian help if the pedestrian starts running toward the car?

It just feels like there's a lot of patting of backs that happens around this stuff, when we really aren't even close until we have a system whose approach to learning and understanding is as abstract as a human's.


As we move from relying on humans to relying on machines, we'll need to develop better infrastructure to guide those machines.

Our current roads are OK for human drivers: they are designed heavily around visual signals, and we can process and synthesize a lot of visual/audio/learned data well.

Machines can process visual data faster than us, but can't synthesize it with other data nearly as well as we can. Adding sensors that passively communicate information, such as the location of lanes, other cars, and humans, will reduce a lot of the issues and edge cases of visual-based ADAS systems.

I could imagine a future where cars are always listening for smartphones (or other wearable/implanted sensors) in "pedestrian" mode, which could be used to override regular function and stop the vehicle if detected within a certain distance, or intersecting at a certain trajectory.
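The "within a certain distance, or intersecting at a certain trajectory" check could be sketched as a closest-point-of-approach calculation. This assumes both the car and the phone report a 2D position and velocity and that both keep moving in a straight line; the 5-second horizon and 2-metre safety radius are invented numbers, not values from any real system.

```python
import math

def closest_approach(car_pos, car_vel, ped_pos, ped_vel, horizon_s=5.0):
    """Return (time, distance) of the closest approach within the horizon,
    assuming constant velocities. Positions in metres, velocities in m/s."""
    rx, ry = ped_pos[0] - car_pos[0], ped_pos[1] - car_pos[1]  # relative position
    vx, vy = ped_vel[0] - car_vel[0], ped_vel[1] - car_vel[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t = 0.0  # no relative motion: the distance never changes
    else:
        # Unconstrained minimiser of |r + v t|, clamped to [0, horizon].
        t = -(rx * vx + ry * vy) / speed_sq
        t = min(max(t, 0.0), horizon_s)
    return t, math.hypot(rx + vx * t, ry + vy * t)

def should_brake(car_pos, car_vel, ped_pos, ped_vel, safety_radius_m=2.0):
    # Hypothetical override rule: brake if the predicted paths come too close.
    _, distance = closest_approach(car_pos, car_vel, ped_pos, ped_vel)
    return distance < safety_radius_m

# Car heading east at 10 m/s; pedestrian walking toward the road from the side.
print(should_brake((0.0, 0.0), (10.0, 0.0), (20.0, -3.0), (0.0, 1.5)))
# Same car; a stationary pedestrian 30 m away from the road.
print(should_brake((0.0, 0.0), (10.0, 0.0), (20.0, 30.0), (0.0, 0.0)))
```

A straight-line prediction is the weakest part of this sketch, since pedestrians change direction far more abruptly than cars do.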


And when you see a bus driver come across a tree or some other impassable object in the road, look at the level of decision making: the driver leans out the window and barks orders to other drivers to back up, then does a 3-point turn, bumping up the pavement, reversing down someone's driveway, and scooting over the grass verge while giving a cheery wave to everyone he pissed off while doing it. AI is not there yet.


Agreed. Reminds me of this article: "The state of Computer Vision and AI: we are really, really far away." (2012) [1]

[1] http://karpathy.github.io/2012/10/22/state-of-computer-visio...


Frankly, I'm very terrified at the prospect of being tracked in my every move and it seems entirely unavoidable. License plate scanners are already in wide deployment. There was an article on HN last week about the FBI's 100 million strong facial recognition database using people's driver's license photos without their consent or knowledge. Not to mention the wholesale tracking of our online activities by the NSA which is already a done deal. I'm seriously thinking about leaving the US to buy myself some time. I'm not hopeful that anywhere will be safe in the end as other countries catch up with this technology. What can be done if this isn't the world you want to live in?


"Unavoidable" is the key thing to realize.

Even if you manage to get your government to never do this, it is (or soon will be) easily within reach for private hobbyists.

My window faces a major freeway. I could probably rig up a license plate reader that registers all cars going by fairly cheaply. If not now, then surely in 5 years.

Everyone has a quite nice camera in their pocket. The car I bought last year comes with 3 cameras. And so on. I think it's an unavoidable fact that in the very near future, you have to assume you are being filmed whenever you're in public.

So I think the real question is: How do we adapt to this new world? How can we limit the bad aspects and enjoy the good aspects? Are we really sure what is bad and good?


Two thoughts:

1) It's a lot easier to recognize 100M distinct plates than faces. I don't see anyone having the resources and patience to do the latter any time soon. Even the FBI isn't claiming they can recognize 1 face in 100M with low error today. At best, they have such a database of pictures. I have big doubts that usefully matching against so many faces is even technically possible.

2) I think you're right, given the rapid advancement of tech, our emphasis shouldn't be whether we should outlaw surveillance practices and methods. Instead we should focus on controlling access to such databases. Who gets to gather or use such data and when should it be authorized? Should we require that each ID match request be explicitly granted (like a warrant) or should we at least require that all such requests be logged, and perhaps publicly reported?

Once access to sensitive data is regulated, it no longer matters how the data is gathered. Abuse of such a law should poison any fruits of the data, at least on the part of lawful agencies.
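On the scale concern in point 1, a back-of-the-envelope sketch shows why one-in-100M matching is demanding. The per-comparison false match rate used here is a purely hypothetical figure, not a measurement of the FBI's or anyone else's system, and comparisons are assumed to be independent.

```python
import math

def expected_false_matches(gallery_size, false_match_rate):
    """Expected number of wrong hits when one probe face is compared
    against every entry in the gallery."""
    return gallery_size * false_match_rate

def prob_any_false_match(gallery_size, false_match_rate):
    """Probability of at least one wrong hit, assuming independent comparisons.
    Computed as 1 - (1 - rate)^n in a numerically stable way."""
    return -math.expm1(gallery_size * math.log1p(-false_match_rate))

GALLERY = 100_000_000      # 100M faces, as in the comment above
HYPOTHETICAL_FMR = 1e-6    # one false match per million comparisons (assumed)

print(expected_false_matches(GALLERY, HYPOTHETICAL_FMR))
print(prob_any_false_match(GALLERY, HYPOTHETICAL_FMR))
```

Under these assumptions a single search returns on the order of a hundred wrong candidates, so the per-comparison error rate would have to sit well below one in 100M before a lone hit meant much on its own.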


Regarding 1, there are systems at Facebook and inside Google that can do this reasonably well, and I know the technology intimately. I would consider this a nearly solved problem; I give it 1 to 2 years at most to be more generally available.


If it truly is entirely unavoidable, then I suppose one way to adapt is to democratize access to these tools and data. The power to abuse is too great if it is concentrated in a small number of hands. Still this seems scary to me. The one solace I take in this is that the gov't has the most dirty laundry to hide and this loss of privacy should help to expose a lot of the corruption going on.


> "Not to mention the wholesale tracking of our online activities by the NSA which is already a done deal. I'm seriously thinking about leaving the US to buy myself some time"

Leaving the USA will actually ensure the NSA collects a lot more about you. Staying will actually ensure they collect very little, mainly as a byproduct of the methods they use and not because of any nefarious plot.


I meant more to buy myself some time from the physical real-world tracking. We have no way of knowing what the NSA is actually storing on citizens. I have to assume it is everything and that doesn't seem outlandish in light of the Snowden leaks.


Within the next 3-8 years every new passenger car will be fitted with camera sensors (mandated by public safety standards). Most new cars will also be networked. The ECUs on the vehicles may not be powerful enough for real-time biometrics, and the available bandwidth may not be sufficient for this particular application either, but it wouldn't take much to make it technically feasible. Now, I feel uneasy about the societal risks involved with us going down this route, but I know that there are others who would jump at the chance to exploit this opportunity.


Safety technology can take a long time to be adopted. The vast majority of cars still emit toxic gas for example.


So an ensemble beats a monolith? Would be cool to see if the ensemble with distillation could fit to a more computationally friendly model!
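For reference, the distillation idea can be sketched in a few lines: the ensemble's softened predictions become the training targets for one smaller model. This is a generic illustration using only the standard library, not anything from the paper; the logits and the temperature of 2.0 are made-up values.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(member_logits, temperature):
    """Average the softened predictions of every ensemble member."""
    probs = [softmax(logits, temperature) for logits in member_logits]
    return [sum(p[i] for p in probs) / len(probs) for i in range(len(probs[0]))]

def distillation_loss(student_logits, soft_targets, temperature):
    """Cross-entropy between the teacher's soft targets and the student's
    softened predictions; the student is trained to minimise this."""
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student))

# Two hypothetical ensemble members scoring [pedestrian, non-pedestrian].
teacher_logits = [[2.0, 0.0], [1.5, 0.5]]
targets = ensemble_soft_targets(teacher_logits, temperature=2.0)

print(distillation_loss([1.8, 0.2], targets, temperature=2.0))  # student agrees
print(distillation_loss([0.0, 2.0], targets, temperature=2.0))  # student disagrees
```

In practice this term is usually mixed with the ordinary loss on the true labels, and the gradient step itself would be handled by whatever training framework the student model lives in.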


No pedestrian recognition or accident avoidance system is going to be 100% effective. But now we have the opportunity to apply to motoring what we have done to aviation: every rare fatality and the situation that led up to it can be analysed in depth to prevent recurrence.


Next we develop camouflage to avoid detection...


Stuff like this is why I believe that level 4 autonomy within the next decade is quite the lofty goal and level 5 is completely out of the question.

For example, in WA State, you can cross at any unmarked crosswalk which is defined as any intersection (plus a few non-intersections), unless there is a sign explicitly disallowing crossing. I wouldn't know how to accurately estimate this, but there are probably over 100k unmarked crosswalks in Seattle alone (most of which are ignored completely by humans). The pedestrian's responsibility is to not enter the roadway without enough time for a car to stop. The car's responsibility is to stop once the pedestrian signals intent to cross. Signaling intent does not require waiting at the edge of the roadway...merely walking towards the roadway while on the sidewalk is considered signaling intent to cross.

What this effectively means is that the car must do far more than understand pedestrian behavior in crosswalks. It must understand the periphery of the road...not just the pedestrians it can see, but the pedestrians it can't. Most speed limits are actually too high to take into account legal pedestrian behavior. The speed limit is merely an upper limit; driving too fast for conditions can always override the speed limit as a factor in an accident. This means that trees, buildings, etc., that obscure the view of pedestrians on the sidewalk and their behavior implicitly lower the speed at which cars can drive on that road.
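The link between sight lines and speed can be made concrete with the usual stopping-distance model: distance covered during the reaction time plus braking distance, d = v*t + v^2 / (2a). The 1.5 s reaction time and 7 m/s^2 deceleration below are illustrative assumptions, not legal or engineering constants, and an automated system would have its own latency.

```python
import math

def stopping_distance_m(speed_mps, reaction_time_s=1.5, decel_mps2=7.0):
    """Distance travelled while reacting plus distance needed to brake to a stop."""
    return speed_mps * reaction_time_s + speed_mps ** 2 / (2.0 * decel_mps2)

def max_speed_for_sight_distance(sight_distance_m, reaction_time_s=1.5, decel_mps2=7.0):
    """Highest speed (m/s) at which the car can still stop within the visible distance.

    Solves v*t + v^2 / (2a) = d for v, taking the positive root:
    v = -a*t + sqrt((a*t)^2 + 2*a*d)
    """
    a, t = decel_mps2, reaction_time_s
    return -a * t + math.sqrt((a * t) ** 2 + 2.0 * a * sight_distance_m)

# How fast can a car go if a hedge limits its view of the sidewalk to 15 m?
v = max_speed_for_sight_distance(15.0)
print(f"{v:.1f} m/s, about {v * 3.6:.0f} km/h")
```

Because braking distance grows with the square of speed, every metre of lost visibility costs more speed than intuition suggests, which is the commenter's point about obstructions implicitly lowering the safe speed.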

Now today we essentially get away with favoring the driver. The driver can assume that they were acting reasonably by driving the speed limit, and we can say that the pedestrian was out of line by walking too fast out into the street, and most situations are resolved blindly based on testimony with no evidence. But the risk model of autonomous cars is different. The manufacturer is gonna be on the hook, not the driver, and the car needs to be able to obey the letter of the law with a higher burden of proof and with dozens of sensors recording the situation.

As we can see from this paper, pedestrian recognition is still a hard problem. It's hard in the engineering sense, and the mathematical sense. But cars don't need to just recognize pedestrians...they need to recognize and understand the context in which a currently invisible pedestrian could appear in the near future. And perhaps more importantly, they still need to solve the problem of instantaneous path planning in a 2d space with a dynamically changing 3d obstruction model consisting of far more than just pedestrians. It's easy to look at the past and say we've made rapid progress in the past so we'll make rapid progress in the future...but in the process of maturing a technology we regularly see exponential progress in the beginning and asymptotic progress near the end. This is going to be very difficult.


Wait, who ate most of the paper?


Automated jaywalking tickets, here we come.


Or better, stopping vehicle-ramming terrorist attacks like the one in Stockholm last week. Just imagine a system set up on the vehicle: when it detects above-average acceleration in a downtown area (GPS supplied) in addition to abnormal swerving, coupled with pedestrian data from the sensor network, the auto-pilot system kicks in and takes over the vehicle, bringing it to a grinding halt immediately.


I feel like that would be easily circumventable given time and resources.

I imagine most attacks aren't people being like "You know what, I'm sick of life. I'm just going to ram my truck into people"


Right, it's an arms race between us and the terrorists, but at this point in time we need to turn this weapon from low-tech to high-tech to operate, thus raising the barrier and making it prohibitively difficult for losers to hijack or drive trucks to kill pedestrians on the street.


It's not an arms race. No amount of user-unfriendliness will ever make it "that" difficult to purposefully use a vehicle maliciously. Terrorist organizations are well funded. As long as vehicles are powered by engines or even motors, everything else in the vehicle is gravy. Rip out all the electronics, put in new ones that just fire the cylinders or run the engine. It's not a solvable problem.

https://www.youtube.com/watch?v=HnOPD2MOngU Here is a small engine whose timing is controlled by an Arduino. You vastly overestimate what can be done, and underestimate the intelligence and willpower of terrorists.

All you're achieving is making it harder for everyday people to work on their vehicles.


> but at this point in time we need to

If you want to make every aspect of life feel like commercial air travel does at present, then sure, play your ever-escalating game of whack-a-mole, but please don't speak as if the conclusion is so foregone. And don't pretend the benefits will naturally materialize (when history has shown that they won't). The cost of such a system is the only part of it that's guaranteed.

"The trouble with you Americans is you think every problem has a solution," it has been observed.


One could also think of the opposite - human-targeting AI vehicles.

I'm sure this is 100% doable right now with existing technology.


Tech is a double-edged sword.

I also could think of anti-human-targeting weapons to stop or neutralize those vehicles.

Everything is possible.


or we could just take the steering wheels away.


Shh.... don't give them ideas.


burqas for everyone, here we come


Please update the title to note that it's a download.


Here is a more proper link, if anyone wants to avoid going straight to pdf:

https://hal.inria.fr/hal-01501735/



