The taxi data just has start and end points, correct?
So the route in between each trip is just a guess?
Edit: Whoops, from the about page:
> The raw data include only start and end locations for each trip. These points were run through Google's Directions API to create the routes shown in this visualization. Of course, these are Google's best choice, not necessarily the one the taxi took.
I was surprised at how inefficient some of the drivers' behavior was. I thought after dropping off a fare they would all immediately head to an area they knew of that might have a high likelihood of generating fares. Instead many of them seem to just wander around aimlessly looking for the next fare.
From the page:
Empty Taxis also follow the "best route" between a dropoff and the next pickup. Just as with the trips, this is just an effective way to move the marker around, but doesn't reflect the reality of where the taxi traveled.
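The marker movement described on the about page can be sketched as constant-speed interpolation along a route polyline. This is a minimal illustration, not the site's code. It assumes the Directions API route has already been decoded into a list of (x, y) points and that the taxi covers it at a uniform pace between the two timestamps. `position_along_route` is a hypothetical helper.

```python
import math


def _dist(a, b):
    # Planar distance between two (x, y) points; adequate for short city segments
    return math.hypot(b[0] - a[0], b[1] - a[1])


def position_along_route(route, start_time, end_time, t):
    """Return the marker position at time t, assuming constant speed along
    the polyline `route` from start_time (first point) to end_time (last point)."""
    if t <= start_time:
        return route[0]
    if t >= end_time:
        return route[-1]
    fraction = (t - start_time) / (end_time - start_time)
    seg_lengths = [_dist(route[i], route[i + 1]) for i in range(len(route) - 1)]
    target = fraction * sum(seg_lengths)  # distance travelled so far
    for i, seg in enumerate(seg_lengths):
        if target <= seg:
            r = target / seg if seg > 0 else 0.0
            (x0, y0), (x1, y1) = route[i], route[i + 1]
            return (x0 + r * (x1 - x0), y0 + r * (y1 - y0))
        target -= seg
    return route[-1]  # guards against floating-point overshoot
```

Because only the endpoints and timestamps are real, any stop the driver made (lunch, a bodega run) is smeared into a slow, steady crawl along the guessed route.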
It's also the case that taxi drivers need to eat, use the toilet, do their shopping, see the sights, etc. They aren't robots, and they don't necessarily have to dash to the next customer in a mad scramble for survival.
Imagine if somebody made a visualization of your workday and put it out in public. Would your manager say "he seems to just wander around aimlessly instead of efficiently moving from one piece of code to the next. I thought after compiling one project, they would all immediately head to the next one."
The analogy is OK, but it mostly fails because developers are not paid a commission for each line of code written, whereas taxi drivers have a very strong profit motive to act efficiently and quickly. I think your examples of why they would deviate from this assumed behavior are pretty solid on their own.
Seems like this data could be used to build an app that suggests where a taxi driver should go at any given time to maximize their chances of getting a fare. Uber & Lyft already do this for their drivers, but I'm not aware of any app that does this for NYC yellow cabs.
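A first cut at such a suggestion feature could simply rank map cells by how many historical pickups happened there at the current hour of day. The sketch below assumes pickups are available as (datetime, lat, lon) tuples; the grid size, the function names and the sample coordinates are all hypothetical, and a real app would also need to account for competition from other cabs.

```python
import math
from collections import Counter
from datetime import datetime


def cell_for(lat, lon, cell_deg=0.01):
    # Snap a coordinate to a grid cell; 0.01 degrees is roughly 1 km at NYC latitudes
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def top_cells(pickups, hour, n=3, cell_deg=0.01):
    """Return the n grid cells with the most historical pickups in the given hour."""
    counts = Counter(
        cell_for(lat, lon, cell_deg)
        for when, lat, lon in pickups
        if when.hour == hour
    )
    return [cell for cell, _ in counts.most_common(n)]


# Hypothetical sample data
pickups = [
    (datetime(2013, 5, 1, 8, 5), 40.7505, -73.9934),
    (datetime(2013, 5, 1, 8, 20), 40.7512, -73.9921),
    (datetime(2013, 5, 2, 8, 45), 40.7508, -73.9940),
    (datetime(2013, 5, 2, 8, 50), 40.7075, -74.0113),
    (datetime(2013, 5, 2, 18, 10), 40.7794, -73.9632),
]
print(top_cells(pickups, hour=8))
```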
Most taxi drivers already know the right times and areas to get hailed, and to get the kind of hails they want (i.e., not to JFK or LGA, the Bronx, or most of Queens/BK). Even though it is just tribal knowledge, I'd be really surprised if it wasn't largely accurate.
Long drive, capped rates as already mentioned. Also once there you cannot immediately pick up another passenger - you have to go to the central "taxi pool" where you get to line up with dozens/hundreds of other cabs for the privilege of picking up a fare.
So as a driver, your choices are to wait forever in line to pick up a fare from the airport, or to beeline it back to Manhattan as quickly as you can with an empty car. Neither is a great choice.
Although in the wee hours (between 1 and 6AM), getting a hail to JFK is not bad, because it only takes ~20 minutes to get there, and yields a $60 fare. But during rush hour, getting stuck going to JFK is the worst thing possible, as you lose ~2-3 hours of the best fare times.
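To make the night-time numbers concrete: $60 for about 20 minutes with a passenger is $180 per hour of loaded time. The comment doesn't say how long the empty drive back takes, so the second figure below assumes, purely for illustration, an empty return as long as the outbound leg.

```python
def gross_per_hour(fare, minutes_loaded, minutes_empty=0):
    """Gross fare per hour of total time spent on the trip (loaded plus empty)."""
    return fare * 60 / (minutes_loaded + minutes_empty)


night_jfk = gross_per_hour(60, 20)                 # loaded leg only
night_jfk_round_trip = gross_per_hour(60, 20, 20)  # assumes a 20-minute empty return
print(night_jfk, night_jfk_round_trip)
```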
Which is why you should use the AE/Penn Station train during rush hour (it'll be faster than a cab), or tip your cabbie very well (30%+), though it still ends up being sucky for them and stressful for you.
Damn, I wish we had that in the Bay Area. I usually take Uber or another black car service from SFO because to Oakland in a taxi it's $90+ on the meter and lots of extra fees.
An UberX driver said the same to me and I was surprised as well, but after thinking about it, it started to make more sense. What could Uber really tell them beforehand that they wouldn't already know, or learn very quickly from driving around?
The driver thought Uber didn't provide the information because they want to be able to charge surge pricing, but that cannot be right. Uber doesn't want to charge surge prices unless they absolutely have to because they reduce the number of rides people take.
Considering how much money Uber invests in giving away free rides and temporarily reducing prices to increase ridership, they clearly see a lot of value per ride, especially for travelers who are more likely to be at an event, try Uber for the first time, and then bring that demand home to their own cities.
Surge pricing only makes sense in situations where drivers need additional incentive to go out in the first place.
I wonder why my guy wandered really slowly up Riverside Ave for like 30 minutes instead of going inland to find a fare. Looks like it's really slow between 11am and 3pm, and constant driving otherwise.
I was wondering the same thing, but since they just fill in a route from the last dropoff to the next pickup, it's almost certain the driver "went to lunch" in the interim. I saw this trend in the two taxis I looked at.
Taxi drivers don't usually stop for dinner when on shift. The closest they will do is run into a bodega, or stop on the corner and have a streetfood vendor hand them something into the cab.
I've seen halal carts where the taxis line up down the block and a guy takes their order on the south end of the block and they pick it up at the north end, just like a drive-thru burger joint.
Only has data for begin/end points of the trips. They're just using google directions to move between the sampled data points - they don't have minute by minute gps location of the cab.
Could've been raining that day. Or it may be US/NY specific... I have only been to the West Coast, but coming from Europe I was really surprised by what's considered "walking distance" in the city centre. Two blocks away and people were ready to get a cab.
Then again, the cabs were probably 5+ times cheaper than what I was used to in the UK. Maybe I'd use them more if it cost me $5, not £10 to go around 10 blocks away. One is close to spare change, the other I'd have to think about...
10 blocks is very much considered walking distance in NYC. People walk here all the time - it's the default mode of transport to nearby places. Cabs are usually used for specific reasons: transporting something heavy, bad weather, rider has poor mobility (injured leg, etc.), had a lot to drink, entertaining/working with a client, etc.
This isn't true in the rest of the US, however. The rest of the country is less walkable and the infrastructure is designed for cars first and foremost, so people learn to default to cars even for nearby trips. When I lived in California it was hard to get people to walk anywhere, even if it was only 5-10 minutes away.
It'd be interesting if comparable data could be gotten for different cities. Wonder what the curves would look like for e.g., Boston, SF, London, Athens, Frankfurt.
Small nit, but guessing this is crow's flight distance which doesn't equal path distance. Regardless, this still isn't surprising. Short rides happen all the time.
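The gap between crow's-flight and path distance is easy to estimate. The sketch below computes great-circle (haversine) distance and, as a rough stand-in for driving distance, the sum of a north-south leg and an east-west leg. That second function assumes streets run exactly N-S and E-W, which Manhattan's tilted grid does not, and it ignores one-way streets; both names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def grid_distance_km(lat1, lon1, lat2, lon2):
    # North-south leg plus east-west leg on an idealized, axis-aligned street grid
    ns = haversine_km(lat1, lon1, lat2, lon1)
    ew = haversine_km(lat2, lon1, lat2, lon2)
    return ns + ew


a, b = (40.7505, -73.9934), (40.7794, -73.9632)
print(haversine_km(*a, *b), grid_distance_km(*a, *b))
```

On a perfect rectangular grid, the driving distance is at most √2 (about 1.41) times the straight-line distance, reached for trips running diagonally across the grid.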
How is the data input on # of passengers? Could it be that it's just easier to hit "1 person" all the time regardless? Are there cabbies that always have the same #?
Loads of tourists are terrified of walking anywhere in NYC both due to fear of crime and fear of getting lost, and thus take cabs for distances that probably don't warrant them.
One interesting observation is that the mean fare is ~ $10. That's the same as 4 subway rides (each one with unlimited transfers and can take you from one end of the city to the other). You need to be pretty rich to be riding a taxi regularly nowadays. Taxi fares have gone up at a far greater rate than subway fares.
Of course, you can also share a taxi... it may be an indulgence to hop on a taxi alone as part of your daily commute, but it becomes much more convenient, private, and cost-effective when in a pair or small group.
These taxi data do not seem to capture taxi capacity factor / number of seats filled.
First of all... this is really great. Really good job.
On the ipad a few things are wonky with the layout, you may want to test on there and fix the few UI issues.
It would be cool to see $/hr precalculated as well. It may be better to make the right side a table, with each row being a ride and totals at the bottom.
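The suggested panel could look roughly like the plain-text table below: one row per ride, then a totals row with $/hr over the whole shift. The `Ride` fields and `summarize` function are hypothetical, not the visualization's data model.

```python
from dataclasses import dataclass


@dataclass
class Ride:
    fare: float     # dollars collected for the ride
    minutes: float  # minutes with a passenger aboard


def summarize(rides, shift_hours):
    """Print a per-ride table with totals; return (total fares, $/hr over the shift)."""
    print(f"{'ride':>4} {'fare':>8} {'min':>5}")
    for i, r in enumerate(rides, start=1):
        print(f"{i:>4} {r.fare:>8.2f} {r.minutes:>5.0f}")
    total = sum(r.fare for r in rides)
    busy = sum(r.minutes for r in rides)
    per_hour = total / shift_hours
    print(f"{'tot':>4} {total:>8.2f} {busy:>5.0f}   ${per_hour:.2f}/hr")
    return total, per_hour
```

Usage: `summarize([Ride(12.5, 10), Ride(7.5, 6)], shift_hours=2)` prints two rows and a totals row.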
From a brief look at the data, it looks like cash tips might be recorded as 0%. So the real average might be closer to 25%, with the reported 12% dragged down by the zeroed-out cash tips.
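One way to check this is to compute the average tip percentage twice, with and without cash trips. The sketch assumes each trip is a dict with `fare`, `tip` and `payment` keys; those field names and the `"cash"` label are hypothetical, so map them to whatever the actual dataset uses.

```python
def average_tip_pct(trips, exclude_cash=True):
    """Mean per-trip tip percentage; cash trips are skipped by default,
    on the assumption that their tips are recorded as zero."""
    pcts = [
        100.0 * t["tip"] / t["fare"]
        for t in trips
        if t["fare"] > 0 and not (exclude_cash and t["payment"] == "cash")
    ]
    return sum(pcts) / len(pcts) if pcts else 0.0


trips = [
    {"fare": 10, "tip": 2, "payment": "card"},
    {"fare": 20, "tip": 5, "payment": "card"},
    {"fare": 10, "tip": 0, "payment": "cash"},
    {"fare": 10, "tip": 0, "payment": "cash"},
]
print(average_tip_pct(trips), average_tip_pct(trips, exclude_cash=False))
```

If half the trips are cash, including them roughly halves the average, which is the effect described above.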
What's happening when I see a green blob which then turns red a few seconds later? The cab hasn't moved, but the passenger count and fare tally have both gone up. The cab then moves at a normal pace from that location to the next fare, so I don't think it's missing GPS data.
Edit: just saw another comment saying the points were calculated through a Google API, so maybe it's just gremlins in that.
I watched a few different cabs and they all seemed to generate around $600 in a 24 hour period. Am I the only person who thinks this is astonishingly low?
They lose a lot to fees, car stuff, gas, variable customer, routes people want them to take (and time to find a new fare) etc. They probably make around $25 an hour. I'm not sure what that looks like in New York but it sounds alright.
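Note that $600 over 24 hours is exactly $25/hour gross, so that figure comes before the fees, gas and car costs mentioned above; any of those costs pull it lower. The $120 cost figure below is purely hypothetical.

```python
def hourly_earnings(revenue, hours, costs=0.0):
    """(revenue - costs) / hours for a single period, e.g. one 24-hour day."""
    return (revenue - costs) / hours


gross = hourly_earnings(600, 24)           # figure observed in the visualization
net = hourly_earnings(600, 24, costs=120)  # hypothetical $120 of fees, gas, etc.
print(gross, net)
```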
I was chatting with an UberX driver yesterday in NYC and apparently his insurance is $7K a year (it's a Toyota Camry too, nothing super fancy).
That works out to ~$27 a day if you drive the car 5 days a week, every week of the year. Not a massive amount, but significant in the numbers we're talking about.
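The ~$27/day figure is the yearly premium spread over working days: $7,000 / (5 × 52) ≈ $26.92. Driving seven days a week would lower it to about $19.23. `cost_per_working_day` is just an illustrative helper.

```python
def cost_per_working_day(annual_cost, days_per_week=5, weeks_per_year=52):
    # Spread a fixed yearly cost over the days the car is actually driven
    return annual_cost / (days_per_week * weeks_per_year)


insurance_per_day = cost_per_working_day(7000)  # the $7K/year policy above
print(round(insurance_per_day, 2))
```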
The demand for taxis is so high that a driver bluntly told me to give him my cell phone with a GPS route set up, otherwise he wasn't going. And that was a good two-borough trip for $40. Bottom line: they don't need you; you need them.
Does not work in Safari. Come on people, if you're using some non-standard or cutting-edge feature, at least use Modernizr to detect it and tell users instead of just having a completely non-functional "begin" button.