How FlightCaster Squeezes Predictions from Flight Data (datawrangling.com)
88 points by pskomoroch on Aug 24, 2009 | 16 comments


These guys appear to be having a hell of a lot of fun. Their technique of wrapping a stack from Amazon EC2 through Hadoop all the way up into Clojure was the kind of thing I wondered about being possible, so it's pretty awesome to hear it is being done and done well by someone. The idea of iterating with Clojure in a REPL on a small dataset to develop or refine an algorithm, then pressing a button and seeing how it does running on some large dataset on EC2, sounds sublime.

Even if they never release any of the glue code that makes all this happen, just knowing it is possible is very encouraging.


The real-world Lisp/Clojure + Hadoop approach is definitely fun to hear about. Also interesting to see a YCombinator team with 8+ people, including a domain expert. Mashing up 4+ messy data sources is tough to do. Very unconventional on many fronts.


It's totally fun--we love it. Everyone has a role that they own and we trust each other to execute. Stay tuned--this is just a small slice of what's coming...

Jason (@FlightCaster)


> The idea of iterating with Clojure in a REPL on a small dataset to develop or refine an algorithm, then pressing a button and seeing how it does running on some large dataset on EC2, sounds sublime.

Unfortunately, it usually works out to spending a few hours iterating on the functions in the REPL till they work great and then spending a week battling with Hadoop to make it actually run the way you intended. (No exaggeration here.)

Though we've been working with raw Hadoop; it sounds like Cascading makes it much less painful. We'll see how that goes.


True dat. Spoken like a true clown zen master. It wouldn't be clowncomputing if we didn't have red rubber noses and funny rainbow hair.


An “in the trenches” interview on building a machine learning application with Rails & Hadoop. During the interview on FlightCaster, Brad describes some of the challenges of working with flight data, statistical approaches for flight prediction, false negatives in FlightCaster, Clojure, Hadoop & Amazon EC2, YCombinator, and more. Was pleasantly surprised at how open Brad was about the model internals and data crunching pitfalls.


In the article, it mentions Bradford's Amazon wishlist.

For the curious: http://www.amazon.com/gp/registry/wishlist/3RB4REDIKE28I

And those that have been purchased (a better list): http://www.amazon.com/gp/registry/wishlist/3RB4REDIKE28I?rev...


FYI, the purchased books list is not very representative since I buy so many books directly without flowing them through the wish list.

My book lists are better than my wish lists, not just because there is stuff in the book lists that is not in the wish lists, but also because I take time to maintain the book lists and only include the really good stuff.

I'm planning to update my book lists soon with recommendations for statistics, AI, machine learning, and other treasures.

Here are my book lists as of now:

http://www.amazon.com/gp/richpub/listmania/byauthor/A1JKHQFC...


It's great to see someone taking piles of data and pulling some meaning out of it. I think it's easy enough these days to be able to see the potential of data-mining a site like Facebook, but I expect a lot of value to come out of sites like FlightCaster that are getting value in domains that folks don't normally think of being data-intensive. Google was more or less data-mining, but it was a relatively easy set of data to access: public websites. Now, we're seeing the exploitation of more obscure, but not necessarily less valuable, data.


Thanks Hexis. The key for us was being able to combine deep domain expertise with Brad's data-mining capabilities. The model is a nice mix of statistical induction and domain-based logic. We're adding more data sources to it, so both the power of the algorithms and capabilities will only get better.

~Jason (@FlightCaster)


Great interview. It's amazing that they built all of this during the past couple of months. Awesome work guys!


Thanks!


An informative interview providing a better understanding of the amazing work this team has done. A well-balanced team with great creativity, energy, dedication, and perseverance, not to mention their awesome talent. Great teamwork resulting in a quality product. You guys rock!


Much appreciated! Thanks for the kind words. We're really excited to push out a lot more stuff in the next few months...

~Jason (@FlightCaster)


Yes, but are their predictions correct? Anyone tried them out?


Great interview. I'm more curious now as to each of your personal histories. Each of you seems like an incredibly gifted domain expert.



