> The idea of iterating with Clojure in a REPL on a small dataset to develop or refine an algorithm, then pressing a button and seeing how it does running on some large dataset on EC2, sounds sublime.
Unfortunately, it usually works out to spending a few hours iterating on the functions in the REPL until they work great, and then spending a week battling with Hadoop to make it actually run the way you intended. (No exaggeration here.)
That said, we've been working with raw Hadoop; it sounds like Cascading makes this much less painful. We'll see how that goes.