The best part about these types of projects is also the worst: anything that is java.io.Serializable can be used for keys/values.
This means you can immediately save / load / stream your application's objects. Just tack an "implements Serializable" onto your class header (or "implements Externalizable" if you want to be fancy and do it yourself) and you're good to go. Plus, with the native Map<?,?> interface, writing code against it feels natural.
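A minimal sketch of what that buys you, with a plain HashMap standing in for whatever disk-backed Map the library actually provides (the User class and the explicit round-trip are just illustration; the library does the ObjectOutputStream work for you):

    import java.io.*;
    import java.util.HashMap;
    import java.util.Map;

    // Any plain class becomes storable just by declaring Serializable.
    class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    public class SerializableDemo {
        public static void main(String[] args) throws Exception {
            // A HashMap stands in here for the library's disk-backed Map.
            Map<String, User> store = new HashMap<>();
            store.put("alice", new User("Alice", 42));

            // Under the hood, entries go through the JDK's built-in
            // mechanism, roughly equivalent to this round-trip:
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(store.get("alice"));
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                User copy = (User) in.readObject();
                System.out.println(copy.name); // prints: Alice
            }
        }
    }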
In practice this also means that you end up serializing arbitrary Java objects and getting stuck in serialization/deserialization hell. Your data ends up tied to a specific format on a specific platform and language. It's somewhere between impractical and impossible to get it into an agnostic format usable by any other language, so you're stuck in JDK land forever.
Anything that involves getting data into some other system or language requires you to also write a Java app to read (and possibly write back) your data.
Do yourself a favor and stick to something language-agnostic for your data stores. You'll thank yourself many times over down the road.
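For instance, writing values as JSON keeps them readable from any language. A quick sketch using Jackson (just one option among many; Protocol Buffers or Avro would serve the same purpose):

    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonDemo {
        // Same idea as before, but no Serializable required.
        public static class User {
            public String name;
            public int age;
            public User() {}  // Jackson needs a no-arg constructor
            public User(String name, int age) { this.name = name; this.age = age; }
        }

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            // Store this string (or its UTF-8 bytes) as the value; any
            // language with a JSON parser can read it back.
            String json = mapper.writeValueAsString(new User("Alice", 42));
            System.out.println(json);      // {"name":"Alice","age":42}
            User back = mapper.readValue(json, User.class);
            System.out.println(back.name); // Alice
        }
    }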
Apache Beam just got out of the incubator, too, which means there's now a low-cost abstraction on top of two of those (as well as Google Cloud Dataflow). This stuff just seems to keep getting better.
Flink, Spark, Samza, Kafka (Streams). All under the Apache umbrella. All doing similar things. Competitors. There is a risk of spreading resources too thin in the space, there is a risk that some of the projects will be steered toward sub-optimal objectives so as not to compete too directly with other projects in the portfolio, and there is clear branding dilution. This is made worse by the fact that all the projects use the JVM, eroding differentiation even further. And there is concentration risk for Apache in making so many bets in the same space instead of diversifying across sectors. Just a lay observation, as I happen to have been perusing the space for a financial-markets compute graph I am building, and was surprised by the lack of options outside of Apache and, in particular, away from the JVM.
I couldn't quite grok what this was supposed to be from the information on the site. What did help was their one-line description that it was basically a distributed implementation of java.util.stream.
For those who know C#: It's the Java version of LINQ.
For those who don't know Java or C#: all regular collections now offer a .stream() (and .parallelStream()) method that returns the collection as a Java 8 Stream<T> instance. With the help of lambda expressions, the objects in a stream can be converted/mapped, filtered, sorted, aggregated, reduced, flat-mapped, etc. by chaining methods on the stream. You can also create finite or infinite streams without a backing collection (like number ranges, random values, or anything else you can think of).
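A small illustration of both styles, using nothing beyond the standard library:

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class StreamDemo {
        public static void main(String[] args) {
            // Collection pipeline: filter, map, sort, collect.
            List<String> words = Arrays.asList("flink", "spark", "samza", "kafka");
            List<String> upper = words.stream()
                    .filter(w -> w.length() > 4)
                    .map(String::toUpperCase)
                    .sorted()
                    .collect(Collectors.toList());
            System.out.println(upper);       // [FLINK, KAFKA, SAMZA, SPARK]

            // Infinite stream with no backing collection: powers of two.
            List<Integer> powers = Stream.iterate(1, n -> n * 2)
                    .limit(5)
                    .collect(Collectors.toList());
            System.out.println(powers);      // [1, 2, 4, 8, 16]

            // Aggregation / reduction.
            int totalLength = words.stream().mapToInt(String::length).sum();
            System.out.println(totalLength); // 20
        }
    }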
I often end up writing complex stream method chains, only to break them up in the end because I figure they'll be too difficult for others to understand.
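The refactoring is mechanical: assign the intermediate streams to named variables so each stage reads on its own. A contrived sketch:

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // The intermediate Stream variables exist purely to give each
    // stage a readable name; behavior is identical to one long chain.
    public class BreakUpDemo {
        public static void main(String[] args) {
            List<String> lines = Arrays.asList("a,b", "c", "d,e,f");

            Stream<String> fields = lines.stream()
                    .flatMap(line -> Arrays.stream(line.split(",")));
            Stream<String> nonEmpty = fields.filter(f -> !f.isEmpty());
            List<String> result = nonEmpty.collect(Collectors.toList());

            System.out.println(result); // [a, b, c, d, e, f]
        }
    }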