Hazelcast Jet – In-Memory Streaming and Fast Batch Processing (hazelcast.org)
78 points by martypitt on Feb 12, 2017 | hide | past | favorite | 13 comments



The best part about these types of projects is also the worst: Anything that is java.io.Serializable can be used for keys/values

This means you can immediately save / load / stream your application's objects. Just tack an "implements Serializable" onto your class declaration (or "implements Externalizable" if you want to be fancy and do it yourself) and you're good to go. Plus, with the native Map<?,?> interface, writing code against it feels natural.
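As a sketch of the pattern being described (the class and field names here are invented for illustration), marking a class Serializable is all it takes to make it storable:

```java
import java.io.*;

// Hypothetical value class: "implements Serializable" is the only change
// needed before instances can be used as map values in such frameworks.
public class Order implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final double amount;

    Order(String id, double amount) {
        this.id = id;
        this.amount = amount;
    }

    public static void main(String[] args) throws Exception {
        // Round-trip through plain Java serialization, roughly what the
        // framework does under the hood when it stores the object.
        Order original = new Order("A-42", 19.99);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Order copy = (Order) in.readObject();
            System.out.println(copy.id.equals(original.id)); // prints true
        }
    }
}
```

Note that the deserializing side needs the exact same class on its classpath, which is the root of the portability problem discussed below.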

In practice this also means that you end up serializing arbitrary Java objects and get stuck in serialization/deserialization hell. Your data ends up tied to a specific format on a specific platform/language. It's somewhere between impractical and impossible to get it into an agnostic format usable by any other language, so you're stuck in JDK land forever.

Anything that involves getting data into some other system or language requires you to also write a Java app to read (and possibly write back) your data.

Do yourself a favor and stick to something language agnostic for your data stores. You'll thank yourself many times over down the road.
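One language-agnostic approach (a sketch; the store and field layout here are invented) is to keep values in a self-describing format such as JSON rather than serialized Java objects:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch: store values as UTF-8 JSON bytes instead of serialized Java
// objects, so any language can read them back without JVM classes.
public class AgnosticStore {
    public static void main(String[] args) {
        Map<String, byte[]> store = new HashMap<>();

        // Hand-built JSON for illustration only; a real application would
        // use a JSON library (Jackson, Gson) or a schema format like Avro.
        String json = "{\"id\":\"A-42\",\"amount\":19.99}";
        store.put("order:A-42", json.getBytes(StandardCharsets.UTF_8));

        // Reading back is just bytes -> text; no Java class is required,
        // so a Python or Go consumer could do the same.
        String roundTrip = new String(store.get("order:A-42"),
                StandardCharsets.UTF_8);
        System.out.println(roundTrip.contains("19.99")); // prints true
    }
}
```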


Hazelcast is not really a database. Data is 'persisted' in RAM.


> Hazelcast is not really a database. Data is 'persisted' in RAM.

I purposely didn't use the words "database" or "persisted".

My comment applies even more when data is persisted to durable storage (i.e. disk), but it was meant to apply generally to any distributed data store.


We're quite spoiled on the JVM when it comes to stream processing frameworks. Just a few off the top of my head:

- Apache Spark (especially Spark Streaming)

- Apache Flink

- Apache Storm

- Apache Apex

- Apache Samza

- Apache Ignite (which also includes other things)

...and now Hazelcast Jet.


Apache Beam just got out of incubator, too, which means that there's now a low cost abstraction on top of two of those (as well as Google Cloud Dataflow). This stuff just seems to get better.


Also, note "Apache" prepended every time. I can't help thinking there is concentration risk there.


I'm hard-pressed to think of a feasible concentration risk for Apache projects. Can you be more specific?


Flink, Spark, Samza, Kafka (Streams): all under the Apache umbrella, all doing similar things, all competitors. There is a risk of spreading resources too thin in the space, a risk that some of the projects will be guided towards sub-optimal objectives so as not to compete too directly with other projects in the portfolio, and there is clear branding dilution. This is made worse by the fact that all the projects use the JVM, eroding differentiation even further. And there is concentration risk for Apache in making tons of bets in the same space as opposed to diversifying across sectors.

Just a lay observation: I happen to have been perusing the space for a financial-markets compute graph I am building, and was surprised by the lack of options outside of Apache, and in particular, away from the JVM.


Apache projects are independently managed, not by anybody paid by the ASF. https://community.apache.org/projectIndependence.html


Apache projects are often direct competitors.

The Apache organisation is about good open source governance, not avoiding internal competition.


I couldn't quite grok what this was supposed to be from the information on the site. What did help was their one-line description that it was basically a distributed implementation of java.util.stream.

I didn't know what java.util.stream was either, but this document made it clear: http://www.oracle.com/technetwork/articles/java/ma14-java-se...


For those who know C#: It's the Java version of LINQ.

For those who know neither Java nor C#: all regular collections now offer a .stream() (and .parallelStream()) method that returns the collection as a Java 8 Stream<T> instance. With the help of lambda expressions, the objects in a stream can be converted/mapped, filtered, sorted, aggregated, reduced, flatmapped, etc. by chaining methods on the stream. You can also create finite or infinite streams without a backing collection (from number ranges, random values, or anything else you can think of).
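A minimal, self-contained example of the chaining described above (the word list is made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "jet", "hazelcast", "map");

        // Filter, transform and collect in one chained pipeline.
        List<String> shortUpper = words.stream()
                .filter(w -> w.length() <= 4)   // keep only short words
                .map(String::toUpperCase)       // transform each element
                .sorted()                       // order the result
                .collect(Collectors.toList());
        System.out.println(shortUpper); // prints [JET, MAP]

        // Aggregation: total length of all words.
        int totalLength = words.stream()
                .mapToInt(String::length)
                .sum();
        System.out.println(totalLength); // prints 21
    }
}
```

Swapping .stream() for .parallelStream() runs the same pipeline across multiple threads; Jet's pitch is to run it across multiple machines.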

I often end up writing complex stream method chains only to break them up in the end, thinking that they are too difficult for others to understand.


It's a distributed data processing engine, not just a distributed java.util.stream implementation; that is just one of its uses.

It provides fast distributed computation as an infrastructure component - Jet is fully embeddable in your application.

Disclaimer: I am one of the engineers who worked on this.



