
Heap fragmentation hasn't been a big problem for me. Using multiple JVMs would mean reimplementing all our data structures in shared memory and writing my own memory allocator or garbage collector for that memory. It's a huge effort.

Many applications can distribute work among multiple processes because they don't need access to shared data or can use a database for that purpose. But for what I'm doing (in-memory analytics) that's not an option.



You've probably since moved on from this conversation, but I wonder if Tuple Space might help [1]. It provides a distributed-memory feel to applications. Apache River provides one such implementation [2].
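For readers who haven't seen the model: a tuple space is a shared bag of tuples that processes coordinate through with write, read (non-destructive), and take (destructive) operations, matching tuples against partial patterns. A toy single-process sketch (real systems like JavaSpaces/Apache River add distribution, transactions, leases, and blocking semantics) might look like:

```python
# Toy tuple space: write/read/take with None as a wildcard field.
# Illustrative only; names and semantics are simplified assumptions,
# not the JavaSpaces/Apache River API.

class TupleSpace:
    def __init__(self):
        self._tuples = []

    def write(self, tup):
        # Add a tuple to the space.
        self._tuples.append(tup)

    def _match(self, pattern, tup):
        # A pattern matches if lengths agree and every non-None
        # field is equal to the corresponding tuple field.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def read(self, pattern):
        # Non-destructive lookup: return the first match, or None.
        for tup in self._tuples:
            if self._match(pattern, tup):
                return tup
        return None

    def take(self, pattern):
        # Destructive lookup: remove and return the first match.
        for i, tup in enumerate(self._tuples):
            if self._match(pattern, tup):
                return self._tuples.pop(i)
        return None
```

Usage: a producer does `space.write(("task", 1, "pending"))` and a worker does `space.take(("task", None, "pending"))` to claim work, which is what gives the coordination-through-shared-data feel.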

Another question about in-memory analytics: do you have to be in memory? I'm currently working on an analytics project using Hadoop. With the help of Cascading [3] we're able to abstract away much of the MapReduce paradigm. As a result we're doing analytics across 50 TB of data every day once you count workspace data duplication.
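For anyone unfamiliar with the paradigm Cascading abstracts, the core shape of MapReduce is map, then group-by-key ("shuffle"), then reduce. A toy single-process sketch (deliberately not Cascading's API, which works in terms of pipes, taps, and flows):

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    # Map phase: emit (key, value) pairs for each input record.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    # Reduce phase: fold each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count over two input lines.
lines = ["a b a", "b c"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda key, values: sum(values),
)
# counts == {"a": 2, "b": 2, "c": 1}
```

Frameworks like Cascading let you compose many such map/group/reduce steps as a dataflow without writing the raw Hadoop job plumbing by hand.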

[1] https://en.wikipedia.org/wiki/Tuple_space
[2] http://river.apache.org/index.html
[3] http://cascading.org


Thanks for the links. The reason we decided to go with an in-memory architecture for this project is that we have (soft) real-time requirements and complex custom data structures. Users are interactively manipulating a medium-sized (hundreds of gigs) dataset that needs to be up to date at all times.

The obvious alternative would be to go with a traditional relational database, but my thinking is that the dataset is small enough to do everything in memory and avoid all serialization/copying to and from a database, cache, or message queue. Tuple Spaces, as I understand them, are basically a hybrid of all those things.
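The trade-off described above, keeping a hot, mutable dataset resident in the process and updating it in place rather than serializing through an external store, can be sketched roughly like this (the `LiveIndex` name and shape are illustrative assumptions, not the poster's actual data structures):

```python
# Sketch: a live in-process aggregate updated on every incoming event.
# Reads always see the latest applied event, with no serialization,
# network hop, or cache-invalidation step in between.

from collections import defaultdict

class LiveIndex:
    """Hypothetical in-memory index kept up to date in place."""

    def __init__(self):
        self.totals = defaultdict(float)

    def apply(self, key, amount):
        # O(1) in-place update; contrast with INSERT + commit
        # round trips against an external database.
        self.totals[key] += amount

    def query(self, key):
        # Immediately consistent: reflects every applied event.
        return self.totals[key]

idx = LiveIndex()
idx.apply("EU", 10.0)
idx.apply("EU", 5.0)
idx.apply("US", 7.0)
# idx.query("EU") == 15.0
```

The cost, as the earlier comments note, is that you inherit the database's problems yourself: durability, concurrency, and (if you split across processes) shared-memory allocation.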



