High performance libraries in Java

SeanLuke · on Feb 11, 2012

Colt has not been updated for eight years, yet the article refers to it in the present tense. Why? Because the article's section on Colt is lifted straight from Colt's website.

This appears to be blogspam.

kevinherron · on Feb 10, 2012

I don't have a solid grasp on what exactly Java Chronicle is and does and what the use cases might be. (https://github.com/peter-lawrey/Java-Chronicle)

Can anybody elaborate?

tlack · on Feb 10, 2012

I'm definitely not a knowledgeable Java dude but it seems to be just an mmap()'d file wrapped in a class. I don't understand how this fits the description the author provided. I also don't understand how he can call it an "in memory database" in the first paragraph and then say "can be much larger than your physical memory size (only limited by the size of your disk)". Any Java kids who have digested those couple code files want to help me understand?

th0ma5 · on Feb 10, 2012

i haven't looked at the code, but i have dabbled in some performance coding... if you can stream things in and out quickly, and in operations that themselves take as little as possible memory, then you can operate on very large files in mostly directly linear time with relation to the size of the file... so while the packet in the stream might be in memory at the time your code runs against it, it must seem to be saying that not only can you do that, but we can also stream files through this thing without having to read the file all at once into memory (which works great and fast on very small files, but can downright non functional for large files) .... just a guess... a lot of performance stuff i've been around is a hot-potato situation, so maybe they're trying to describe that.

rywang · on Feb 10, 2012

I use the Colt (Java) matrix libraries for applied math and graphics applications, but it's still quite costly compared to optimized libraries like Intel's MKL. A singular value decomposition takes about eight times as long in Colt vs MKL.

tdj · on Feb 10, 2012

I've used both Colt and apache commons math in a machine learning setting. It was mostly the better handling of sparse matrix ops that made apache a couple of times faster than colt (due to hash-based vs. sorted list vectors) It was pretty much the same on dense linear algebra. Intel MKL, eigen, uBlas can be better, but I haven't done the benchmarks to prove this either way.

rywang · on Feb 10, 2012

While we're on the subject, I'm curious if you've played with matrix-toolkits-java / netlib-java. I'm considering switching to it. I've had good experiences with the Lawson-Hanson non-negative least squares that comes with netlib.

bedatadriven · on Feb 10, 2012

The netlib-java project is very convenient because it allows you to use optimized native netlib libs, and transparently fall back to the f2j (jvm bytecode) library when native libs are available.

The java version of netlib produced by f2j is unfortunately the unoptimized "reference implementation", so it's not terribly performant for large matrices.

elehack · on Feb 10, 2012

I use the excellent fastutil for primitive collections - it has great integration with java.util collections, and has a more standard design than Trove in my opinion. Works great.

pasbesoin · on Feb 11, 2012

Another borked Blogger template. Cache URL as a "permalink", for those who have Javascript disabled:

http://webcache.googleusercontent.com/search?q=cache:http%3A...

chaostheory · on Feb 10, 2012

I think missing from this list is akka http://akka.io

It's a concurrency framework that allows you to:

1) abstract threads into actors and messages

2) distribute processing with remote actors residing on different machines.

imo, it's way easier to use than traditional java concurrency

ww520 · on Feb 11, 2012

How is Akka compared to Hazelcast in term of distributed processing?

chaostheory · on Feb 14, 2012

I don't know much about Hazelcast, but it seems to just be a group of distributed versions of constructs used for managing threads. That's the key difference. Akka abstracts thread management for you with an Actor model, which imo is much much easier to use.

rkalla · on Feb 10, 2012

The "serialization using ByteBuffers" should be clarified that it is "serialization using direct ByteBuffers" -- as in they exist in the native OS memory space and not inside the JVM's heap.

Direct ByteBuffers are excellent when you can make use of a long-lived, fixed-size buffer that you are using to communicate with a native resource (e.g. socket, file, etc.)

My own experience with using direct ByteBuffers is allocating read/write buffers to a running Redis process that I use to write commends to the server and read the results back. The difference in performance using a direct buffer instead of raw byte[] (basically a standard ByteBuffer) were astounding.

I have seen people argue against the use of direct buffers pointing out that at some point your calls and payload have to cross the JVM-native barrier and using a direct ByteBuffer simply moves the point of entry/exit which won't change the performance of the entire round-trip.

I can't argue with that, but I would point out that in my own work with Redis, having a native process input and output data to and from a native OS buffer that I can then pull into the JVM gave me at least an order of magnitude improvement in speed than sticking with raw byte[] in and out over a socket.

ASIDE: I attribute my success here to the fact that I was able to queue up out-bound commands as fast as possible inside the JVM, pushing them out into the native buffer space which streamed them into Redis; reading back the replies as quickly as possible in a separate thread. My understanding is that by moving the "blood-brain-barrier" to this point, I am allowing Redis to consume and produce as fast as possible as long as I keep the input buffer full and output buffer relatively empty. In other words Redis wasn't being blocked (for the most part) by waiting on me to push and pull data in and out of my running JVM on every single read/write.

ADDENDUM: Just had a fun impl thought for anyone that read this and thought it was interesting... a custom InputStream and OutputStream impl along the lines of the JDK's Buffered streams, but the input and output streams are actually backed by direct ByteBuffers.

The use-cases for the stream would need to be very specific and the underlying approach clearly spelled out in the Javadoc, but it would provide a nice bridge between standard JDK stream-based I/O and the NIO work without burdening the caller with knowing about how to use the NIO APIs.

For anyone interesting, I'll likely add a first-pass impl of this to the Universal Binary JSON Java libs[1] later today to compliment the re-usable ByteArray stream impls that are there already.

[1] https://github.com/thebuzzmedia/universal-binary-json-java

Scaevolus · on Feb 10, 2012

Be careful. The JVM will never unmap the underlying regions it uses to support direct ByteBuffers, so if you end up creating a lot of them (such as for reading/writing files) or even just resizing them (which creates an entirely new one), you can run out of virtual memory.

gorset · on Feb 10, 2012

This is a myth. Or more precisely, a mmaped region no longer in use will be unmapped whenever the JVM feels like it (which can, of course, be never). Try running the loop:

  long size = 0;
  while (size >= 0) {
    size += Files.map(veryBigFile).capacity();
  }

You will see that VSIZE grows to be very big, but will decrease in size whenever the JVM runs the finalizers for the mmaped regions. Adding System.gc() to the loop makes the VSIZE stay constantly low on my machine, proving that at least my JVM (build 1.6.0_29-b11-402-11D50b on a mac) will unmap the underlying regions (JVM is allowed to ignore System.gc()).

gresrun · on Feb 10, 2012

The excellent I/O library, Netty, deals with this by allocating a slab of direct memory, slicing it up and pooling the use of the memory.

It's the Java equivalent of writing your own alloc().

pron · on Feb 10, 2012

That's not exactly what Netty does. It does allocate large direct buffers and slices up pieces, but the pieces are not pooled because they are never returned to Netty. For this reason they also don't need to implement malloc in Java. All the slices reference the "parent" buffer, and once they are all collected, the parent can be collected as well. When there is no more room in the buffer, another one is allocated. (I just read that code last week because I wanted to know what was going on precisely because there was no way of returning a sliced buffer to the pool).

It is very easy (and efficient) to implement java.io input and output streams backed by a buffer (as long as the buffer doesn't have to grow).

gresrun · on Feb 11, 2012

Buffer pooling is still very much a WIP. (the first commit was 3 days ago!)

https://github.com/netty/netty/tree/bufferpooling https://github.com/netty/netty/issues/62 https://github.com/netty/netty/commit/c9968d6cbfa958f73a9868...

pron · on Feb 12, 2012

Oh! Good to know!

rkalla · on Feb 10, 2012

Absolutely right; hence my 2nd paragraph :)