I find it kind of interesting that in Haskell, which is arguably even higher lev...

gwern · on Jan 12, 2012

That may be, but what good is it if no one uses it? I hadn't heard that git-object (http://hackage.haskell.org/package/git-object), ght (http://hackage.haskell.org/package/ght), hit (http://hackage.haskell.org/package/hit) or gat (http://evan-tech.livejournal.com/254793.html) were especially fast.

dkarl · on Jan 13, 2012

It seems that for almost any popular piece of C or C++ software there are people who are motivated to produce a pure Java implementation. For whatever reason, you don't see that motivation in other language communities. Outside of Java-land, most feature-for-feature copies of existing software seem to be undertaken for the sake of learning or linguistic patriotism, which are not sufficient drivers to sustain such a project to completion.

My guess is that this phenomenon reflects the fact that other language communities are more comfortable and adept with C libraries, or, to look at it another way, that complete independence from native libraries is actually a feasible goal for most Java projects.

njs12345 · on Jan 13, 2012

No one uses Haskell, or no one uses those optimisations?

Haskell has plenty of industrial users and quite a few very large programs as well. Those libraries hardly look mature; by contrast, I know that many of Haskell's container libraries make extensive use of unpacking.

Here is a good set of slides on Haskell and optimisation: http://www.slideshare.net/tibbe/highperformance-haskell

kmm · on Jan 12, 2012

I've heard the argument that, when needed, one can use an FFI to optimize bottlenecks in high-level code, but I've never understood it.

Won't using a high-level language incur an omnipresent speed slump? And even if a bottleneck exists, how would using an FFI remedy crucial problems in the language, like the absence of unsigned types or that all types are boxed? The types will have to be unboxed anyway, so whether that happens in foreign code or in the interpreter/JIT code won't matter.
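On the unsigned-types point specifically: in Java this is commonly handled in plain Java code rather than through an FFI, by masking or, since Java 8, with helpers on `Integer` such as `toUnsignedLong` and `compareUnsigned`. A minimal sketch (the class and method names are hypothetical, for illustration only):

```java
public class UnsignedDemo {
    // Java bytes are signed (-128..127); masking with 0xFF recovers
    // the unsigned value 0..255 as an int.
    static int unsignedByte(byte b) {
        return b & 0xFF;
    }

    // Reinterpret a 32-bit int as an unsigned value held in a long (Java 8+).
    static long unsignedInt(int i) {
        return Integer.toUnsignedLong(i);
    }

    // Compare two ints as if they were unsigned (Java 8+).
    static int compareAsUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println(b);                            // -16
        System.out.println(unsignedByte(b));              // 240
        System.out.println(unsignedInt(-1));              // 4294967295
        System.out.println(compareAsUnsigned(-1, 1) > 0); // true: 0xFFFFFFFF > 1
    }
}
```

This doesn't settle the performance question, but it shows that the missing unsigned types are a notational nuisance more than something that forces code out of the language.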

dons · on Jan 12, 2012

At least in Haskell you have unboxed primitive types, memory mapped IO, bump-pointer allocation, and compilation to direct loops that are often identical to what GCC produces (or very close).

nvarsj · on Jan 12, 2012

All of those things also exist in HotSpot/Java.

Primitive types have been available since the creation of Java. It's up to the programmer to use boxed types or not.

Memory-mapped IO: see java.nio.

Bump-pointer allocation and compilation to direct loops both exist in HotSpot.
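For the memory-mapped IO point, a minimal sketch using `java.nio`: `FileChannel.map` returns a `MappedByteBuffer` backed by the OS page cache. The class and helper names are hypothetical, and the sketch assumes a file under 2 GB, since a single mapped buffer is limited to `Integer.MAX_VALUE` bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSum {
    // Map a file read-only and sum its bytes as unsigned values.
    static long sumBytes(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // bytes are signed in Java
            }
            return sum;
        }
    }

    // Hypothetical demo: write a small temp file and sum it via the mapping.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("mmap", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {1, 2, 3, (byte) 200});
            return sumBytes(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 206
    }
}
```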

andy_boot · on Jan 12, 2012

>>Primitive types have been available since the creation of Java. It's up to the programmer to use boxed types or not.

You can't use a primitive in a collection such as HashMap or ArrayList.

ww520 · on Jan 12, 2012

Can you use GNU Trove or Apache Commons Primitives, which support primitive types in collections?

oconnor0 · on Jan 12, 2012

Which is why nifty little libraries like Trove, FastUtil, and HPPC exist.
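Libraries like these avoid boxing by storing keys and values directly in primitive arrays. The sketch below is a hypothetical, stripped-down illustration of that idea, not the API of Trove, FastUtil or HPPC: an `int`-to-`int` map using open addressing with linear probing, a load factor capped at 0.5, and no removal (deletion under linear probing needs tombstones or backward shifting, omitted here).

```java
// Hypothetical minimal int->int hash map. Keys and values live in plain
// arrays, so no Integer objects are allocated per entry, unlike
// HashMap<Integer, Integer>.
public class IntIntMap {
    private int[] keys;
    private int[] values;
    private boolean[] used;
    private int size;

    public IntIntMap() {
        this(16);
    }

    public IntIntMap(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    // Multiplicative (Fibonacci) hashing, then mask to the table size.
    private static int slot(int key, int mask) {
        int h = key * 0x9E3779B9;
        return (h ^ (h >>> 16)) & mask;
    }

    public void put(int key, int value) {
        if (2 * (size + 1) > keys.length) {
            resize(keys.length * 2); // keep load factor <= 0.5
        }
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) {
            if (keys[i] == key) {
                values[i] = value; // overwrite an existing entry
                return;
            }
            i = (i + 1) & mask; // linear probe
        }
        used[i] = true;
        keys[i] = key;
        values[i] = value;
        size++;
    }

    // Returns the mapped value, or defaultValue if the key is absent.
    public int getOrDefault(int key, int defaultValue) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return defaultValue;
    }

    public int size() {
        return size;
    }

    // Rehash every live entry into tables twice as large.
    private void resize(int newCapacity) {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[newCapacity];
        values = new int[newCapacity];
        used = new boolean[newCapacity];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

The trade-off is the usual one: no per-entry objects and better cache locality, at the cost of writing (or generating) a separate class for each key/value type combination.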

njs12345 · on Jan 12, 2012

> Won't using a high-level language incur an omnipresent speed slump?

Yes, but most programs don't require high performance everywhere - in a library like JGit for instance, most operations are probably plenty fast written in Java even for very large projects; it's likely only a few are problematic.

> And even if a bottleneck exists, how would using an FFI remedy crucial problems in the language, like the absence of unsigned types or that all types are boxed?

That's perhaps an argument for allowing more control over memory layout and machine representation in high-level languages, although there are ways around this, like defining your data types as a C++ class and then providing a high-level binding.

groby_b · on Jan 12, 2012

You'll notice the author works at Google. Assume that the projects are "very large" :)

akg · on Jan 13, 2012

> Won't using a high-level language incur an omnipresent speed slump?

Not sure that is true. Just look at PyPy (http://pypy.org/), which claims that, thanks to run-time optimizations, its interpreter, itself written in Python, outperforms the C interpreter (CPython), quite significantly in many cases. So I don't think it's true that high-level languages are always slower; a lot depends on the optimizations you can do at run time. There is also an interesting paper on developing an OS based on run-time code synthesis to optimize performance (http://valerieaurora.org/synthesis/SynthesisOS/). The major drawback of languages like C is that they can only be optimized at compile time. I think that as projects get larger and we move towards parallel structures and algorithms, the need for languages that support run-time optimizations will be greater.

ced · on Jan 12, 2012

You're right that the FFI can create significant friction, but once you're in C-land, you get C-level performance. So you need to move whole algorithms into C. In an O(n²) algorithm, the O(n) FFI friction will be negligible for a large enough value of n.
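To make the amortization argument concrete, assume (as a simplification) that crossing the FFI costs a constant $a$ per element of data passed and that the C-side algorithm costs $b$ per pair of elements. Then the share of total time spent on FFI overhead shrinks as $n$ grows:

```latex
T(n) = \underbrace{a\,n}_{\text{FFI transfer}} + \underbrace{b\,n^{2}}_{\text{C algorithm}},
\qquad
\frac{a\,n}{T(n)} = \frac{a}{a + b\,n} \;\xrightarrow{\;n \to \infty\;}\; 0 .
```

For example, even if crossing the boundary costs 100 times as much per element as one step of the algorithm ($a = 100b$), at $n = 10^{4}$ the overhead share is $100 / (100 + 10^{4}) \approx 1\%$.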

> like the absence of unsigned types or that all types are boxed

FFIs often provide access to C arrays.
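In Java, one way to get a C-compatible array without copying is a direct `ByteBuffer`: its storage lives outside the Java heap, and JNI code can obtain its address with `GetDirectBufferAddress`. The sketch below shows only the Java side (the native half is not shown); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class OffHeapInts {
    // Allocate n ints outside the Java heap, in the platform's native byte
    // order, so native code could read the memory as a plain C int array.
    static IntBuffer allocate(int n) {
        return ByteBuffer.allocateDirect(n * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer(); // view of a direct buffer is itself direct
    }

    static long fillAndSum(int n) {
        IntBuffer ints = allocate(n);
        for (int i = 0; i < n; i++) {
            ints.put(i, i); // absolute put: no position bookkeeping
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += ints.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum(1000)); // 499500
    }
}
```

Absolute `get`/`put` are used so the buffer's position never matters, which also keeps the Java side agnostic about how native code walks the same memory.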

dkarl · on Jan 13, 2012

It isn't always that straightforward. With Java, if you move your code into C you may also need to keep all of your data in C-land to avoid the overhead of copying it back and forth. Then the data is harder to access from Java, plus you can't rely on garbage collection to free that memory when you're done with it.