I've been wondering for a while whether the JVM really has a lot of runway left; Java 7 has been amazingly slow in coming, and it'll be at least six months, and more likely a year, before even the early adopters can really use it in production.
Startup time is a very real problem, as is memory use compared to other, similar VMs (V8, LLVM).
With closures (lambda expressions), Java will be a lot more useful, but my money is honestly on V8 and JavaScript at this point. It's moving faster, and the node.js guys are very right that JavaScript is the language of the web.
The JVM does a huge amount more than V8/node.js. If your world consists solely of shuffling bytes around a network then node.js may be a good solution, but it doesn't extend much further than that. For example, I wouldn't want to write a machine learning algorithm in Javascript, nor would I want to write a storage engine, nor a ... you get the idea. Furthermore, while node.js may be growing quickly (easy when you're small!), the development of the Javascript language is taking a rather tortuous route as the various vendors play games at the Ecmascript table.
To return to the article, what I really want from both the JVM and Javascript is tail recursion. I'd also like proper lexical scoping in Javascript, though that isn't so important. I don't view Java or Javascript as languages you write but rather languages you compile to (well, you compile to bytecode for the JVM, but hopefully you get my point). Scala and Clojure make mighty fine Java replacements; I haven't seen anything yet for Javascript that is much of an improvement.
Could you list some reasons why you wouldn't want to write a machine learning algorithm in Javascript, or write a storage engine, or a ...? Lack of libraries? Too high level? Or do you not like the syntax/semantics/...?
1) Javascript is the slowest moving language standard since Common Lisp
2) Node.js is a cool project that is heading very gradually towards adoptability for projects that aren't tech demos
3) JVM is mature enough to support other languages - can't say that about V8 and probably not for some time. JVM languages moving fast and in good/interesting directions.
4) Android
5) JVM startup time not an issue for web services, games, etc - only one-off shell scripts.
6) Memory use. Unless you're running on a mobile platform - who cares?
Projects like Node.js will have a bright future with small agile teams. But no more so than Clojure or Scala. And unlike Node.js, I think Clojure and Scala certainly have the ability to scale up to heavy-lifting tasks that require concurrency, real perf, and mature libraries.
I love JS in the browser. But on the server, I have yet to see anything that convinces me that JavaScript will ever really trump what the JVM offers.
just for the record, V8 differs in one very important aspect from the JVM: it includes no interpreter. It directly compiles methods to native code before their first execution, whereas the JVM includes a highly optimized interpreter for the first couple of executions. This affects memory usage, too (a couple of things that come to mind are profiling, additional memory for de-optimization, etc.)
I have a hard time believing that V8 doesn't do profiling / de-optimizations ... as these are crucial parts for optimizing runtime dispatch or boxing/unboxing of primitives.
I know that V8 doesn't do tracing compilation, but still, it should at least have inline caches.
Why is startup time relevant if you're doing backend server stuff? It's pretty much the ideal use case for Java.
Java 7? meh. If you like chasing new shiny things all day, sure.
Javascript is a great language, no doubt, but for me it lacks a lot of the things Java does so well to ensure that things don't break, make bugs easy to find, etc.
Why are programmers so fashion conscious, though? Do we write programs to solve problems, or do we write programs to be fashionable?
If we write to solve problems, then use Java 5. Use BASIC. Who cares, as long as it solves the problem well? If you write to be fashionable, then sure, use <NEW SHINY THING THAT REINVENTS THE WHEEL BUT ISN'T UP TO MUCH YET>. People will marvel over your great fashion sense, but will it solve the problem even as well as a less fashionable approach?
The amount of man hours wasted on reinventing the wheel in programming languages is staggering. So much time spent making a saw with a more comfortable handle that doesn't cut as well.
> Why is startup time relevant if you're doing backend server stuff? It's pretty much the ideal use case for Java.
Even on "backend server stuff" I pretty much prefer the FastCGI model, with multiple processes handling requests, as I don't have to worry about memory-leaks or about a rogue thread that can take out my whole server.
With the JVM, this is impossible ... you need a monolithic process that takes care of everything (no support for forking, long startup time, high resource usage). And having multiple apps running in their own JVM (so they don't interfere with one another) is no panacea.
You can also look at the problems Google's App Engine has with Java apps ... as resources are elastic, extra servers / VMs are loaded on demand, and the first requests hitting such a cold VM almost always end up with a timeout error (although they are working to fix this, the magnitude of this issue is not the same with Python on GAE).
And this is solely because of the slow warm-up time.
> People will marvel over your great fashion sense, but will it solve the problem even as well as a less fashionable approach?
I have ideas that I can prototype over a single weekend, which wouldn't be possible without the help of those people with "great fashion sense".
> The amount of man hours wasted on reinventing the wheel in programming languages is staggering
Why don't you teach people then how to spend their time better with constructive arguments?
The easiest thing in the world is to dismiss / critique.
GAE has a lot more at play than standard JVM startup/warmup. JRuby on a desktop "client" Hotspot Java 6 starts up in around 0.5 seconds. On App Engine it can take as much as 10-15 seconds. Why? Because App Engine has a very different environment with a virtual filesystem, heavily-locked-down security, and (probably) quite different memory and CPU resource allocation. GAE does have issues with startup, but they're largely due to design decisions the GAE team has made, and not due to flaws in the JVM itself.
That said, I appreciate what GAE is: the first fully-sandboxed, elastically-scaling Java webapp environment. The decisions they've made seem to be good ones, ignoring the overhead they introduce.
Hey headius, forgot to say in this thread ... thanks for your work on JRuby. It works great.
One small suggestion ... would you consider adding extra extensions (ones that break compatibility with standard Ruby)? You've already done it here and there, like for threads or for string processing (although I guess it wouldn't have made sense otherwise).
What I'd like is some way to get the AST of an anonymous code block ... as I'm dreaming of something like Linq in a JVM language.
I'm not opposed to anything that seems like it might help bring more people to JRuby (and by extension, to Ruby). You can already access most of JRuby's internals via our Java integration layer, and that includes the AST, various core types, class structures and method tables, and so on. So much of this you could probably prototype without us adding anything. Beyond that, I'm interested in possible enhancements in JRuby to "over-optimize" code (like opting out of certain Ruby behaviors in a limited scope if they're just overhead for your app), and we're obviously always trying to improve how JRuby integrates with Java itself.
So yes, there are many great plans...and user demand (especially paid users!) can make anything happen.
> "Why don't you teach people then how to spend their time better with constructive arguments?"
Yeah but it's like trying to tell people "Hey don't buy that prada coat for £500 that doesn't actually cover you properly from the rain, buy this old fashioned raincoat instead that actually functions well as a raincoat."
I just wish people would spend more time becoming good programmers than chasing the new shiny things.
I'm not sure the argument about rapid prototyping is a good one. Maybe it's just you can get it done in a weekend because you enjoy playing with new shiny thing, so are more productive. The old-established-stuff has a multitude of libraries that have been war tested to ultimate reliability though...
I'm just getting old. Ignore me. Every generation has to reinvent the wheel for some reason anyway :)
I thought this thread might benefit from my response as a self-admitted shiny-thing-chaser.
Your Prada coat analogy is fairly silly, as I'm sure you are aware. First of all, when it comes to programming languages/frameworks, both the new-hawtness Prada and the "old fashioned" raincoat tend to be free, so there is no financial rationality argument to be made. Even if you (correctly) account time-to-learn as a type of cost, that cost is roughly equivalent to the time it took to learn the alternative you already know. I think a better analogy would be an old raincoat versus a new, similarly priced raincoat. Perhaps the new raincoat doesn't actually cover you properly from the rain though; unfortunately programming languages and tools are a lot more difficult to analyze than raincoats.
Do we agree, at least, that some chasing of shiny things has been very beneficial to our field? That our current crop of languages are better raincoats than architecture specific assembly language? That assembly language was itself a better raincoat than machine language? If you will stipulate this, then is it your position that we have reached the pinnacle? That no more advances in productivity or correctness can be made? If this is your belief, then OK, that's fine, you are entitled to it, but I do not believe such a pinnacle even exists, much less that we have made it there. And if we indeed have not reached the pinnacle, then we need people who are early adopters, who are willing to try out and write for technologies that are immature and untested. Maybe most or all of them fail to be even as good as what we already have, and the whole thrust turns out to be fruitless, but with no thrust at all, success is not even a possibility. Everything old was once new, and all the old-fashioned technology once had early adopters being made fun of for their impracticality.
Plus, learning new things is a great way to improve as a programmer and it's a lot more fun; all that other stuff is just incidental.
First, I want to say that irrespective of its flaws, the JVM is an awesome piece of software. There's a lot of really exciting work still going on: the G1 collector, Azul's open source contributions, the register-based Dalvik VM. Ten years ago few people would have imagined distributed databases running on the JVM that are fast enough to saturate the network, industrial-scale JVM-based HPC, and 20gb heaps running under heavy production load without any GC-related problems in sight. On another note, few people would have imagined Scala and Clojure either. So don't take what I am about to say as yet another "developer bashing his tools" rant.
There's a great deal of optimization going into the JVM as a server-side VM, but unfortunately there's less work going into the JVM as a client VM. A lot of the advanced options that exist on the server version of HotSpot aren't available on the client version. Azul's recently open sourced tweaks are also aimed at the server side market.
I fully understand the reason for this: Microsoft dominates the "thick client" market, the money for Java is on the server side (with Javascript client code running in the browser). The few Java desktop apps that I use are development related: Yourkit profiler, JVisualVM, IntelliJ, Processing-based Arduino IDE and Eclipse CDT (for the excellent AVR plugin). They're actually excellent, but (with the exception of light-weight Processing/Arduino IDEs) they're aimed at developers who are familiar with the Java platform and don't mind occasional rough edges: I had to tune IntelliJ's GC settings (enabling CompressedOops on 64-bit Linux, using a 32-bit JVM and CMS collector on OS X, adjusting sizes of various generations etc...) to get it to stop locking up on me. Given I work on memory intensive Java and Scala applications, that's not a problem for me. Imagine, however, a word processor application that required this ("hey mom, it really is called `CompressedOops', one word, capital C and capital O!").
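To give a flavor of what that tuning looks like, here's the kind of thing I mean (these are real HotSpot flags, but the values are illustrative rather than the exact ones I used; they go in IntelliJ's vmoptions file or the launcher script):

    -XX:+UseCompressedOops
    -XX:+UseConcMarkSweepGC
    -Xms256m -Xmx768m
    -XX:NewRatio=3

Fine for me; not something you can ask a word processor user to do.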
On the other hand, Qt provides many utilities for C++ beyond the UI and has bindings for Ruby. It's cross platform between OS X, Linux and Windows. Qt and C++ would be the route I'd take were I to build a cross platform application: I am skilled enough with valgrind and honestly don't find the lack of a GC to be a major problem for me (especially for desktop programming, which doesn't frequently involve complex parallel algorithms that are difficult to implement with manual memory management). Qt provides a great deal of what one expects from a high-level platform like the JVM: concurrency libraries beyond primitives, simplifications for building event/callback driven applications, additional collections and even an IoC container.
If I didn't mind tying myself to Microsoft's standards, I'd also take a serious look at Mono: while Mono's GC is primitive compared to HotSpot when it comes to server side applications, I've yet to hear of people having to tune it in order to run Mono-based desktop apps such as the (excellent) F-Spot. Garbage collection is a long-ago solved problem; I find it hard to believe that client VMs couldn't come with sane defaults out of the box. In this way it reminds me of Linux in the mid-90s/early 00s (before distributions like Ubuntu and extensive driver support): a lot of work going into scaling to thousands of processors and new architectures, while getting support for desktop sound and video cards required recompiling a kernel.
Finally, the author really has a point about the lack of a POSIX API built into the JVM. What's even more striking is that this can't even be explained by the "focus on the enterprise server market" theme. Server-side, Linux is almost a mono-culture with occasional use of Solaris or OS X (the latter frequently as a development environment). The few non-POSIX platforms (Windows, mainframes) aren't really a large part of the Java market and have POSIX compatibility layers available for them (bonus point: why not support both, like Perl does -- having builtins for most system calls, but also an excellent Win32 module).
POSIX-JNA is available and (from what I hear) is an excellent library, but its use of the GPL makes it incompatible with Apache-licensed projects (ASL 2.0 being the de-facto standard in the Java world). A minimal interface to POSIX, coupled with a "systemcall()" method (allowing easy use of Linux-specific extensions), should be a standard part of the JDK: Python and Perl offer this without sacrificing portability and safety, why can't Java?
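In the meantime, here's roughly what reaching libc looks like through JNA today (this is the standard JNA API; the subset of calls mapped here is just for illustration):

    import com.sun.jna.Library;
    import com.sun.jna.Native;

    public class PosixSketch {
        // Map just the slice of libc we happen to need.
        public interface CLib extends Library {
            CLib INSTANCE = (CLib) Native.loadLibrary("c", CLib.class);
            int getpid();
            int chmod(String path, int mode);
        }

        public static void main(String[] args) {
            System.out.println("pid = " + CLib.INSTANCE.getpid());
            CLib.INSTANCE.chmod("/tmp/example", 0644); // octal mode, as in C
        }
    }

Workable, but it's exactly the kind of thing that ought to ship in the JDK rather than require a third-party dependency.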
I'm not sure this is what you want to hear, but you might as well give up on Java on the client. No one cares anymore.
The only big company shipping client-side Java software applications is IBM, with Eclipse and their other Eclipse-based products.
Oracle quite plainly doesn't care, and without significant investment it's just not worth the pain anymore.
I'm a Java developer, and I've recently been looking into developing a cross platform client app. Java just isn't a serious player - the reasons I'm looking at a client app (basically hardware access) are exactly the areas Java is spectacularly weak in.
(I'm excluding Android from this, because that is a special case)
Nobody (generally speaking, on the grand scale) is using LISP for much of anything. The same is true of Ruby; you'll be hard pressed to find a "serious" or "major" company actually shipping Ruby. Look at it in those terms and you have two options: C++ on Windows and Objective-C on Apple; those are the only client app platforms the "major players" are doing anything interesting with. Maybe you throw .NET into the mix, but that looks a lot more like a Windows-only 'java' for enterprise server apps than a client platform.
I could see major-player adoption being a concern if Java were going away; that doesn't seem likely anywhere except maybe on Apple, and even there I doubt it will happen until Oracle or another third party "takes over" the Java on OS X platform - I can't see it becoming completely unavailable. Now, Java isn't great at fancy platform integration: the Mac menu and Windows tray support seem a bit fragile, and interest has been low enough that there's no ribbon menu, and various modern animated UI components can be a lot of work in Java. Depending on what you need the client app to do, though, it still seems like a totally viable platform. If it does what you need, what does any of the other stuff matter?
Because the future of the platform is important if you are betting an application on it.
> Now, Java isn't great at fancy platform integration: the Mac menu and Windows tray support seem a bit fragile, and interest has been low enough that there's no ribbon menu, and various modern animated UI components can be a lot of work in Java. Depending on what you need the client app to do, though, it still seems like a totally viable platform. If it does what you need, what does any of the other stuff matter?
It doesn't do what I need, and its direction as a platform indicates that there is no point working on the things I need myself, because then I'd have to support them forever.
For most of the major scripting languages (Ruby, Python etc) there are pretty decent libraries for platform integration, and they are being actively used by multiple developers. If I need to patch something then there is every chance that patch will be accepted back into a broadly used library.
With Java, the ecosystem of client side developers isn't very large, and there aren't any big companies putting resources into it either. It's just stagnant, and that isn't good enough.
The Eclipse platform is great for making large, complex GUI apps, and IBM aren't the only ones using it by any means. Startup time is slow (for starting development as well as launching the app) but you can develop an app with a very responsive interface (which is apparently impossible in Swing.) Eclipse RCP is definitely a good batteries-included application platform for heavyweight GUI apps. I wouldn't use it for something small and light, though.
I entirely agree with you re: "but you might as well give up". If I had to build a cross platform desktop app, I would use C++ and Qt, which seems like the most sane option.
The most sane option, really? You'd be using a runtime without garbage collection, and a GUI framework vastly inferior to, say, WPF (GPU accelerated, XAML DSL, tools for designers).
I said "sane" not elegant or beautiful or "best in every occasion". I'm also speaking in terms of cross platform. .NET is an excellent platform for Windows development and I won't dispute that. I am very well aware of Mono, but it still involves tying myself to a Microsoft standard (plus it's always going to be behind in terms of both language and libraries).
As I've mentioned earlier, I am fine without garbage collection for desktop applications (implementing complex parallel algorithms is a different matter -- there GC very clearly helps). Sure, GC is very nice to have even when you can make do with manually finding/fixing leaks (I'd rather focus on something else), but there are many other advantages to high-level languages beyond garbage collection (actual RTTI/reflection -- not really possible in C++; richer built-in collections; callbacks without function pointers or "operator ()"; in the case of Scala, F#, Lisps -- and to a lesser degree C# -- functional programming, closures, DSLs, etc...)
My current employer maintains and develops several Eclipse RCP apps. Google Earth, last I checked, is Eclipse-based. It's not that unheard-of, even these days.
It seems like "tracing JIT" is really just "advanced compiler hints from runtime analysis". There's no reason this couldn't be gathered by tools and saved in a file to be used by compilers. Combine this with very advanced symbolic debugging and very rapid compilation, there would no longer be any need for late-binding dynamic languages. (Says the lover of Smalltalk, no less! It would have to be with debugging on the level of Smalltalk or IPython and then some.)
And true to form, on the JRuby project we're looking into ways to optimize around where the JVM doesn't quite serve us well. I've recently been experimenting with doing my own dynamic optimization passes, and that led me to think about how to save off this dynopt information to disk for instant gratification on future runs.
I think part of the problem with the JVM is that its requirements have too long been driven by the big EE server folks, who have almost completely different requirements from day-to-day developers, client app developers, RIA developers, and so on. The key here might be making a good business case for those other domains, to help drive the JVM in the direction those domains need. I actually have hope that the (perhaps misguided) push for JavaFX at Sun/Oracle will bear fruit in the form of client-side and non-giant-server-app domains, since I know the JavaFX team have butted their heads up against a lot of the same problems we've faced in JRuby.
When he says "object serialization" is broken, does he mean Java's built-in object serialization (JOS), or the concept of serialization, independent of any specific implementation? He later says "default serialization", which is what JOS's serialization of an object is called if its serialization hasn't been customized - but maybe he means JOS itself?
Anyway: for JOS, you don't need to provide no-arg constructors and you don't need to un-final fields, because JOS extralinguistically both creates objects without constructors and sets final fields. JOS also provides hooks for you to initialize objects. It has many other hooks that few people use.
It's true that some aspects of JOS are ugly in how it implements what it does, but much of what it sets out to do is necessary for a full-featured serialization, that can work over the network. Serialization in other languages hasn't shown a "right" way to do it that I'm aware of.
I'm interested to hear more about the problems Charles found with JOS, and whether he has misunderstandings about this arcane bit of java, or if it's me who's misunderstood his very brief aside on it here.
I mean the built-in serialization. Outside of the classloading and security hacks required to make it work, the fact is that it performs at its worst if you don't do things like provide a no-arg constructor and non-final fields. In those cases, the amount of reflective hackery required behind the scenes is absolutely dreadful, and in benchmarking a simple graph of objects recently I saw that 99% of the time was spent doing reflective access. That's absurd.
Rewriting to use Externalizable was a painful process, but it was orders of magnitude faster than builtin serialization. The default serialization mechanism is basically unusably slow for any high-throughput purposes.
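To make it concrete, here's roughly the shape of that rewrite (a sketch with a made-up class and fields, not actual JRuby code): with Externalizable you take over the wire format yourself and skip the reflective field walking entirely.

    import java.io.Externalizable;
    import java.io.IOException;
    import java.io.ObjectInput;
    import java.io.ObjectOutput;

    public class Point implements Externalizable {
        private int x;
        private int y;

        public Point() {}   // Externalizable requires a public no-arg constructor

        public Point(int x, int y) { this.x = x; this.y = y; }

        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);   // you decide exactly what goes on the wire
            out.writeInt(y);
        }

        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            x = in.readInt();  // and read it back in the same order
            y = in.readInt();
        }
    }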
I have not seen the hooks you describe for user-driven initialization of classes, and unfortunately most of the resources I consulted online while trying to write fast deserialization logic recently didn't mention them either. Got a link? I'm certainly willing to learn what I'm doing wrong.
True, reflection is slower than regular access - though JOS does a fair bit of caching to avoid some of the cost (for when you serialize many instances of the same class). I think your experience with rewriting to Externalizable shows that JOS is one of those performance vs. ease trade-offs.
BTW: are you sure that no-arg constructors and un-finaling fields make a significant performance difference? The only instance I've come across for this showed the (surprising) result that JOS's setting of final fields is actually faster than reflective setting of non-final fields - but I haven't profiled that explicitly. I would expect deserialization with a no-arg constructor to be slower, because then it has to call the constructor in addition to actually creating the object (allocating memory etc). But I haven't profiled this either.
You can customize the initialization of a class by creating a readObject() method for it, which is a sort of extra-linguistic constructor. It is confusingly named the same as the method that you call to start deserialization - but here it is a callback method that you write, that JOS itself will call:
    private void readObject(ObjectInputStream ois)
            throws IOException, ClassNotFoundException {
        ois.defaultReadObject(); // "default" deserialization
        // ... your initialization here ...
    }
You can also read data from the stream explicitly, and set the class fields yourself - if so, you need to write a corresponding writeObject() method that writes the data to the stream. By avoiding reflection, this should be faster than the "default" (but again I haven't profiled this). If you don't provide a readObject, it will call defaultReadObject() for you by, you know, default. Same for writeObject().
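A small sketch of the matched pair (the class and its fields are made up, and as above I haven't profiled it):

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    class CachedName implements Serializable {
        private String name;
        private transient int hash;      // derived state we choose to write explicitly

        private void writeObject(ObjectOutputStream oos) throws IOException {
            oos.defaultWriteObject();    // the non-transient fields, the default way
            oos.writeInt(hash);          // plus whatever you manage yourself
        }

        private void readObject(ObjectInputStream ois)
                throws IOException, ClassNotFoundException {
            ois.defaultReadObject();
            hash = ois.readInt();        // read back in the same order it was written
        }
    }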
There are also hooks for validating the object graph after all of it has been deserialized, and for replacing (resolving) one object with another.
I know something of JOS, but not much of its performance, which seems to be your crucial requirement. So I might not be much help, but let me know if you'd like more info or if I've misunderstood something (I might not reply til tomorrow - it's 3am here).
Ahh, I see where the confusion comes from. I meant initializing in the Ruby way...which is analogous to construction. There's no way to specify how to construct the object being deserialized from the stream, so you need a no-arg version of the constructor (or you can allow JOS to generate one for you) and have to put the logic that would be in the constructor in a separate piece of code like readObject. And in my latest experiments with Externalizable, the JOS code being unable to construct objects the way I want has become the latest bottleneck.
It also interferes with us serializing objects for which we do want to have final fields initialized on construction. If we want to avoid the reflective construction, we need a no-arg constructor. To have a no-arg constructor, we can't make important fields final. And if we want a particular value to be passed into the constructor, we always need to do it in readObject, without any context provided as to where and when serialization is being called. That's so cumbersome that we simply can't do it.
To check I understand: you need to deserialize objects, and you also want to pass in some arguments to some of those objects, so that their initialization is based on both the serialized data and the arguments? Is that to do with currying (is currying in Ruby?) In a complex object graph, how do you know which arguments should go to which objects?
You could use a global object to hold these arguments (or, subclass Thread, and associate the data with that, and yield while the deserialization runs in that thread). Ugly, yes.
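Here's roughly what I mean by the global-object approach, using a ThreadLocal to pass the extra arguments down into readObject() (all names made up):

    import java.io.IOException;
    import java.io.ObjectInputStream;

    public class DeserializationContext {
        private static final ThreadLocal<Object> EXTRA_ARGS = new ThreadLocal<Object>();

        // Stash the arguments, run the deserialization, then clean up.
        public static Object readWith(Object args, ObjectInputStream ois)
                throws IOException, ClassNotFoundException {
            EXTRA_ARGS.set(args);
            try {
                return ois.readObject();
            } finally {
                EXTRA_ARGS.remove();
            }
        }

        // Called from inside a readObject() implementation to recover the arguments.
        public static Object currentArgs() {
            return EXTRA_ARGS.get();
        }
    }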
BTW: it is possible to set final fields outside a constructor, using some black-magic hackery (by accessing the same hooks that JOS uses internally). A few serialization tools use this, and I recall a project that consisted entirely of providing nicer access to these hooks (I searched for it, but couldn't find it).
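For instance fields there's also a simpler (if fragile) route through plain java.lang.reflect, sketched below - note this is not the internal JOS hook I mentioned, and the JIT is allowed to assume finals never change, so treat it strictly as a hack:

    import java.lang.reflect.Field;

    public class FinalFieldHack {
        static class Holder {
            final int value;
            Holder() { this.value = 0; }
        }

        public static void main(String[] args) throws Exception {
            Holder h = new Holder();
            Field f = Holder.class.getDeclaredField("value");
            f.setAccessible(true);   // lift the access check
            f.set(h, 42);            // reflectively overwrite the final instance field
            System.out.println(h.value);
        }
    }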
BTW: There's a super-fast approach to serialization involving pointer twiddling: you save the raw memory, and when you load it back, just adjust the pointers for its new location (heard from Andy Hunt).
This doesn't help you if you want to serialize in order to inspect the serialized form; and it throws away Java's lispy symbolic bindings; and it would have to be implemented within the JVM - but it is fast.
That's exactly the sort of thing I was hoping to see out of the JVM's default serialization. As it stands I think I'm going to basically have to define a custom marshaling format and hand-serialize everything for the domain I'm interested in (which, btw, is serializing pre-parsed Ruby code to disk so it can be loaded later without parse time...done to avoid the slowness of our parser during the cold first seconds of the JVM!).
Writing a de/serializer is pretty straightforward; the only slightly tricky bit is handling graphs (you record each object, associated with a label; when you encounter it again, you serialize a ref to its label). It sounds like a whole project, but since it's only for your specific use it wouldn't be that much work, and it would definitely be worth it.
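A sketch of that graph-handling part (the Node type and the one-byte tags are made up for illustration):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.IdentityHashMap;

    class Node {
        int value;
        Node next;
    }

    class GraphWriter {
        private final IdentityHashMap<Node, Integer> seen = new IdentityHashMap<Node, Integer>();
        private int nextLabel = 0;

        void write(Node node, DataOutputStream out) throws IOException {
            if (node == null) {
                out.writeByte('N');          // null marker
                return;
            }
            Integer label = seen.get(node);
            if (label != null) {
                out.writeByte('R');          // already written: emit a back-reference
                out.writeInt(label);
                return;
            }
            seen.put(node, nextLabel++);     // first sighting: record label, write data
            out.writeByte('O');
            out.writeInt(node.value);
            write(node.next, out);
        }
    }

The reader does the inverse: keep a list of objects in the order you created them, and an 'R' tag just indexes into that list.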
But hand-serializing everything is a serious grind. Would you use a code generator (or use BCEL or similar) for the default case? I bet the vast majority of cases are automatable, and for the other cases it's a starting point.
For cold start, I personally like the idea of reusing a JVM. It would be nice if it just had a reset or clear button. :-)
BTW: Even though serialization is just a tiny part of your project, you are right on top of it - I'm a little awed.
Thanks for this discussion, you helped inspire me to write my first Ruby program tonight (a regular expression engine) - it felt much lighter-weight than Java (but it's a bit difficult to judge, because it's a problem I'd previously worked out how to implement in Java).