As an avid reader of HN, I'm surprised this is the first time I've heard of so many of these (LLVM-backed) projects. Some that stood out to me:
- The VMKit project is an implementation of a Java Virtual Machine (Java VM or JVM) that uses LLVM for static and just-in-time compilation.
Am I reading this right? A faster Java that doesn't require a virtual machine? Does this mean faster Java, Scala, Clojure, or even Ruby, and deployed anywhere that LLVM can build for (which means pretty much everywhere)? Sounds too good to be true.
- LanguageKit is a framework for implementing dynamic languages sharing an object model with Objective-C.
- Eero is a fully header-and-binary-compatible dialect of Objective-C 2.0.
Has anyone tried Eero? It looks interesting (and seems to be compatible with Objective-C code): http://eerolanguage.org/from-objective-c-to-eero. One thing I couldn't work out from the documentation is whether Eero still requires header files.
Emscripten is an amazing project. I can't express how much fun I've had with it over the last few months.
A few things I've been working on:
An N64 emulator that already runs a few demos quite fast. The Super Mario 64 ROM loads, but it doesn't work yet; I suspect a TLB issue.
A DOSBox port, which I'll later try to use to run Windows 95 on top of. I know, this is a monumental attack on Microsoft's copyright. I won't release this.
A port of the Dillo browser: a browser inside another browser. I've looked into WebKit, but I'm sure it's too big to fit in a browser.
A port of PHP's cli interpreter.
If all of this is not fun, then I don't know what fun is!
All thanks to LLVM and emscripten.
> Precompilation: by compiling ahead of time a small subset of Java's core library, the startup performance has been highly optimized to the point that running a 'Hello World' program takes less than 30 milliseconds.
That startup time is probably the biggest turnoff for people seeking to write lightweight scripts in JVM languages. If your script's runtime is 100ms after a 5 second startup, it doesn't particularly matter if the machine code version runs at half speed.
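To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures in this thread: a 5 s JVM startup, a 100 ms script body, and a machine-code build running at half speed. It optimistically assumes the ~30 ms "Hello World" startup from the VMKit quote carries over to a real script, which is an assumption, not a measurement.

```python
# Back-of-the-envelope: wall-clock time for one short run of a script.
# Figures are taken from this thread; the 30 ms startup for the precompiled
# build is an optimistic assumption borrowed from the 'Hello World' quote.

def total_ms(startup_ms: float, work_ms: float, slowdown: float = 1.0) -> float:
    """Startup cost plus the script's own work, scaled by a slowdown factor."""
    return startup_ms + work_ms * slowdown

jvm = total_ms(startup_ms=5000, work_ms=100)               # slow start, fast code
native = total_ms(startup_ms=30, work_ms=100, slowdown=2)  # fast start, half speed

print(f"JVM:    {jvm:.0f} ms")
print(f"native: {native:.0f} ms")
```

With these inputs the JVM run takes 5100 ms and the half-speed native run takes 230 ms, so for short scripts startup dominates and per-instruction speed barely registers.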
But it's precompilation of the core library, not the whole app, so presumably it could be done only once per system and cached somewhere (maybe like .NET's native image cache, which stores NGen-precompiled assemblies).
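A minimal sketch of the "compile once per system, cache it" idea, assuming a cache keyed by a hash of the library's contents. `PrecompiledCache`, `precompile` and the `rt.jar` file name are hypothetical illustrations; this is not how VMKit or .NET actually implement it.

```python
import hashlib
import tempfile
from pathlib import Path

def precompile(data: bytes) -> bytes:
    """Hypothetical stand-in for an expensive ahead-of-time compile step."""
    return b"NATIVE:" + data[::-1]

class PrecompiledCache:
    """Hypothetical system-wide cache of precompiled libraries."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.compiles = 0  # number of real (cache-miss) compilations

    def get(self, library: Path) -> bytes:
        data = library.read_bytes()
        key = hashlib.sha256(data).hexdigest()  # content hash as cache key
        entry = self.root / key
        if entry.exists():                      # hit: reuse earlier output
            return entry.read_bytes()
        native = precompile(data)               # miss: compile once...
        entry.write_bytes(native)               # ...and persist for later runs
        self.compiles += 1
        return native

# Usage: two programs loading the same core library trigger one compilation.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "rt.jar"
    lib.write_bytes(b"core classes")
    cache = PrecompiledCache(Path(tmp) / "cache")
    first = cache.get(lib)
    second = cache.get(lib)
    print(cache.compiles)
```

Keying on the file's contents rather than its path means an upgraded library automatically misses the cache and gets recompiled, while unchanged libraries are shared by every app on the system.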
Right, a naive implementation of anything going up against a finely tuned one will generally lose, even if the naive one is using fundamentally better CS :). I would naturally assume that J3 on VMKit is more a proof of concept for VMKit, at least until a lot of people start hacking on it. The other use of LLVM in OpenJDK, Shark, is about portability rather than pure speed.
The Sun JVM is pretty well tuned. A lot of people have a lot of money riding on top of it performing well. There is room for improvement (see Azul) but I would be surprised if a FOSS project would break much ground without major commercial backing.
Static compilation would have a lot of benefits for short-lived programs. I know IBM's i OS (OS/400) used statically compiled Java, which was quite a bit faster than the JREs of that era, but the comparison is somewhat cloudy because the whole machine was a quasi-VM at the MI (Machine Interface) level. I'm not sure whether they retained this, but it would be interesting to compare it nowadays to, e.g., HotSpot in Java 7.
>> - The VMKit project is an implementation of a Java Virtual Machine (Java VM or JVM) that uses LLVM for static and just-in-time compilation.
>> Am I reading this right? A faster Java that doesn't require a virtual machine? Does this mean faster Java, Scala, Clojure, or even Ruby, and deployed anywhere that LLVM can build for (which means pretty much everywhere)? Sounds too good to be true.
You are reading it wrong. Every word in the sentence "A faster Java that doesn't require a virtual machine" is somewhat incorrect or inaccurate. Let me explain.
First of all, there are no guarantees that VMKit-powered Java is faster than any other Java implementation out there. The LLVM native code generators are powerful, but they were not initially designed for JIT compilation, so compilation is said to be a little slower than in many other JITs. I'm not saying that VMKit will be slower, either. It's probably a case-by-case situation, where one JVM is faster in one case and another one in a different case.
Second, VMKit is a framework for building virtual machines, so the part about "without a virtual machine" doesn't make sense. A VMKit-based virtual machine will interpret and JIT-compile Java bytecode pretty much like any other VM out there. In contrast, the GNU Compiler for Java (GCJ) compiles Java code into native code that runs without a bytecode interpreter. I think GCJ is often used to pre-build the Java class libraries into machine code in some setups.
>> Does this mean faster Java, Scala, Clojure, or even Ruby, and deployed anywhere that LLVM can build for (which means pretty much everywhere)?
No, it does not. As said before, the virtual machine is there as usual, and it has to be ported to the target platform. This may prove problematic on e.g. Android, because it ships with a limited userland: LLVM, which VMKit needs, requires among other things a full C++ standard library. (This does not apply to non-JIT'ed languages, where LLVM is used ahead of time and not on the target architecture.)
The real obstacle to cross-platform portability is not the target's instruction set architecture (ISA), and it never was (despite what Java marketing wanted you to believe in 1999). With a modern compiler like LLVM it is trivial to produce machine code for multiple architectures like x86, OpenRISC, ARM, MIPS or an AVR microcontroller.
The real portability issue is the frameworks and libraries needed. It starts with the operating system and standard libraries; then come all the libraries built on top of those.
So even if it were possible to get e.g. Ruby up and running on a previously unsupported platform, the problem would be that Ruby on Rails would probably not work as easily.
I hope this clarifies a few of the misconceptions you had about LLVM and projects that use it. Even though it does not quite fulfill your expectations, remember that LLVM kicks ass.
> A faster Java that doesn't require a virtual machine?
I don't know about 'faster', but gcj has been compiling Java to machine code for a while now.
From the homepage:
> GCJ is a portable, optimizing, ahead-of-time compiler for the Java Programming Language. It can compile Java source code to Java bytecode (class files) or directly to native machine code, and Java bytecode to native machine code.
Yes, it's also ridiculously difficult to get working. I'm not entirely convinced it's possible to get it working anymore without some serious hacking, and OS X doesn't include it in its developer tools.
See, in Ubuntu, I do 'sudo apt-get install gcj', hit enter twice (to bring in all the dependencies), and a little while later I have gcj up and running on my system.
I don't doubt it needs hacking to get running; I probably wouldn't want to install the latest release from source. However, when you have the package maintainers of two distros (Debian and Ubuntu) on your side, things get a bit easier.
Off topic suggestion: When you have a home page for a project, if the name is an acronym, you should spell out the acronym in the first paragraph, if not the first sentence or the title.
From http://llvm.org/ I looked at Overview, Features, Documentation and FAQ, and did not find the definition of LLVM. I ultimately had to go to Wikipedia.
edit: LLVM once stood for "Low Level Virtual Machine", but that is no longer the meaning. The early name comes from Chris Lattner's research paper describing an "ideal machine language": an intermediate language for compilers that is a little like an assembly language for a virtual machine with infinite registers.
The first sentence on the front page of llvm.org pretty much sums it up: "The LLVM Project is a collection of modular and reusable compiler and toolchain technologies."
It may not be the clearest LLVM description out there, but that's pretty much what it is. If the description had more detail, it would not fit in one sentence.
The hard thing about describing LLVM is that it's a huge, complicated project in a domain that's unfamiliar even to many professional programmers.
I tend to say that LLVM is (to me) a "compiler infrastructure", because I use it to build compiler back ends. However, LLVM is so much more than that, as the project includes loosely coupled tools ranging from complete compilers (Clang) to debuggers (LLDB) to bytecode and binary-format introspection utilities (LLVM's binutils counterparts). So "compiler infrastructure" or any other dumbed-down explanation wouldn't do it justice. That's why the first sentence on the front page is actually pretty good.
LLVM actually still is a virtual machine, since it contains mechanisms for executing code written in LLVM IR. There are two execution paths: an interpreter and a JIT compiler. I guess this makes it a VM after all, although the acronym is no longer descriptive, because the VM part is a tiny fraction of what LLVM includes and can do. As you said, it's a super-tool for creating compilers, especially compiler back ends.
The LLVM interpreter is less than ideal as an actual interpreter, as opposed to something for doing constant folding and that kind of thing; the instructions are too low-level for it to be really fast. For example, integer bit widths are part of the abstraction, which adds overhead to even the simplest arithmetic operations.
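To make "executing IR on a machine with infinite registers" concrete, here is a toy interpreter for a made-up, LLVM-flavoured three-address IR. The syntax and the `run` function are hypothetical; this is not real LLVM IR and not how LLVM's interpreter is written. It only shows that results go into named virtual registers stored in a dictionary, so there is no fixed register file to run out of.

```python
# Toy interpreter for a hypothetical three-address IR (not real LLVM IR).
# Registers are just names in a dict, so the "machine" has as many registers
# as the program cares to use.

def run(program: list[tuple], args: dict[str, int]) -> int:
    regs: dict[str, int] = dict(args)  # virtual registers, unbounded in number

    def val(operand):
        # Operands are either register names ("%x") or integer constants.
        return regs[operand] if isinstance(operand, str) else operand

    for op, *rest in program:
        if op == "ret":
            return val(rest[0])
        dest, lhs, rhs = rest
        if op == "add":
            regs[dest] = val(lhs) + val(rhs)
        elif op == "mul":
            regs[dest] = val(lhs) * val(rhs)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise ValueError("program fell off the end without 'ret'")

# (x + 1) * y, written roughly as:
#   %t1 = add %x, 1 ; %t2 = mul %t1, %y ; ret %t2
prog = [
    ("add", "%t1", "%x", 1),
    ("mul", "%t2", "%t1", "%y"),
    ("ret", "%t2"),
]
print(run(prog, {"%x": 4, "%y": 5}))
```

A JIT would instead translate `prog` into native instructions once, which is where a register allocator has to map the unlimited virtual registers onto the CPU's real ones.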
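A rough sketch of where that overhead comes from, assuming integers are modelled as arbitrary `iN` values (i1, i8, i17, i64, ...): every arithmetic result has to be wrapped back to N bits, and signed operations need a reinterpretation step, work that hardware gets for free on its fixed register widths. The helper names are hypothetical and this is illustrative only, not LLVM's actual interpreter code.

```python
# Illustrative cost of arbitrary integer widths in an interpreter
# (hypothetical helpers, not LLVM's implementation).

def wrap(value: int, bits: int) -> int:
    """Truncate to an unsigned bits-wide value (two's-complement wraparound)."""
    return value & ((1 << bits) - 1)

def as_signed(value: int, bits: int) -> int:
    """Reinterpret a bits-wide unsigned value as two's-complement signed."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def interp_add(a: int, b: int, bits: int) -> int:
    # Each interpreted 'add iN' is an add plus a mask, on every execution.
    return wrap(a + b, bits)

print(interp_add(200, 100, 8))              # 300 wraps to 44 in i8
print(as_signed(interp_add(127, 1, 8), 8))  # i8 overflow gives -128
print(interp_add(1, 1, 1))                  # i1: 1 + 1 wraps to 0
```

On top of the masking, a real interpreter also pays for dispatching on the opcode and looking up the width for every instruction, which is why a JIT that maps common widths onto native registers is the fast path.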
I don't think the interpreter was intended to be really fast. It's pretty good for debugging and platform-independent execution. For speed there's the JIT.
"if the name is an acronym, you should spell out the acronym"
(I did not downvote you BTW)
Someone else replied that the project is now just referred to as LLVM. That's fine, but people expect acronyms to stand for something, and the definition, or the lack of one, should be right at the top of any project page. Lots of people come to a project for the first time and aren't in the know.
Sorry, missed the acronym part. Anyway, spelling out the acronym only creates more confusion.
I don't understand the downvotes; if you downvoted, please tell me why. I tried to be sensible in explaining what LLVM is and why it's hard to explain, and why it's no longer an acronym.
pedantic: LLVM is not an acronym, it is an initialism. An acronym is when the abbreviated letters spell something pronounceable as a word (GNU, NATO, SCUBA, PATRIOT act...) an initialism is just an abbreviation using the first letter of every word in the abbreviated name or phrase.
The problem with "Low-Level Virtual Machine" is that it gives a very wrong view of what LLVM actually is (because "virtual machine" is associated with the JVM and similar tools). That full name isn't even used much any more; people just say LLVM, meaning the umbrella project under which a lot of subprojects exist.
Oh no you didn't. If you did, you should at least have made the change yourself (IIRC the LLVM website is in their source repositories in SVN or Git) and posted it as part of your bug report.
The LLVM developers are experts in compilers; I'd much rather they spend their time writing compilers than fixing little things on the website.