One of the things in which Octave is much slower than Matlab is looping. Matlab used to have this slowness too, until they started JIT compiling their loops. For Octave, this has been more difficult, and the only tool we've had for accomplishing it has been LLVM.
While our JIT compiler code is still very much alpha, it has already been quite a pain to deal with the LLVM JIT "API", because the truth is, they don't really have an API, i.e. no promise of stability. Every LLVM release has broken everything, and our code keeps having to change in order to accommodate all of those changes. Every. Damn. LLVM. Release.
I don't know how stable the JIT API for gcc will be, but already it's starting to look much better and thought-out as a public API. Here's hoping that our fellow GNUs-in-arms can help us make a faster and better Octave!
This is pretty cool but even when it is more robust and works with optimized code I think most developers looking for something like this are far more likely to choose LLVM for licensing reasons.
Current gcc is GPLv3 and that is very unlikely to change. If you link your project to this, you are thoroughly "infected" by the GPL: unlike many GNU projects that are explicitly meant to be linked to by your own projects, gcc's core is not LGPL, and the runtime library exception granted by gcc is not sufficient to save you from infection in this use case.
Unless you are extending the compiler, why would you care? GCC consistently produces faster code than LLVM.
If you are extending the compiler, why would you want to allow companies like Apple and Oracle to bottle up your hard work and not give anything back? Either by not giving the code back, or by patenting parts of the functionality they add and then suing you when you try to use them.
And? The above post asks why companies need to make proprietary code changes to the compiler, or why the community should allow distributors of GCC to patent parts of the compiler and then sue the users the distributor gave copies to.
Even if you don't make changes to the compiler, if you link to this project you are GPLv3 infected. You simply can't use this without exposing your entire project to GPLv3 infection, the usual LGPL and/or runtime library exceptions don't apply to gcc's core code because it was never intended to be used as a library.
Let's say you have an existing FOSS app that is BSD or MIT licensed and it has its own built-in scripting language. You'd like to build a JIT for that scripting language, you see this library and decide to use it in your project, well... you can't do that without changing your entire project to GPLv3 terms, because the combined work created between this and your own code all has to be available under GPLv3 terms. This is usually solved by making the relevant parts of the project LGPL or granting a runtime library exception, but neither of those applies to the core gcc code, so the GPLv3 infection is unavoidable.
Whether or not that issue is important is open to debate and depends upon your software politics, but in practical terms it means very few people will use this in their project unless they are already GPLv3 committed for some reason.
I work with David. Of course he knows about LLVM. This is his way to bring gcc into the modern era. Also related is his plan for removing global state from gcc:
" This is his way to bring gcc into the modern era"
and
"He's seriously invested in cleaning up gcc and making it a competitive compiler."
Optimization wise, GCC is still much more modern than LLVM, and in fact, more modern than most commercial compilers.
Architecture wise, there are warts, but bringing GCC into the modern era architecturally was never a technical/engineering challenge.
To be honest, speaking as a guy who wrote plenty of GCC's current optimizations, he'd be a lot better off fixing LLVM's JIT issues than trying to rearchitect GCC.
At this point, it's hard to come up with good reasons to continue work on GCC past "fun".
I work on a runtime compiler with LLVM and gcc backends. LLVM presents me with a nice API to generate its IR in memory. For gcc, I actually write out a ".c" file and then invoke gcc to compile it into a shared library, which I load with dlopen. It works, but it's messy and I would prefer to work with an API like LLVM's. This project is exactly what I've been looking for.
LLVM doesn't make a great JIT compiler and I don't think gcc will either. The problem is that, most of the time, your JITed code won't be used very much and so the amount of time that it takes to generate it is a significant proportion of the total execution time.
LLVM and gcc are optimized for producing the fastest possible code and will do so at the expense of spending more time doing it. That's not to say that they can't become great JIT compilers in the future, but I don't think we're there yet.
LLVM understands this issue. That is why they have "regular" instruction selection passes and "fast" instruction selection passes. The JIT uses FastISel. But you are definitely right, it is nowhere near as lightweight as, say, HotSpot.
There is also the issue of FastISel not supporting the full LLVM IR, so if you depend on FastISel for a fast JIT, you basically need to design your LLVM IR generation around that, limiting yourself to an undocumented, unspecified, and ever-changing subset of LLVM IR. It can be done, but it's not very pleasant.
So I guess I disagree.
There are known good solutions to this problem.
Yes, it's definitely some work actually writing the code right, but it's not like this is a problem that requires engineering brand new solutions.
It just requires a good engineer and some time.
I consider that "simple", as on the scale of "engineering complexity", it would be simple, even though on the scale of "engineering time" it may take longer.
There are many optimizations in GCC and LLVM. If you turn them all off you will compile fast and execute slow, and if you turn them all on you will compile slow and execute fast. You can do this on a per function / trace / translation unit / whatever basis.
Production JIT compilers are the same way. For the hottest code paths, all the optimizations get turned on. The coldest ones are interpreted. The first level of jitted compilation has very few optimizations enabled.
The main thing that doesn't cooperate with JIT compilation yet is whole program analysis.
That's not actually true, at least with regard to LLVM. The vast majority of the time in LLVM is spent in instruction selection and legalization, which you can't just turn off.
But you can't turn off codegen in a JIT compiler either, unless you're interpreting code, so that requirement doesn't fundamentally make GCC and LLVM impractical. It sounds like LLVM either doesn't have very many optimizations or their backend needs work.
LLVM's instruction selection/legalization infrastructure is very sophisticated, very generic, table-driven, etc. Most JIT compilers use more ad-hoc and quicker mechanisms to get machine code out of IR.
People at Apple working on JavaScriptCore (Safari's JS engine) have recently added a fourth-tier JIT using LLVM, so it is very much run only for very-very-very-hot code (of course, by waiting for it to become very-very-very-hot, you've already lost time you could've gained). I can see LLVM making its way into more JITing compilers in such ways: as an ultimate solution where it is clear it's worthwhile spending a lot of time compiling code.
The node.js server case seems worthwhile to bring up here: node.js servers are typically run for days or weeks on end. An extra few milliseconds (or even a second or two) isn't significant if it makes a noticeable difference in performance.
A good example of where this is important is in a tracing JIT. Do you want to compile after 1000 iterations or 10,000 iterations? Faster JIT compilation means it can kick in much earlier.
AFAIK, you can manually specify all of the optimizations done by LLVM during JIT compilation. I think it's not hard to disable most of them for fast compilation, but yes, it may still be too heavy for instant use.
The Unladen Swallow project (faster Python, including an LLVM-based JIT) found that LLVM wasn't really designed as a universal JIT compiler, or at least not one that was useful for languages such as Python. That was one of the reasons why they eventually stopped work on the project (although a lot of the non-LLVM work they did was still very good and useful, and so the project as a whole was not wasted).
The PyPy developers (another Python-with-JIT project) looked very, very carefully at every available JIT out there, and finally ended up writing their own. It was the only way they could get something that would do what they needed. The result was performance far superior to what the Unladen Swallow project got when using LLVM.
It sounds very simple when you look at it from a distance. "We need a JIT, here's a JIT, let's use it." Then you find out that just because something is called a "JIT" doesn't mean that it will be of any use to what you are trying to do. The subject area is very complex and there will probably never be a universal solution.
What I could see the GCC JIT mode being useful for is things like generating certain critical portions of a program under programmer direction. That is, it would make a nice library that you call to generate very optimized code for specific functions or modules. A good example is how GCC is currently called in the background by a number of Python libraries which dynamically generate C code and compile it for faster execution. Being able to do this more directly via a JIT process could be very convenient. This is perhaps the sort of application that the author has in mind.