Graal Autovectorization

truth_seeker · on Sept 25, 2019

Very good move indeed. This is what a smart JIT should be able to do so that the programmer does not have to worry about various hardware-specific optimizations.

Now since they are able to loop unroll it by analysing AST and converting into SIMD, it must also be possible to forward instructions to GPU/GPGPU in the future with little more effort.

Similar efforts on JVM in the past - https://astojanov.github.io/blog/2017/12/20/scala-simd.html

LLVM - https://llvm.org/docs/Vectorizers.html

GCC - https://www.gnu.org/software/gcc/projects/tree-ssa/vectoriza...

chrisseaton · on Sept 25, 2019

Note that of course the JVM's top-tier JIT compilers (C2, Graal EE) have already done this autovectorisation for many years - this is just the first time code has been contributed for the open source version of Graal, because previously it was an Enterprise-only feature in that compiler.

For example open source code in C2 https://github.com/openjdk/jdk/blob/497cb3f4f4ae7f86c1731396....

> Now since they are able to loop unroll it by analysing AST

There's no AST analysis going on here - it operates at the graph level.

> it must also be possible to forward instructions to GPU/GPGPU in the future with little more effort

Hah. People have been trying general GPU offload for a decade and haven't really got anywhere yet. The blocking issue is finding work that is coarse-grained enough to make the context-switch worth it.

gmueckl · on Sept 25, 2019

Efficient CPU implementations and efficient GPU implementations that solve the same problem are typically completely different algorithms for anything that is more complex than a big loop. Programming 64 4-wide or 8-wide extremely powerful cores is very different from programming 400 limited cores, each 16 wide.

apta · on Sept 26, 2019

> this is just the first time code has been contributed for the open source version of Graal, because previously it was an Enterprise-only feature in that compiler.

Is the Enterprise-only implementation different from what has been contributed to the open source version?

repolfx · on Sept 26, 2019

Yes. This is Twitter's own implementation of the idea.

It may not get accepted. The maintainer of Graal asked Twitter some tough questions about whether they'd really be maintaining it over the long run and how much it really helped when they first proposed the patch and didn't seem to get any answers.

This seems like a continuation of the problem that has made Java a money hole for first Sun and then Oracle. Any attempt to stabilise its losses by making a better Java with proprietary extensions causes companies like Twitter or Red Hat to duplicate those features and submit competing implementations themselves, rather than simply pay the original team. Those duplicates then have to either both be maintained or one has to be rejected.

Given this sort of behaviour it's not clear that there's any sustainable way to fund core Java development, other than indefinite cross-subsidisation by other revenue lines.

apta · on Sept 26, 2019

> Given this sort of behaviour it's not clear that there's any sustainable way to fund core Java development, other than indefinite cross-subsidisation by other revenue lines.

Wouldn't it move to a model similar to that of C++? Perhaps have a committee that agree to what features get committed and maintained? There are obviously many large companies with deep pockets who heavily rely on the JVM for their core business.

kllrnohj · on Sept 25, 2019

> it must also be possible to forward instructions to GPU/GPGPU in the future with little more effort.

Highly unlikely. On most (all?) high-end systems the GPU or other compute accelerator has its own memory, and is connected to the CPU via a teeny tiny straw (PCI-E). The cost of shipping the data over and shipping the results back is astronomical. It's worth paying if you can ideally leave the data there or if the computation is so intense that the execution is just so much faster to pay for the overhead and then some. But a "smart JIT" is unlikely to figure that out.

If we were talking a homogeneous shared memory system, like most mobile devices, then maybe. But there's still non-trivial costs & setup associated with that.

pjmlp · on Sept 25, 2019

NVidia is already doing contributions for GPGPU programming with Graal.

https://github.com/NVIDIA/grcuda

This is why OpenCL got so little love, Khronos just kept focusing on C until it was too late.

kllrnohj · on Sept 25, 2019

> This is why OpenCL got so little love, Khronos just kept focusing on C until it was too late.

I think the major problem with OpenCL was that the primary author pushing it, Apple, had no presence in either the HPC nor high-end gaming industries (and still doesn't). Keep in mind OpenCL wasn't a Khronos creation, it was Apple's. Khronos adopted it in version 1.1, but the original 1.0 release was Apple-only as part of the Snow Leopard release.

As such, nobody that can drive GPU hardware sales was pushing it. Therefore OpenCL support from AMD & Nvidia was slow and bad, because it didn't help their bottom lines. It really wasn't because it was C instead of C++. It was because the drivers were bad and it was made "generic" too early on. There was no good programming guide for it, and it's super critical for GPGPU to know things like warp width and other programming information that OpenCL just doesn't have, because it's generic-ish. This generic-ish constraint is still a problem today, and is a major reason CUDA outperforms OpenCL so badly.

By contrast Nvidia drove CUDA hard, because it sold GPUs to the HPC crowd. And HPC crowds adopted it because it actually let them go faster with real programming guides on how to achieve performance from the GPU, along with actual documentation on what you should & shouldn't expect from actual hardware.

pjmlp · on Sept 25, 2019

Apparently what drove Apple away from OpenCL was that they didn't agree with how Khronos wanted to drive it further, note that Metal Compute uses C++14 as basis.

kllrnohj · on Sept 25, 2019

OpenCL also adopted C++. I don't think language choice really played any significant role.

pjmlp · on Sept 25, 2019

OpenCL only adopted C++ when it was clear that had lost to CUDA and some love was required, thus going over OpenCL 2.0, 2.1, initial introduction of SPIR and now SYSCL.

Incidentally most OpenCL drivers still aren't properly 2.0 compliant, and the best way is to get something like computecpp, the SYSCL implementation being done by CodePlay.

stochastic_monk · on Sept 25, 2019

NVIDIA's emphasis on modern C++ was very smart as well. Compare writing CUDA to OpenCL. I have hopes for ROC.

pjmlp · on Sept 25, 2019

Yes, pretty much that, although supporting Fortran, introducing PTX early on, and offering GPGPU debuggers were very clever moves as well.

imtringued · on Sept 25, 2019

That doesn't actually address the real problem. The constraints of GPU hardware and SIMD instructions limit the type of problems that they can solve. Not every algorithm can be executed on a GPU or via SIMD with good performance. If you want the autovectorizer to optimize your code then it has to obey these restrictions but the problem with autovectorizers is that they do not produce errors if they fail to vectorize. If you make even a tiny mistake in your code then your application will perform slower than expected. Without building a benchmark suite and a scalar version as a reference implementation it's going to be very difficult to keep track of whether vectorization happens as intended or not.

tom_mellior · on Sept 25, 2019

> Similar efforts on JVM in the past - https://astojanov.github.io/blog/2017/12/20/scala-simd.html

If I understand this correctly, this doesn't do any autovectorization. It introduces a clever way to allow programmers to use vector intrinsics without the JVM having to understand them. But the programmer does have to use those intrinsics manually, there is no automatic vectorization.

truth_seeker · on Sept 25, 2019

> It introduces a clever way to allow programmers to use vector intrinsics without the JVM having to understand them. But the programmer does have to use those intrinsics manually, there is no automatic vectorization.

Yep but that CLEVER WAY/LAYER can be very central to build higher-level API. So for a programmer point of view, who deals directly with higher-level (scala API as shown in example) dont have to bother about low-level intrinsics.

truth_seeker · on Sept 25, 2019

Also, have a look at the paper on LMS:

Explicit SIMD instructions into JVM using LMS - https://www.research-collection.ethz.ch/bitstream/handle/20....

lidHanteyk · on Sept 25, 2019

PyPy/RPython too: https://pypyvecopt.blogspot.com/2015/08/the-end-of-summer-py... https://morepypy.blogspot.com/2015/10/automatic-simd-vectori...

OskarS · on Sept 25, 2019

Does the CLR for .NET do auto-vectorization? I've always found it very snappy as VMs go. What about Mono?

truth_seeker · on Sept 25, 2019

I am not sure about recent enhancement if any but until last year it wasn't possible unless you use Vectors.

https://www.codeproject.com/Articles/1223361/Benchmarking-NE...

chrisseaton · on Sept 25, 2019

.NET has a much simpler kind of JIT than the JVM - it doesn't do any dynamic optimisations or auto-vectorisation.

pjmlp · on Sept 25, 2019

RyuJIT does auto-vectorization and .NET Core supports tiered compilation now.

On .NET we don't need to wait for the Vector API, it is already here.

chrisseaton · on Sept 25, 2019

> On .NET we don't need to wait for the Vector API, it is already here.

I think the point is people don't want an API - they want it to happen automatically for conventional code.

pjmlp · on Sept 25, 2019

Which even for C++ compilers isn't up to what intrisics are capable of.

merb · on Sept 25, 2019

I'm not sure if that is still true for .net core 3

Edit: Looks like it still is not done: https://github.com/dotnet/coreclr/issues/20486

moreati · on Sept 25, 2019

Does Graal being an Oracle project make anyone else nervous? I mean from a software license or patent perspective.

On the technical side there's much to like. However I worry that once Graal becomes popular, Oracle will announce something that makes it risky to use without paying for a per-seat license or support package. I'm probably just showing my bias, but they have form in this area.

Twirrim · on Sept 25, 2019

It's an open source project. If Oracle does something disliked by the community, the community can fork and do whatever they like. This is something that has happened repeatedly with open source projects when the community has disliked the government of it, e.g. think Jenkins/Hudson situation. Or going way back Mambo/Joomla.

Disclaimer: I work for Oracle, but nothing to do with Graal etc.

eslaught · on Sept 25, 2019

Looking at the license [1], it appears to be GPL v2 with a classpath exception. This means it doesn't directly deal with the issue of patents, only copyright. There have been cases in the past of companies going after users for patents on otherwise open source projects. The worst part is, Oracle reserves those right so they can decide retroactively to go after those users (unless Oracle decides to modify/replace the license).

[1]: https://github.com/oracle/graal/blob/master/LICENSE

repolfx · on Sept 26, 2019

Actually large parts are under the Universal Permissive License these days, which explicitly deals with patent grants.

e.g.

https://github.com/graalvm/graaljs/blob/master/LICENSE

https://opensource.org/licenses/UPL

I think Graal itself is GPL2+Classpath because it's derived from a codebase that was itself licensed that way, not because Oracle actually want ambiguity. If they wanted that, their new from-fresh codebases wouldn't be under a more precise open source license.

pmart123 · on Sept 25, 2019

Probably. Larry Ellison always seems to find a way.

hoschicz · on Sept 25, 2019

Can someone explain what is new here? Does OpenJDK have autovectorization or is it just getting added to GraalVM now?

StreamBright · on Sept 25, 2019

It is getting added to Graal now. Disabled by default. OpenJDK has it for some architecture, I think Intel contributed it for x86 and there is this for aarch64.

https://www.slideshare.net/mobile/linaroorg/auto-vectorizati...

Update: This is about the FOSS edition not the EE one. See comment below.

pgrulich · on Sept 25, 2019

So for Graal we have the situation that Graal EE (Enterprise Edition) has auto-vectorization. However, its not in the open-source version as Oracle holds it back. The implementation in this issue is however provided by Twitter.

pjmlp · on Sept 25, 2019

One should also note that if it wasn't for Oracle, MaximeVM would have been yet another cool dead technology out of Sun Research Labs.

So quite the opposite of holding it back.

amalter · on Sept 25, 2019

I think Graal and Graal EE on Oracle Cloud is one of the smartest product moves out of Oracle in .... idk ... forever?

However, it seems like a knifes edge to walk on. If Graal CE gets uptake, are there enough compiler folks at Redhat, Azul, Google, et. al. to shrink (or overtake) the Graal EE performance edge.

Graal CE must be “good enough” to get people hooked that they then want to hold their nose enough to engage with Oracle (through Cloud or license).

Maybe the management and visualization advantages are enough? I don’t think so though.

I also don’t think it will pay off (despite the incredible technical achievement that Graal is).

I was just talking with an ex-Oracle SMB sales rep, and they left because they would persuade businesses off SQL Server on technical merits, only to see their clients steamrolled by the Compliance Department a year later.

Larry is, 75 years old or so? I think recent Microsoft history can show goodwill can be created quickly, but it must be done from the top down.

pjmlp · on Sept 25, 2019

Commercial JDKs do pay off, so much that many of the commercial AOT compilers (since around 2000) are still in business, although with the ongoing support on OpenJDK that might change a bit (ExcelsiorJET just gave up).

JIT and GC algorithms to the level done by JVM implementations don't come up with all nighters and weekend programming scratching an itch, and those software engineers need to be payed accordingly.

So if others have a problem with Oracle, maybe they could compensate for the fact that Oracle employees still do 90% of Java development and OpenJDK related work.

amalter · on Sept 25, 2019

The niche JDK vendors are an order of magnitude off what Oracle needs to fund JDK development. I suppose the closest example is Azul, which is using the same "pay for performance" model of Graal EE.

I have absolutely nothing against some kind of commercial model for funding the JDK. My comments were that in my opinion, the model is unfortunately doomed:

- Lack of goodwill for Oracle - Enterprises who are not yet Oracle customers really really want to stay away from entering into a commercial agreement. True or not, the perception is that a license agreement with Oracle comes with aggressive and intrusive compliance audits.

- Worse is better syndrome - Indeed Oracle is the primary developer on the JDK, but the others entering this space are not hobbyists working on the side. Plenty of serious vendors with serious compiler chops have skin in keeping "free JRE" as the "fast enough" JRE. Redhat natch IBM, Google, Azul, Amazon, apparently Twitter (see pull request). Graal EE is supposedly 30-40% faster on some numeric workloads. But what if these players get that down to 20% or 10% .. or suddenly there might be workloads where CE is faster. Much harder to pitch that license agreement without compelling and unambiguous benefit.

I don't have a "problem" with Oracle, I'm just commenting on where I think the industry is right now. Maybe Oracle will prove me wrong - Microsoft sure did.

pjmlp · on Sept 25, 2019

The others entering on this space are mostly repacking Oracle's work, in what concerns Java language and JVM specification.

From those listed by you, IBM and Azul have their own JVM implementations, and just like Oracle require enterprise contracts for the cool features.

Finally, everyone complains about Oracle, yet no one else bothered to make a counter offer to buy Sun.

repolfx · on Sept 26, 2019

Doesn't matter though. Azul is probably doomed. Does anyone pay IBM for a commercially enhanced JVM? I never heard of it.

IBM might have some enterprise JVM, but they just bought Red Hat. Red Hat hired a bunch of former Sun/Oracle devs and then developed an open source pauseless GC, thus chopping the knees out from underneath Azul and Oracle's ZGC work.

What have Azul and IBM got now? They've gone down the path of trying to use LLVM as a JIT compiler, but they're now in competition with Graal and GraalVM+ZGC or Shenandoah would appear to match their capabilities. They had a good run with edge whilst it lasted, but ultimately there are only so many ways to make Java go faster and the world is apparently not short of companies willing to do JVM heavy lifting for free. But of course, only on the parts that other firms are trying to sell. I don't see Twitter implementing a Project Valhalla anytime soon.

Oracle have developed some great tech in GraalVM and are now trying to turn it into a real business. It's a remarkably long term strategy, but in the end there are lots of people who don't want to see Java go back to being a commercial product again and will happily 'burn' money to ensure it. And I'm sure some would love to just spite Oracle too.

I suspect eventually Oracle will let most of the Java and Graal developers go, probably reallocating them to a non-profit foundation that it slowly winds back commercial support for until its investment in Java is more evenly balanced with other large industry players. The existing OpenJDK people don't seem to be under any commercial pressure or urgency already so it wouldn't be a big shift for them.

pjmlp · on Sept 26, 2019

Ever heard of Websphere, IBM i, IBM z/OS?

repolfx · on Sept 26, 2019

Yes but I imagine a lot of developers haven't. How many new projects are being started on a mainframe?

pgrulich · on Sept 25, 2019

True! I'm very happy about the general open-source culture in the graal project. They are in general very open for ideas and always support! That is very cool!

StreamBright · on Sept 25, 2019

Sorry I am not aware of EE and its features, I should have added that my comment was about the FOSS edition.

boyter · on Sept 25, 2019

I had always believed the OpenJDK did based on this http://prestodb.rocks/code/simd/ which was on HN a while ago https://news.ycombinator.com/item?id=14636802 so yes?

This is really out of my depth though, would love someone who actually knows this area to pipe in.

xyzzy_plugh · on Sept 25, 2019

No, afaik this was not a default optimization. There are SIMD optimizations in specific paths but I'm not aware of general low level autovectorizations before this (outside of ORC/OIL which was never generally available for JVM afaik)

zjaffee · on Sept 25, 2019

SuperUser is enabled by default on openJDK, but it's incredibly simplistic vectorizations being done as can be seen in the article the person above posted.

1f60c · on Sept 25, 2019

I had no idea Graal VM was open source! That's awesome.