I sort of understand how deoptimization works in general, but I'm not sure how the JVM can use those mechanisms to detect when a constant changes. Say you have code like the following:
GRAVITY = 9.81
def g_force(mass):
    return GRAVITY * mass
At this point the JVM will aggressively optimize that method to inline the constant. If you later do:
GRAVITY /= 6 # we're on the moon now!
This change will affect the values of any computations that depend on the return value of g_force, but it should not affect which code paths get executed down the line. I'm probably missing something about how this mechanism works, but how would deoptimization be triggered here?
I don't know the granularity on which the deoptimization guards get inserted, but the basic idea would be that if the constant changes, any code running that depended on the value of that constant will have to deoptimize.
In your example, we would never enter the optimized code path without knowing that the constant's still the same as last time. Since no loop is emitted, there's no deoptimization mid-loop required. So we either run the optimized code, or we never get to the optimized code.
If a deopt is required mid-loop, HotSpot inserts a "trap": a check that notices the invalidation and immediately branches to the interpreter with the current state.
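To make the guard idea concrete, here is a rough Java sketch of the shape the optimized code takes. All names here are invented for illustration, and the explicit flag stands in for the JIT's guard, which in real compiled code is an implicit check (an "uncommon trap"), not a branch you write yourself:

```java
// Hypothetical sketch: an explicit flag plays the role of the JIT's guard.
public class GForce {
    static double gravity = 9.81;
    static volatile boolean gravityUnchanged = true; // flipped if GRAVITY is reassigned

    // "Optimized" version: 9.81 folded in as a literal, protected by the guard.
    static double gForceOptimized(double mass) {
        if (!gravityUnchanged) {
            return gForceSlow(mass);   // "deoptimize": fall back to the slow path
        }
        return 9.81 * mass;            // constant folded into the code
    }

    // Slow path: re-reads the field on every call, like interpreted code would.
    static double gForceSlow(double mass) {
        return gravity * mass;
    }

    public static void main(String[] args) {
        System.out.println(gForceOptimized(10.0)); // uses the folded 9.81

        gravity /= 6;                  // we're on the moon now!
        gravityUnchanged = false;      // the runtime invalidates the guard

        System.out.println(gForceOptimized(10.0)); // now re-reads gravity
    }
}
```

The key point is that the guard is checked on entry, so the folded constant is only ever used while the assumption still holds.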
Of course there are limits to how far optimization can go, but having a way to tell HotSpot that there's a known-immutable constant object reference at this point in the code opens up a lot of opportunities.
Can somebody explain to me why the JRuby crowd thinks invokedynamic will lead to massive performance gains, even though IronPython/IronRuby have had similar (to my understanding) capabilities for a long time without seeing massive gains?
I assume you're referring to the Dynamic Language Runtime (DLR). DLR does nothing even close to invokedynamic for optimization.
The best you can do with DLR is regenerate little dispatch stubs at each dynamic call site, to at least avoid the overhead of re-searching the class hierarchy. These stubs are then reinserted at the dispatch point and used to make the call.
Unfortunately, since no current CLR implementation optimizes code at runtime, these newly-created stubs can't be optimized with the rest of the code. So you get a call from the code to the stub, a few guards, and another dispatch from the stub to the target, and those three pieces never optimize together.
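For concreteness, a per-call-site dispatch stub of the kind described above has roughly this shape. This is a sketch in JVM method-handle terms (since that's the context of this thread, not the DLR's actual C# API), with all class and method names invented: a class-check guard in front of a cached target, plus a fallback that re-looks-up the method and installs a new stub on a miss:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class DispatchStub {
    // Two unrelated classes that both happen to have a describe() method --
    // the situation a dynamic language hits constantly.
    public static class Cat { public String describe() { return "cat"; } }
    public static class Dog { public String describe() { return "dog"; } }

    static final MethodType CALL_TYPE = MethodType.methodType(String.class, Object.class);
    static final MutableCallSite SITE = new MutableCallSite(CALL_TYPE);
    static {
        try {
            // Initially the site points at the fallback: every call is a miss.
            SITE.setTarget(MethodHandles.lookup()
                    .findStatic(DispatchStub.class, "lookupAndRelink", CALL_TYPE));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // The guard: is the receiver the class we cached the stub for?
    public static boolean classCheck(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    // Cache miss: find describe() on the actual class, build a new guarded
    // stub, and install it back into the call site.
    public static String lookupAndRelink(Object receiver) throws Throwable {
        Class<?> cls = receiver.getClass();
        MethodHandle target = MethodHandles.lookup()
                .findVirtual(cls, "describe", MethodType.methodType(String.class))
                .asType(CALL_TYPE);
        MethodHandle test = MethodHandles.lookup()
                .findStatic(DispatchStub.class, "classCheck",
                        MethodType.methodType(boolean.class, Class.class, Object.class))
                .bindTo(cls);
        // Chain onto the previous target, building up a small dispatch chain.
        SITE.setTarget(MethodHandles.guardWithTest(test, target, SITE.getTarget()));
        return (String) target.invokeExact(receiver);
    }

    // Convenience wrapper standing in for a call through the dynamic site.
    public static String dispatch(Object receiver) {
        try {
            return (String) SITE.dynamicInvoker().invokeExact(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new Cat())); // miss: relinks, then calls
        System.out.println(dispatch(new Cat())); // hit: guard passes, cached target
        System.out.println(dispatch(new Dog())); // miss: a new stub is installed
    }
}
```

The difference the parent comment is pointing at: on the JVM, invokedynamic lets the JIT see through the guard, the cached target, and the caller as one unit; on current CLRs the equivalent stub stays an opaque call target.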
invokedynamic is also very useful for optimizing things other than method dispatch. In JRuby master, we're using invokedynamic for constant lookup (as illustrated in this post) and for lazily creating literal values. In both cases, it reduces the access to a single memory hit, much faster than what JRuby 1.6 or any DLR languages can do.
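The constant-lookup trick can be sketched with the JVM's own invalidation primitive, java.lang.invoke.SwitchPoint: the site's target becomes a constant method handle (which a JIT can fold away entirely, hence the near-zero cost) until the guard is invalidated. This is a hypothetical sketch, not JRuby's actual implementation; the constant table and all names below are invented:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;
import java.util.HashMap;
import java.util.Map;

public class ConstantSite {
    static final Map<String, Object> CONSTANTS = new HashMap<>();
    static final SwitchPoint GUARD = new SwitchPoint();
    static Object before, after;   // captured for the demonstration below

    // Slow path: a full table lookup on every access.
    public static Object slowLookup() { return CONSTANTS.get("GRAVITY"); }

    public static void main(String[] args) throws Throwable {
        CONSTANTS.put("GRAVITY", 9.81);
        MethodHandle slow = MethodHandles.lookup().findStatic(
                ConstantSite.class, "slowLookup", MethodType.methodType(Object.class));

        // Fast path: the current value baked into a constant handle, so the
        // whole lookup can be treated as the constant itself.
        MethodHandle fast = MethodHandles.constant(Object.class, slowLookup());
        MethodHandle site = GUARD.guardWithTest(fast, slow);

        before = site.invoke();               // served from the constant handle

        // Redefining the constant invalidates every site guarded by GUARD.
        CONSTANTS.put("GRAVITY", 9.81 / 6);
        SwitchPoint.invalidateAll(new SwitchPoint[]{ GUARD });

        after = site.invoke();                // falls back to the table lookup
        System.out.println(before + " -> " + after);
    }
}
```

Until invalidation, a call through the site never touches the table at all; after invalidation, every guarded site falls back to the slow lookup.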
tl;dr: DLR is the best you can do for dynamic dispatch optimization at an API level atop current CLR impls and subject to limitations therein. invokedynamic brings true end-to-end dynamic dispatch support to the JVM.
Thanks, this is exactly what I was looking for. DLR != invokedynamic, contrary to my impression of how the DLR operated. I'll be interested to see how JRuby performance improves as a result.
Don't get me wrong, I think the DLR is a great piece of work. It's just limited in how much it can optimize dynamic dispatch since CLR itself can't dynamically optimize.
As far as tooling for building languages, DLR is pretty epic. It's too bad Microsoft decided to bail on dynamic language work.
It's mostly lucky (or unlucky) that the constant benchmark now optimizes away to nothing. As I mentioned in the comments (replying to that commenter), my point was to show that the overhead of constant lookup is now nearly zero, so if the values aren't used they won't be accessed. The actual work done for constants that don't disappear would be roughly equivalent to a simple memory access...very fast indeed.
I will probably do future posts with less synthetic benchmarks, but it's ultimately just hard to benchmark a specific language feature in the presence of optimizations like the JVM performs.