What's the relationship between github/ruby and ruby/ruby? It looks like they've diverged quite far from each other, but that might just be an artifact of which branches GitHub uses when comparing the two.
Yes, Aman and Koichi work very closely; the main point of difference at the moment is the method cache patches. Koichi is working on getting something similar implemented in MRI.
As if the Ruby community has more than enough resources to improve two compilers/interpreters at the same time. Isn't it better to work together than to reinvent the wheel?
Speaking of compilers, development of a JIT for Ruby has been very quiet for months.
The Lua community has found that bytecode is actually slower to load than it is to generate from source:
The extra latency of loading the (larger) bytecode from disk/SSD/flash exceeds the CPU time to lex/parse.
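You can check the same trade-off for Ruby itself with the RubyVM::InstructionSequence API (MRI 2.3+). A rough sketch, where big.rb stands in for any large source file you want to measure; note that both timings read from the page cache, so this only compares CPU cost:

require 'benchmark'

src = 'big.rb'   # stand-in: any large Ruby file
File.binwrite('big.yarb', RubyVM::InstructionSequence.compile_file(src).to_binary)

t_parse = Benchmark.realtime { RubyVM::InstructionSequence.compile_file(src) }
t_load  = Benchmark.realtime { RubyVM::InstructionSequence.load_from_binary(File.binread('big.yarb')) }

puts "compile from source: #{t_parse}s"
puts "load precompiled:    #{t_load}s"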
I don't think so. There are a few things that look distinctly iffy in that respect on the surface, but they are resolved.
e.g. is "foo" a method call or a local variable? You can't know in isolation, but it doesn't matter at parse time: if it's part of a larger construct that is only valid as a method call (such as when there's an argument list after "foo"), it is parsed as a method call. E.g.:
foo = 1
foo(42)
will result in:
test.rb:2:in `<main>': undefined method `foo' for main:Object (NoMethodError)
I think all of the potential cases that might have otherwise made Ruby impossible to formally parse are resolved in similar ways.
Now, there are certainly layering violations. The aforementioned example of "foo" by itself can only be resolved by determining whether or not "foo" is in scope as a local variable at the point it is referenced, for example, but you can opt to defer the decision until after parsing.
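A toy illustration of that parse-time decision (not from the article, just to show the effect):

def foo
  "method"
end

if false
  foo = nil   # never executed, but the parser has now seen an assignment to foo
end

p foo   # => nil: bare foo is now treated as a local variable, not a call to the method above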
ruby has a parse.y yacc grammar source file (linked to from the post I linked), and yes it can be parsed separately from being executed.
However, IIRC, no one has been able to re-implement parse.y independently; JRuby and the other Ruby re-implementations copied it and made whatever adaptations were necessary.
- you don't drop the disk cache, so you're loading from RAM (echo 3 > /proc/sys/vm/drop_caches)
- you don't have very complex code
- your timing includes starting/loading the interpreter itself.
Yes that is why I said the benchmark measures "just CPU impacts."
I don't think dropping cache alone would be sufficient to simulate loading of a large project -- I think the benchmark would also need to split the input across many files to recreate the seek time effects.
> you don't have very complex code
Given what I know about parsers, I highly doubt that the performance profile of parsing real code differs from my benchmark very much (i.e. by more than 30%). But I'm happy to be proven wrong on this, if anyone wants to try.
> you timing includes starting/loading the intepreter itself
That is why the benchmark makes the files pretty big, so the constant interpreter startup overhead is not a significant factor. Again, I don't think that you're going to be able to significantly change the shape of the results by adjusting for this.
Most of the time that is likely due to bundler/rubygems stuffing your load path full and causing thousands of unnecessary stat calls.
The actual time spent loading/parsing files is in most cases a tiny fraction of the startup time of any large project using rubygems and bundler.
I counted several hundred thousand unnecessary stat calls on the biggest app I have, and ended up with an ugly hack where we trimmed the load path around each set of requires to only the paths needed by that specific gem.
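Roughly, the idea looks like this (a simplified sketch, not the actual code; the per-gem path lists have to be computed elsewhere, and each gem's own dependencies need to be on the list too):

def require_with_trimmed_path(feature, gem_paths)
  saved = $LOAD_PATH.dup
  $LOAD_PATH.replace(gem_paths)   # only the directories this gem actually needs
  require feature
ensure
  $LOAD_PATH.replace(saved)       # restore the full load path for everything else
end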
What is the total amount of time you spent during those few times, including the time it took to learn what had happened, and how many thousands of times larger is that than the sum total of the time saved by bytecode caching, every single time you loaded a .pyc, for every file you ever wrote?
The overhead is tiny, milliseconds at most for sane modules. It's a useless optimization, especially when it could be done only at install time, say, rather than on every module import, but even that is somewhat silly.
Without seeing any numbers, it doesn't mean much to me. I'm assuming someone much smarter than me has identified the benefit of bytecode caching and unless it really gets in my way, I see no need to do away with it.
Refactoring a module to a package (e.g. mymodule.py -> mymodule/__init__.py) can cause errors if you don't clear the bytecode, since it can pick up mymodule.pyc and fail to see the new files.
Not sure about the exact conditions when this occurs, but it's definitely happened to me when code is refactored and you pull the newest version in with git.
Speculating: it allows you to still be fine with a slower parser, and this means you could theoretically make the parser smarter (like making it do some inference and catch some bugs at parse time?) without worrying that much about performance. But then again, I can't imagine what kind of bugs a parser could catch for a language as dynamic as Ruby, so they're probably just doing it to shave some milliseconds off program start-up time...
With huge libraries (like SpreeCommerce, for instance), startup time can be seconds or more even on recent hardware, and the vast majority of that code isn't changing between runs. So if every gem was cached, they could see radically faster times to load their test suites across their entire organization. It's well worth it even without a smarter parser.
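For what it's worth, recent MRI (2.3+) exposes enough to sketch such a cache yourself; something like the following, where the .rbc sidecar name and the mtime check are just illustrative:

def load_with_cache(path)
  cache = path + 'c'   # e.g. foo.rb -> foo.rbc (name is arbitrary)
  if File.exist?(cache) && File.mtime(cache) >= File.mtime(path)
    iseq = RubyVM::InstructionSequence.load_from_binary(File.binread(cache))
  else
    iseq = RubyVM::InstructionSequence.compile_file(path)
    File.binwrite(cache, iseq.to_binary)
  end
  iseq.eval   # run the cached or freshly compiled code
end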
No, Ruby behaves like APC by default; code is loaded and parsed once per process, and then the AST is reused from memory. You can do some stuff to hot reload from disk (e.g. Rails' reloader), but you have to intentionally do it.
This would just improve the time to initially start a Ruby process.
PHP needs an opcache because its standard behaviour is to discard all state after each request - including all compiled bytecode - kind of emulating the CGI model.
Ruby web apps on the other hand tend to be run in a loop - the script never exits after a request, it just goes back to the top of the loop to accept the next request to serve.
It's nice because it completely decouples your app's startup time from request processing time, so you can do expensive setup stuff up-front without slowing down each request. The disadvantage to that is there's not much pressure to keep startup time low, since it's only happening once.
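Schematically it's something like this (a toy sketch, not any particular app server; boot_the_app and handle_request are placeholders):

require 'socket'

boot_the_app   # placeholder: require gems, parse all the code, warm caches; paid once at boot

server = TCPServer.new(9292)
loop do
  client = server.accept
  handle_request(client)   # placeholder: per-request work, with none of the startup cost in here
  client.close
end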