This should also be a reminder to everyone that you shouldn't be reliant on a single point of failure for your deploys. It's something that we in the Python community have already encountered (and hopefully learned from) due to the historical unreliability of our equivalent package repo, PyPI.
Have an internal repo that's accessible by your deploy servers, which in turn locally caches anything that you might have previously needed to externally fetch.
Running "bundle package" puts all your app's dependencies in vendor/cache. That cache can then be put into a git submodule.
The problem then becomes the Gemfile and Gemfile.lock, which should really be in that submodule as well. You need to pass flags to bundler commands because it assumes the Gemfile is in the project root.
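For concreteness, a rough sketch of that workflow (the deps/ submodule path is just an example):

    # cache every gem the app depends on into vendor/cache
    bundle package

    # if the Gemfile/Gemfile.lock live in a submodule, point Bundler at them
    BUNDLE_GEMFILE=deps/Gemfile bundle install --local

Bundler looks for Gemfile.lock next to whatever Gemfile it is told to use, so keeping both files in the submodule works.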
I don't think Heroku's deploy is smart enough to recognize that you've packaged, right? It'll still try to bundle install, which would break in the current situation.
I think a full solution requires packaging, and using a modified buildpack that skips the bundle step.
Places the gem binaries in vendor/cache, as noted. SCM those.
"While installing gems, Bundler will check vendor/cache and then your system's gems. If a gem isn't cached or installed, Bundler will try to install it from the sources you have declared in your Gemfile."
Yeah, I knew that part...I wasn't sure what the default heroku ruby buildpack did. I'm still digging into the source to see what the build process is. It's non-trivial.
UPDATE:
For others' edification, the default heroku ruby buildpack respects vendor/cache, but will purge it in the following scenarios:
* if vendor/ruby_version exists
* if vendor/heroku/buildpack_version exists, but vendor/heroku/ruby_version does not
* if the bundler cache exists, but vendor/heroku/ruby_version file specifies a different version of ruby than the one actually being used.
The way we handle that for our Python deploys is to have a separate "deploy" git repo which includes complete .tar.gz files of all of our dependencies, then have our pip requirements.txt file point to those file paths rather than using external HTTP URLs.
To avoid packages sneakily trying to download their own dependencies from the internet, we run pip install with a "--proxy http://localhost:9999" argument (where nothing is actually running on that port) so that we'll see an instant failure if something tries to pull a dependency over the network.
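In case it saves anyone some digging, a minimal sketch of the two pieces (the package names/versions and the port are just illustrative):

    # requirements.txt references tarballs checked into the deploy repo:
    #   ./vendor/Django-1.4.3.tar.gz
    #   ./vendor/psycopg2-2.4.6.tar.gz

    # install through a proxy that doesn't exist, so any stray fetch fails fast
    pip install --proxy http://localhost:9999 -r requirements.txt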
We do something very similar, but like you said there are the occasional sneaky devils trying to download their own dependencies. Nine times out of ten it seems like it's some version of distribute that they insist on fetching.
The non-existent proxy trick seems useful, I'll have to try that out.
Indeed. I presume this is why Perl's package repository CPAN is actually a network of repositories ("Comprehensive Perl Archive Network"); Wikipedia says CPAN "is mirrored worldwide at more than 200 locations."
Does anyone know why rubygems does not work this way? I had always just assumed it did (due to the historical intertwining of Ruby and Perl communities).
The centralized architecture of rubygems lets you publish and yank gems within minutes. CPAN takes some hours (and deletion may not be controlled).
Personally I'm a big fan of the CPAN approach, as it is fairly simple: just mirror via FTP. It's a no-brainer to set up and run a mirror.
That said, CPAN's master (PAUSE.cpan.org) is a SPOF as well.
What I like is that no single party is responsible for paying the server bills and maintaining the platform. Ruby Central and the team of volunteers do a great job, but in the end, people only care when something breaks.
Instead, every big company/university that profits from the Ruby ecosystem should imho run a public rubygems mirror as a contribution to the open source world. That's common practice for other projects, too. Think of all the mirrors of the Linux distributions, kernel.org, CPAN, Python, etc.
I also want to mention that ftp.ruby-lang.org is a single-homed box. There is no other official mirror of the MRI/C-Ruby source that can be used for failover or load balancing. This is bad, too.
Agreed. We've looked into running our own mirrors for rubygems, and there's nothing really well-supported out there. The addition of git gems in bundler means you'd really need a git mirror tool as well.
If I had to guess, I would wager it's because it's expensive and hard. Plus there's the fortunate coincidence that - as far as I recall - rubygems has mostly Just Worked Fine, Thank You Very Much™ .
(I miss the days from when github also hosted a gem repository…)
Solving the authenticity problem alone is probably not fun – though obviously there is much to be learned from CPAN here. Given recent problems, there will probably be enough political will to make it happen in the future.
I only recently realized how easy it was to run your own PyPI - it just has to handle a few HTTP GET / POSTs.
If you want to run your own PyPI internally, here's a very simple PyPI server (~150 lines of Python) that I wrote:
https://github.com/steiza/simplepypi
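If you want to try it, you just point pip at your internal index instead of PyPI; the host and path here are placeholders (check the simplepypi README for its exact URL layout):

    pip install --index-url http://pypi.internal.example.com/simple/ somepackage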
What I've personally been looking for is an easy to setup caching proxy for PyPI. Something that is pip-compatible and serves files if it has them but will also fetch and then store packages if it doesn't. That way you could build up a collection of 3rd party packages over time, without having to explicitly manage it.
It probably wouldn't be hard to roll my own with a reverse proxy but it never gets moved to the front burner.
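Not a real transparent proxy, but one low-effort way I know of to accumulate that collection over time is pip's --download option, then serve the resulting directory however you like:

    # fetch (but don't install) everything a project needs into a shared directory
    pip install --download /srv/pypi-mirror -r requirements.txt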
For most people/enterprises, no. But there are still many places in the world -- in the US, even -- with slow and/or spotty internet connections, so it would make sense for them.
> It's something that we in the Python community have already learned due to the historical unreliability of our equivalent package repo, PyPI.
"learned" sounds a touch condescending to me for some reason. The Python community has certainly run into it, but (anecdote time) in my experience people still often rely on PyPI for their deploys (but use the --mirrors option to pip).
"Encountered" may be more appropriate.
True, "learned" does sort of imply that it's a best practice now used by nearly everyone in the community. I know that's far from the truth. "Encountered" is more appropriate, so I'll edit my OP.
I'm surprised that rubygems.org of all places did not see fit to patch the vulnerability that has now been known for multiple weeks, has been declared incredibly dangerous, and for which ready-made exploit kits exist.
rubygems.org is a central distribution platform trusted by tons and tons of projects. As such, it is the one site you probably do not ever want compromised. Imagine the damage an attacker could deal by uploading backdoored versions of various popular gems.
I know - applying security patches is time-consuming and we are all afraid of breakage. But the moment rubygems.org stepped up to be a semi-official central distribution point for gems, I would have hoped they also took on the responsibility that goes along with that.
If this was some new unknown 0day exploit, I would be much more understanding, but this was known to exist, known to be dangerous, known to be exploited.
Semi-related CVEs aside (CVE-2013-0333 and CVE-2013-0156): YAML's security problems have been known by the community for years. YAML aside, don't trust user input. This is egg on the face of rubygems and the ruby community. I don't think http://rubycentral.org/ has full-time staff for rubygems either.
Gems, specifically, should be signed. They are not, so this type of exploit will continue to happen. Hell, remember when github screwed up ssh keys? Who knows what's in the ecosystem.
TL;DR: Ruby's security ecosystem is butter.
Disclaimer: I love ruby and use it daily. It has two critical problems IMO: unsigned code and the GIL. Yes, the GIL. I'm looking at you, ruby-core.
It would allow you to verify the authenticity of the gems, even if the server had been compromised.
(This isn't a new technique -- for example, .deb packages distributed through APT are usually signed with gpg -- IIRC, this was a measure introduced years ago in response to a Debian mirror being compromised.)
Debian has (had?) a high barrier to entry to become a developer, and every developer signs their packages. The release binaries are arranged on a secured box and the release key itself is held by a limited set of people.
In short, the signatures work because of the human element and organizational structure of Debian.
Rubygems accepts submissions from the general public.
The DAY I manage to convince the big wigs where I work that we should switch from a typical shared environment to Heroku, this happens.
Talk about luck. :(
Hopefully I can spin this and not leave a bad taste in their mouths. We (engineers) understand what's happening, management doesn't and they don't give a shit.
From a manager perspective, this sort of alert (as well as recent Heroku emails telling you specifically which app(s) needed patching for the recent Rails CVE issues) is a great example of one of the extra benefits of Heroku. A team of engineers 'watching your back' at no extra charge is a good thing.
Agreed. I was surprised when I received an email from Heroku letting me know that a few of my apps needed to be updated after the Rails vulnerabilities were uncovered. They also named the apps that needed to be updated, which makes my job that much simpler.
I guess they had to build the feature. With the follow-on exploit for rails <3.1, the notification email went out very quickly, and they will probably have quick notifications going forward.
The exploit still exists whether you are on Heroku, shared hosting, or bare metal. It's not an issue specific to Heroku; it's an issue that affects rubygems. Your situation would be worse if you had convinced the big wigs to switch to Ruby today.
"Thank goodness we switched to Heroku. Had we stayed on our previous environment, we would have been opened up to a security exploit without even knowing it."
Feel free to reach out to me if you'd like to have a conversation about how to support your case on Heroku or if you have any questions/concerns; raj@heroku.com
What is the alternative? You stay on the shared hosting and then what, you get hacked because you didn't verify a gem?
Are the bigwigs going to authorize you time to look through all gems for potential backdoors, or are they going to get it for free with their Heroku hosting?
I always assumed that Heroku would have an internal proxy for gems. Seems like 80% of users would probably be fetching the same gems that another user might have just fetched. Perhaps that cache could be versioned or snapshotted, so that in an event like this you could roll it back to a known-good snapshot and let anyone whose gems were already in the cache keep deploying.
That's a good question. I do know one thing though: I deploy multiple times per day and typically none of my gems have changed.
I guess it would depend on the folks doing the investigation. If an exact timestamp could be determined for when things could have been compromised, you just roll back to a short while before that time.
Heroku engineer here: Ruby deploys are back online if you don't require any new gems, i.e. you can deploy from the existing cache. We're still working on resolving the larger problem with RubyGems.
> source of all dependencies possible should be in your repository
How far do you go? Do you include libxml for building nokogiri? Heck, do you include libc and gcc for building any gem with a C extension?
Coming from Java, Maven and something like Sonatype Nexus make it easy (for certain values of "easy") to run a proxy repository. The equivalent of every "gem install <some_gem>" goes through the proxy, which continues to serve gems even if the original source goes away.
I don't particularly like the inclusion of dependencies in a repository. Is this a custom version "some guy" long gone from the company created three years ago? Can I safely upgrade it to get security fix <X>? I suppose similar questions arise no matter the source...
This is the reason why you should cryptographically sign your gems before publishing them. I (unfortunately) had not known this was supported by RubyGems, but it is: http://docs.rubygems.org/read/chapter/21
But I'll bet very few gems are signed. Rails does not appear to be.
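For reference, the signing workflow looks roughly like this (the email, gem name, and file names are placeholders):

    # author side: generate a signing key and self-signed cert
    gem cert --build you@example.com

    # reference gem-private_key.pem / gem-public_cert.pem from the gemspec's
    # signing_key and cert_chain fields, then build and push as usual

    # installer side: refuse anything that isn't signed by a trusted cert
    gem install somegem -P HighSecurity

The catch is that HighSecurity rejects every unsigned gem, which today means most of them.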
Between this hack and the recent Rails vulnerabilities, it seems like a perfect storm. I wonder whether the hack attempted to tamper with the Rails gems to catch late updaters, or to remove the ability to use RubyGems to update to the latest versions and so keep vulnerable sites vulnerable.
It looks more like this was a natural extension of (part of) the Rails vulnerabilities. People saw that YAML on Ruby has a giant gaping security hole in it and is commonly used to decode user-supplied data.
I would not be surprised if we see even more of this as people feel out all of the other places that YAML is used as a user-facing data interchange format.
I think this hack was related to the recent Rails vulnerability. Heroku is blaming this on a "YAML parsing vulnerability", which I think is the same issue Rails had. I'm not sure whether they are using Rails or not, but it's surprising that, if it's the same issue, they didn't do anything about it before this happened.
I was looking for a RubyGems proxy a couple of weeks ago but was unable to find anything suitable. What I would like to find is something similar to Artifactory for Maven. You include the proxy in your Gemfile and if it doesn't have the Gem it downloads it from RubyGems and caches it locally.
This type of proxy wouldn't help in this particular case but it would allow you to keep traffic to RubyGems.org down and also give you the ability to easily host private Gems.
You're probably aware, but it's possible to host private gems in a simple static webserver. We have our CI server copy gems over and run a script that calls "gem generate_index -d /path/to/gems".
Completely featureless, of course. Lacks the newer rubygems.org api (so bundler is stuck downloading the whole index), and obsolete versions will stick around without manual intervention (which, if you have too many, is a pathological case for bundler's resolver).
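For anyone curious, the whole setup is roughly (the paths and hostname are just examples):

    # drop built .gem files into a gems/ subdirectory of the web root,
    # then (re)generate the index files that rubygems/bundler expect
    cp pkg/*.gem /var/www/gems/gems/
    gem generate_index -d /var/www/gems

    # consuming apps just add the static server as a source in the Gemfile:
    #   source "http://gems.internal.example.com"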
Not that this line of discussion is particularly constructive, but I agree. I'm not sure why someone would think a giant head staring at the reader is a good idea, but it's not.
On a 27" monitor, it's almost like a child sized head right up in your face. I can only imagine it being even worse on bigger screens.
We treated .gems like source tarballs (they can't even express non-Ruby dependencies; Gem::Specification#requirements are treated as comments), and only used rubygems.org for newer versions of our dependencies, which were then packaged as .rpms for test and production deployments. I found that to be much more sane than bypassing the system package manager and smuggling random crap onto dev servers, much less production.
This is the responsible thing to do. Going through the gems and verifying they aren't compromised is a lot of work. We should be thankful for all the effort the rubygems maintainers and other volunteers are putting into cleaning up this mess.
A tangent, but I always thought "YAML" was pronounced /'jæm.ḷ/, however the post's use of "an YAML" suggests it's actually pronounced /waɪ.eɪ.ɛm.ɛl/. Weird.
Don't worry too much about this. A lot of people haven't internalized the correct rule (use "an" before a vowel sound, not a vowel letter), and instead just use "an" before a vowel, regardless of the sound it makes. I've certainly never heard anything but /'jæm.ḷ/ in the wild.
You still wouldn't say "an why", as "w" isn't a vowel. If anything, I can understand "an yamel" as more legitimate (as "y" is at least sort of a vowel).
I didn't say that "an YAML" is correct; I was first correcting that "an YAML" would be correct if it were "an why aih ahm ell" (it wouldn't), and then went further into "if anything" land saying that "an yahmell" is at least something I can bring myself to say without feeling sick inside. ;P
If you are not updating gems, does it hurt to continue deploying with a custom buildpack? Heroku shouldn't repull gems if the gemspec hasn't been altered. Is that logic correct?
Does anybody have another suggestion for safely working around this issue? I don't have a clear sense for how long this will take to resolve and don't wish to slow down our release pace too much.
I thought at one point in time rubygems had a system in place to sign the contents of a gem? If not, this might be an interesting addition. You could have a digest stored alongside every gem file, allowing you to validate the authenticity of the gem file... I'm sure others would have something to add to this idea...
It is possible for the publisher to sign the gems [1], but it's not common.
If rubygems.org is keeping fingerprints of each gem, that still isn't sufficient, since those could have been compromised as well. If there's no other trustworthy source of fingerprints, then maybe we need to crowdsource it: build a tool that will md5sum all the .gem files in your local cache directory, so that we can look for any files that were changed on rubygems.org.
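Something as dumb as this would be a start (assuming the default cache location under "gem env gemdir"):

    # hash every cached .gem so fingerprints from different machines can be compared
    find "$(gem env gemdir)/cache" -name '*.gem' -exec md5sum {} \; > gem-checksums.txt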
Just to reiterate the parent: this is only valuable if we trust the signatures - which I wouldn't if they were, say, just held alongside the gems on the "hacked" server.
You still need the public key to validate the signature. If the attacker can change the public key, he can change the signature without you knowing - unless you explicitly want to trust each and every key for every gem you install.
Seems odd, however, that status.heroku.com lists the issue only on the development side, suggesting production apps are not affected? http://screencast.com/t/L36Hpx5dx
Running apps are not affected, but your ability to do development is. The threat vector affects app compilation, not app execution/scaling, which doesn't touch the tainted rubygems.org repositories.
I don't think that is correct. AFAIK the 'development' side of the heroku status panel is not related to developing, per se, it's the status for apps that are not running on production-level resources... for example, single-dyno apps, or apps not running with production flavor of database.
I agree that the production/development split is not entirely clear without additional explanation. We've spent a lot of time thinking about how to communicate these things and have so far not come up with a way that we feel better describes the issues at hand.
Thx for correcting that... have used that screen a million times without realizing they lumped dev apps and prod+dev workflow together. 3 columns would eliminate any confusion (Production Apps, Development Apps, Development Workflow) but might not look as clean.
lmao, Ruby and RoR shame PHP in terms of security flaws. Making you unknowingly write security holes, ridiculous flaws discovered on a daily/weekly basis, package management hacked, etc. I have never seen holes as ridiculous as RoR's, even in the CodeIgniter framework. Where are the RoR-haters when we need them?