Mongo BSON Injection: Ruby Regexps Strike Again

jasonmp85 · on June 4, 2015

> 3 years ago I wrote a blog post about broken regular expressions in Ruby, ^$ meaning new lines \n.

Are they actually broken? One of the first quirks I learned writing Ruby is that you use \A and \z instead of ^ and $.

I know better than to blame security vulnerabilities on "bad programmers" rather than usability problems with a language (or plain old PoLA violations), but changing the meaning of these anchors will be a tough migration, possibly.
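
For readers who have not hit this before, here is a minimal sketch of the `^`/`$` versus `\A`/`\z` difference mentioned above. The helper names are hypothetical and exist only for illustration.

```ruby
# Hypothetical helpers for illustration only.

# Uses $, which in Ruby matches at the end of ANY line.
def ends_with_o_line?(str)
  str.match?(/o$/)
end

# Uses \z, which matches only at the very end of the string.
def ends_with_o_string?(str)
  str.match?(/o\z/)
end

input = "foo\nbar"
puts ends_with_o_line?(input)    # true: $ is satisfied just before the "\n"
puts ends_with_o_string?(input)  # false: the string itself ends in "r"

# \Z (capital) tolerates one trailing newline; \z does not.
puts "foo\n".match?(/o\Z/)       # true
puts "foo\n".match?(/o\z/)       # false
```

For validation, `\A` and `\z` are the pair that pins the pattern to the whole string.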

zer01 · on June 4, 2015

"Broken" is a subjective thing, but quirks that break extremely well established patterns (like how people validate the beginning and end of a line in regex) is where a lot of security issues tend to crop up.

Considering that, I would venture to say that this is indeed a broken thing, since it's a convention that I've only heard about in Ruby. It's a special snowflake in a component that developers use almost exclusively for validation (regex), with a pretty huge gotcha, especially because it appears to function as intended.

cremno · on June 4, 2015

Ruby inherited this behavior from Emacs (matz' favorite text editor) among other things.

pyre · on June 4, 2015

While I tend to agree with you, how many people have tried to downplay C's deficiencies with "good programmers following good practices don't introduce buffer overflows into their code?" Where has that gotten us?

I'm not saying that I have a solution, but I can't help but see the parallels.

developer1 · on June 4, 2015

Why is everybody talking about ruby's regex implementation? ^, \A, $, \Z, \z are standard across many languages, based on PCRE syntax. This isn't a ruby problem; this is simply developers who are not experts with regular expressions not fully understanding how to write patterns with the tool they are using. There are no problems with regex implementations, only problems with regex use.

Anybody interested should have a read through http://pcre.org/pcre.txt. The syntax presented here is used in perl, php, ruby, python, and many others.

Also, nobody ever uses \A without the /m flag. You use ^, it has the same meaning unless you specifically add the /m flag to allow ^ to match at the beginning of any line rather than at the front of the string only. This distinction will only bite developers who just add flags like /msig for every regex, because again they don't understand exactly what every flag actually does.

dozzie · on June 4, 2015

> ^, \A, $, \Z, \z are standard across many languages, based on PCRE syntax.

Uhm... No? ^$ have always meant beginning and end of string, not line, unless you turn on a flag. Not by default. You can check PCRE's documentation if you don't believe me. And then, there's Ruby:

  irb(main):005:0> "foo\nbar".match /o$/
  => #<MatchData "o">

riffraff · on June 4, 2015

ruby has half multiline mode by default, which causes the issue. Namely, /m only changes the behaviour of "." to match \n, but ^ and $ work the same.

(and incidentally, that would make changing it easier, you can just request that users specify a flag all the time and deprecate the one without).
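
A small sketch of the behaviour described above, with hypothetical helper names: in Ruby, `^` matches after a newline with or without `/m`, while `/m` only decides whether `.` may match `"\n"`.

```ruby
# Hypothetical helpers for illustration only.

# No flag: ^ still matches at the start of the second line.
def caret_hits_second_line?(str)
  str.match?(/^second/)
end

# No flag: "." refuses to match the newline between the two words.
def dot_crosses_newline?(str)
  str.match?(/first.second/)
end

# With /m: "." is allowed to match the newline.
def dot_crosses_newline_with_m?(str)
  str.match?(/first.second/m)
end

text = "first\nsecond"
puts caret_hits_second_line?(text)     # true
puts dot_crosses_newline?(text)        # false
puts dot_crosses_newline_with_m?(text) # true
```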

developer1 · on June 5, 2015

Eww what the hell. I don't use ruby, and if I did I might very well - as a PCRE half-expert - fall into this trap based purely on the assumption that ruby was using PCRE. I just looked at their Regexp class, and it matches PCRE in most regards. The fact that /m makes . match newline \n is horrible - every PCRE-based implementation uses /s for that, whereas /m only affects ^ and $.

It still falls on the developer to understand the exact flavor of regex available in their language. And yet ruby is doing a disservice to anybody coming to their language with existing PCRE knowledge by having syntax that is almost an exact match to PCRE used in many languages... only to find out someday that it's not. Harsh.

Nitramp · on June 4, 2015

Regular expressions are dangerously complex for validation code; it's too easy to overlook something or miss some context (like multiline mode, see also riffraff's comment).

So the problem is probably not developer knowledge (note that you got it wrong, too!), but rather that regexps are too hard to get right.

getsat · on June 4, 2015

I use ^$ vs \A\z as an interview question for senior engineers. Sadly, it filters a lot of them out.

llasram · on June 4, 2015

Sounds like a pretty successful bidirectional filter to me. (As in: you apparently don't want engineers who don't know this detail, and I wouldn't want to work at a company where that kind of trivia is a litmus test.)

getsat · on June 5, 2015

It's not trivia. If someone doesn't know the difference, they're going to allow bad data into our database. Large webapps with poor model validations are a security and maintenance nightmare.

getsat · on June 6, 2015

Actually, I am reminded of an error that happened which was similar to this. After I left a past company, an engineer flubbed a validation which allowed a subtle bug to go undetected for 10 days which cost the company $500,000.

$50,000/day is an expensive lesson!

homakov · on June 11, 2015

Was it abused? money stolen?

getsat · on June 18, 2015

Hey Homakov, I'm a big fan of yours. :)

No, money wasn't being stolen, but the validation error meant that clients' money was being spent and not being tracked. The company had to eat the costs.

gabeio · on June 4, 2015

To clarify, this isn't Mongo's BSON, it's Moped's implementation of BSON/Ruby's implementation of BSON (again). The title is fairly misleading, making it sound like it's actually Mongo which is vulnerable. Still interesting stuff though.

mbell · on June 4, 2015

This is not correct.

The vulnerability is in `bson-ruby`[1] which is written by MongoDB and used by Moped (and thus Mongoid), the official Ruby driver from MongoDB, and Mongo Mapper.

The only thing that _isn't_ vulnerable is Moped's BSON implementation (if reasonably recent), but it was dropped in Moped 2.x.

In reality, if you're using Mongo with Ruby, you're most likely vulnerable, unless you happen to be on Moped 1.x.

[1] https://github.com/mongodb/bson-ruby/blob/84d8acd32ce9067ad6...

gabeio · on June 4, 2015

> This is not correct.

> The vulnerability is in `bson-ruby`[1] which is written by MongoDB and used by Moped (and thus Mongoid), the official Ruby driver from MongoDB, and Mongo Mapper.

Then it's in the ruby gem of MongoDB's driver for ruby NOT in MongoDB. The title is still misleading for people who do not code in ruby and therefore are not vulnerable to the apparently ever present ruby BSON bug.

> Mongo BSON Injection

A better title would be Mongo gem BSON Injection

I am not trying to nitpick; I was fairly confused when seeing the title because I don't code in ruby and was 99% sure Mongo's core was C not ruby.

lvh · on June 4, 2015

This doesn't detract from your point, but Mongo is primarily C++, not C.

homakov · on June 4, 2015

It's the title - I'm using the minimal number of words to carry the idea.

rsutphin · on June 4, 2015

The article describes how this problem was present in Moped's BSON implementation, then fixed. Then later, Moped replaced its own BSON implementation with BSON-ruby, which had a version of the method which was not vulnerable. Later still, BSON-ruby's method was changed, making it vulnerable. BSON-ruby is, AFAICT, the official BSON library for ruby from Mongo.

As I understand it, the vulnerability is in any ruby application which uses a vulnerable version of the bson gem and which accepts object IDs from user input. You don't have to be using Moped.
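
To make the "object IDs from user input" risk concrete, here is a hedged sketch. Both methods are hypothetical stand-ins modelled on the `/^[0-9a-f]{24}$/i` pattern quoted elsewhere in this thread; they are not the actual bson gem code.

```ruby
# Hypothetical validators; NOT the real bson gem methods.

# Line-anchored: passes if ANY line of the input looks like an object ID.
def legal_line_anchored?(str)
  !!str.match(/^[0-9a-f]{24}$/i)
end

# String-anchored: passes only if the WHOLE input is an object ID.
def legal_string_anchored?(str)
  !!str.match(/\A[0-9a-f]{24}\z/i)
end

valid_id = "0123456789abcdef01234567"
smuggled = valid_id + "\nanything else"

puts legal_line_anchored?(valid_id)    # true
puts legal_line_anchored?(smuggled)    # true: the first line satisfies ^...$
puts legal_string_anchored?(smuggled)  # false
```

Any input that has one well-formed line gets through the first check, which is why the extra lines can carry attacker-chosen data.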

maerF0x0 · on June 4, 2015

Here's one lurking in javascript:

/[A-z]/ includes "[\]^_`" [1]

Now go search github [2] and see the +1k repos that have this bug in their parsing of base64

Sources: [1]: http://wtfjs.com/2014/01/29/regular-expression-and-slash

[2]: https://github.com/search?utf8=%E2%9C%93&q=INVALID_BASE64_RE...

pxndx · on June 4, 2015

This isn't WTF at all, that's exactly how regexes should work!

kiallmacinnes · on June 4, 2015

In case anyone is wondering why this isn't a WTF, see [1].

[1]: http://www.asciitable.com/

maerF0x0 · on June 4, 2015

Agreed; it's not my website, so I couldn't control the name. I was just pointing out that A-z includes non-(standard) base64 chars.

sandstrom · on June 4, 2015

Interesting!

I didn't understand it at first, but the key difference is `A-z` vs. `A-Za-z`.
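
The same range behaviour can be checked in Ruby. This is only a sketch with hypothetical helper names; it lists the ASCII characters that sit between `Z` and `a` and shows that `[A-z]` accepts them while `[A-Za-z]` does not.

```ruby
# Hypothetical helpers for illustration only.

# ASCII characters strictly between "Z" (90) and "a" (97).
def chars_between_cases
  (("Z".ord + 1)...("a".ord)).map(&:chr).join
end

# Loose range: spans everything from "A" to "z", punctuation included.
def loose_alpha?(str)
  str.match?(/\A[A-z]+\z/)
end

# Strict ranges: uppercase and lowercase letters only.
def strict_alpha?(str)
  str.match?(/\A[A-Za-z]+\z/)
end

extras = chars_between_cases
puts extras.length          # 6
puts loose_alpha?(extras)   # true
puts strict_alpha?(extras)  # false
```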

jph · on June 4, 2015

One way to help protect against this kind of issue is to be explicit about validation steps.

This is the buggy code:

    !!str.match(/^[0-9a-f]{24}$/i)

That regex is trying to do three different things: validate the length is 24, validate the string contains alphanums, and ensure the matching is pinned from start to finish.

I prefer code that makes the validation steps explicit and simpler:

    str.length==24 && str!~/[^0-9a-z]/i
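
A runnable sketch of that two-step idea follows. The method name is hypothetical, and it narrows the character class to `a-f` on the assumption that the goal is the same hex alphabet as the original regex; the one-liner above accepts any letter.

```ruby
# Hypothetical helper; not part of any BSON library.
def hex_object_id?(str)
  return false unless str.is_a?(String)

  # Step 1: exact length.
  # Step 2: no character anywhere in the string falls outside 0-9, a-f.
  # A negated class needs no anchors, so embedded newlines cannot hide.
  str.length == 24 && str !~ /[^0-9a-f]/i
end

puts hex_object_id?("0123456789abcdef01234567")        # true
puts hex_object_id?("0123456789abcdef01234567\nevil")  # false: wrong length
puts hex_object_id?("0123456789abcdef0123456\n")       # false: "\n" is not hex
```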

on June 4, 2015

[deleted]

simcop2387 · on June 4, 2015

the [^ part negates the characters inside the []. So it'll match anything that is NOT 0-9a-z, case insensitive /i. The !~ then says that str should not match the regex. so you end up with it saying that str should not match anything other than 0-9a-z case insensitive.

arielby · on June 4, 2015

It's a whitelist. It is "verify the string is a 24-character alphanumeric string".

_0nac · on June 4, 2015

Egor Homakov strikes again. This guy is a machine, just look at the vulnerabilities he's found this year alone:

http://sakurity.com/blog

atonse · on June 4, 2015

Oh boy I'd love to watch this guy in action in a twitch.tv stream, taking down a site (to clarify... white hat stuff!). It would be so damn fascinating to see how his mind works and how he makes the leap from 0 to exploit.

mikekchar · on June 4, 2015

For this kind of stuff, I'm not sure it would be that interesting. Lots of staring off into space... scrolling the source code... cursing people who can't write code that other people can understand... oh! That doesn't look right... Only the "oh!" part would be about 20 hours into the stream ;-)

I once had an idea to invite open source developers to remotely pair with me for a day on something (a bug fix, some feature they are working on, whatever)... I would record the session, do an introduction to the problem, and then edit the session down to about an hour or an hour and a half. I think it would be fascinating (if a lot of work).

Maybe some day...

pjc50 · on June 4, 2015

That sounds unfeasibly dull, compared to the nice text explanation. You can see what someone's typing but that really isn't what's in their head.

icpmacdo · on June 4, 2015

Is there anything currently like this for other people?

tobeportable · on June 4, 2015

I've been thinking about live coding on twitch but having to think about not revealing any sensitive information would make it a hassle. I will give it a try for some side projects.

mambodog · on June 4, 2015

Not to mention this classic, which was probably his original claim to fame: https://news.ycombinator.com/item?id=3663197

sarciszewski · on June 4, 2015

Yeah, he's impressive. Hats off to his persistence and ingenuity. Would love to collaborate with him sometime.

elcct · on June 4, 2015

No, let's keep white hat on...

sarciszewski · on June 4, 2015

What exactly are you accusing me of?

robflynn · on June 4, 2015

I think you mis-parsed the response. You said "hats off" and elcct was making a joke based on that re: white hats. I don't think there was any accusation intended.

sarciszewski · on June 4, 2015

Oh, I thought he was calling me whitehat scum.

Thanks for clarifying.

hnanon6 · on June 4, 2015

I suspect that many of these same Rubyists who mistakenly assumed ^ and $ have the normal PCRE semantics would still readily claim to know Ruby, and in many cases, even to know it well. I think this type of vulnerability undermines the optimistic belief that any decent programmer can quickly learn a new language, and further, it shows the danger of adopting new languages generally, especially those with extremely complex and not particularly well-defined syntax and semantics, like Ruby.

I also wonder how many vulnerabilities result just from Rubyists favoring cutesy APIs (or "DSLs," as they call them) that while making for great demos, hide the often times unignorable, crucial details of what they do from their users.

gshutler · on June 4, 2015

Where's the attempt to submit a patch to fix the problem before disclosing?

cheald · on June 4, 2015

The 1.x versions of BSON are vulnerable, too, FWIW.

rsutphin · on June 4, 2015

I tested on our app (which uses BSON-ruby 1.9.2) and was surprised to find that the detection code indicated it was not vulnerable. Turned out it was because we also use bson_ext — bson_ext replaces the vulnerable method with a C implementation which doesn't use regexes.

ploxiln · on June 4, 2015

Kinda funny to see a "safe" language saved by C. Just sayin'

cheald · on June 4, 2015

Oh, that's a good catch. I checked on JRuby, which doesn't use bson_ext.

eddanger · on June 4, 2015

Added a patch for Moped::BSON in Rails here

https://gist.github.com/eddanger/9408317d5d508d8e9ba7

veesahni · on June 4, 2015

This has been fixed. Just update moped:

gem "moped", "~> 2.0.5"

allcentury · on June 4, 2015

Is this only an issue if you are defining the BSON _id ?