Mongo BSON Injection: Ruby Regexps Strike Again (sakurity.com)
128 points by homakov on June 4, 2015 | 50 comments



> 3 years ago I wrote a blog post about broken regular expressions in Ruby, ^$ meaning new lines \n.

Are they actually broken? One of the first quirks I learned writing Ruby is that you use \A and \z instead of ^ and $.

I know better than to blame security vulnerabilities on "bad programmers" rather than usability problems with a language (or plain old PoLA violations), but changing the meaning of these anchors will be a tough migration, possibly.


"Broken" is a subjective thing, but quirks that break extremely well established patterns (like how people validate the beginning and end of a line in regex) are where a lot of security issues tend to crop up.

Considering that, I would venture to say that this is indeed broken, since it's a convention that I've only heard about in Ruby. It's a special snowflake in a component that developers use almost exclusively for validation (regex), with a pretty huge gotcha, especially because it appears to function as intended.


Ruby inherited this behavior from Emacs (matz' favorite text editor) among other things.


While I tend to agree with you, how many people have tried to downplay C's deficiencies with "good programmers following good practices don't introduce buffer overflows into their code?" Where has that gotten us?

I'm not saying that I have a solution, but I can't help but see the parallels.


Why is everybody talking about ruby's regex implementation? ^, \A, $, \Z, \z are standard across many languages, based on PCRE syntax. This isn't a ruby problem; this is simply developers who are not experts with regular expressions not fully understanding how to write patterns with the tool they are using. There are no problems with regex implementations, only problems with regex use.

Anybody interested should have a read through http://pcre.org/pcre.txt. The syntax presented here is used in perl, php, ruby, python, and many others.

Also, nobody ever uses \A without the /m flag. You use ^, it has the same meaning unless you specifically add the /m flag to allow ^ to match at the beginning of any line rather than at the front of the string only. This distinction will only bite developers who just add flags like /msig for every regex, because again they don't understand exactly what every flag actually does.


> ^, \A, $, \Z, \z are standard across many languages, based on PCRE syntax.

Uhm... No? ^$ have always meant beginning and end of string, not line, unless you turn on a flag. Not by default. You can check PCRE's documentation if you don't believe me. And then, there's Ruby:

  irb(main):005:0> "foo\nbar".match /o$/
  => #<MatchData "o">


Ruby has half multiline mode by default, which causes the issue. Namely, /m only changes the behaviour of "." to match \n, but ^ and $ work the same with or without it.

(and incidentally, that would make changing it easier, you can just request that users specify a flag all the time and deprecate the one without).
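A minimal sketch of the anchor behaviour discussed above, using only core Ruby. The variable names are illustrative; the point is that ^ and $ match at line boundaries by default, while \A and \z match only at the string boundaries.

```ruby
input = "foo\nbar"

# $ matches before any "\n", so the "o" at the end of the first line matches.
line_end = input =~ /o$/        # => 2
# \z matches only at the very end of the string, which ends in "r".
string_end = input =~ /o\z/     # => nil

# ^ matches after any "\n", so "bar" on the second line matches.
line_start = input =~ /^bar/    # => 4
# \A matches only at the very start of the string.
string_start = input =~ /\Abar/ # => nil

# \Z tolerates one trailing newline, \z does not.
upper_z = "foo\n" =~ /foo\Z/    # => 0
lower_z = "foo\n" =~ /foo\z/    # => nil
```

For validation, \A and \z are the pair that pins the whole string.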


Eww what the hell. I don't use ruby, and if I did I might very well - as a PCRE half-expert - fall into this trap based purely on the assumption that ruby was using PCRE. I just looked at their Regexp class, and it matches PCRE in most regards. The fact that /m makes . match newline \n is horrible - every PCRE-based implementation uses /s for that, whereas /m only affects ^ and $.

It still falls on the developer to understand the exact flavor of regex available in their language. And yet ruby is doing a disservice to anybody coming to their language with existing PCRE knowledge by having syntax that is almost an exact match to PCRE used in many languages... only to find out someday that it's not. Harsh.
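The /m difference described above can be checked with a short sketch in core Ruby (variable names are illustrative). In Ruby, /m changes only whether "." matches a newline; ^ behaves the same either way.

```ruby
text = "foo\nbar"

# Without /m, "." does not match "\n".
dot_default = text =~ /foo.bar/     # => nil
# With /m, "." also matches "\n" (what PCRE-style engines call /s).
dot_multiline = text =~ /foo.bar/m  # => 0

# ^ matches at the start of the second line with or without /m.
caret_default = text =~ /^bar/      # => 4
caret_multiline = text =~ /^bar/m   # => 4
```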


Regular expressions are dangerously complex for validation code, it's too easy to overlook something or miss some context (like multiline mode, see also riffraff's comment).

So the problem is probably not developer knowledge (note that you got it wrong, too!), but rather that regexps are too hard to get right.


I use ^$ vs \A\z as an interview question for senior engineers. Sadly, it filters a lot of them out.


Sounds like a pretty successful bidirectional filter to me. (As in: you apparently don't want engineers who don't know this detail, and I wouldn't want to work at a company where that kind of trivia is a litmus test.)


It's not trivia. If someone doesn't know the difference, they're going to allow bad data into our database. Large webapps with poor model validations are a security and maintenance nightmare.


Actually, I am reminded of an error that happened which was similar to this. After I left a past company, an engineer flubbed a validation which allowed a subtle bug to go undetected for 10 days which cost the company $500,000.

$50,000/day is an expensive lesson!


Was it abused? money stolen?


Hey Homakov, I'm a big fan of yours. :)

No, money wasn't being stolen, but the validation error meant that clients' money was being spent and not being tracked. The company had to eat the costs.


To clarify, this isn't Mongo's BSON, it's Moped's implementation of BSON/Ruby's implementation of BSON (again). The title is fairly misleading, making it sound like it's actually Mongo which is vulnerable. Still interesting stuff though.


This is not correct.

The vulnerability is in `bson-ruby`[1] which is written by MongoDB and used by Moped (and thus Mongoid), the official Ruby driver from MongoDB, and Mongo Mapper.

The only thing that _isn't_ vulnerable is Moped's BSON implementation (if reasonably recent), but it was dropped in Moped 2.x.

In reality, if you're using Mongo with Ruby, you're most likely vulnerable, unless you happen to be on Moped 1.x.

[1] https://github.com/mongodb/bson-ruby/blob/84d8acd32ce9067ad6...


> This is not correct.

> The vulnerability is in `bson-ruby`[1] which is written by MongoDB and used by Moped (and thus Mongoid), the official Ruby driver from MongoDB, and Mongo Mapper.

Then it's in the ruby gem of MongoDB's driver for ruby, NOT in MongoDB. The title is still misleading for people who do not code in ruby and therefore are not vulnerable to the apparently ever-present ruby BSON bug.

> Mongo BSON Injection

A better title would be Mongo gem BSON Injection

I am not trying to nitpick; I was fairly confused when seeing the title because I don't code in ruby and was 99% sure Mongo's core was C, not ruby.


This doesn't detract from your point, but Mongo is primarily C++, not C.


It's the title - I'm using the minimal number of words to carry the idea.


The article describes how this problem was present in Moped's BSON implementation, then fixed. Then later, Moped replaced its own BSON implementation with BSON-ruby, which had a version of the method which was not vulnerable. Later still, BSON-ruby's method was changed, making it vulnerable. BSON-ruby is, AFAICT, the official BSON library for ruby from Mongo.

As I understand it, the vulnerability is in any ruby application which uses a vulnerable version of the bson gem and which accepts object IDs from user input. You don't have to be using Moped.
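A rough sketch of why a line-anchored check lets user input through, assuming the check is the 24-hex-digit regex quoted elsewhere in this thread. The method names here are hypothetical and are not the bson gem's API.

```ruby
# Line-anchored pattern (the vulnerable shape) and string-anchored pattern.
LINE_ANCHORED   = /^[0-9a-f]{24}$/i
STRING_ANCHORED = /\A[0-9a-f]{24}\z/i

# Hypothetical helper: accepts if ANY line of the input is 24 hex digits.
def loose_object_id?(str)
  !!(str =~ LINE_ANCHORED)
end

# Hypothetical helper: accepts only if the WHOLE input is 24 hex digits.
def strict_object_id?(str)
  !!(str =~ STRING_ANCHORED)
end

valid   = "a" * 24
payload = "anything\n#{valid}\nanything else"

loose_object_id?(payload)   # => true, the middle line satisfies ^...$
strict_object_id?(payload)  # => false
strict_object_id?(valid)    # => true
```

Any input that contains one well-formed line passes the loose check, whatever surrounds that line.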


Here's one lurking in javascript:

/[A-z]/ also matches the six characters between "Z" and "a" in ASCII: [ \ ] ^ _ ` [1]

Now go search GitHub [2] and see the 1k+ repos that have this bug in their parsing of base64.

Sources: [1]: http://wtfjs.com/2014/01/29/regular-expression-and-slash

[2]: https://github.com/search?utf8=%E2%9C%93&q=INVALID_BASE64_RE...


This isn't WTF at all, that's exactly how regexes should work!


In case anyone is wondering why this isn't a WTF - See [1]..

[1]: http://www.asciitable.com/


Agreed, it's not my website so I couldn't control the name. I'm just talking about how A-z includes non-(standard) base64 chars.


Interesting!

I didn't understand it at first, but the key difference is `A-z` vs. `A-Za-z`.
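The example above is JavaScript, but the range is defined by ASCII order, so the same thing can be sketched in Ruby (the language used in the rest of this thread). Variable names are illustrative.

```ruby
# The characters that sit between "Z" (90) and "a" (97) in ASCII.
between = (("Z".ord + 1)..("a".ord - 1)).map(&:chr)
# => ["[", "\\", "]", "^", "_", "`"]

loose  = /\A[A-z]+\z/     # one range spanning both cases and the gap
strict = /\A[A-Za-z]+\z/  # two ranges, letters only

loose_hits  = between.select { |c| c =~ loose }   # all six characters
strict_hits = between.select { |c| c =~ strict }  # => []
```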


One way to help protect against this kind of issue is to be explicit about validation steps.

This is the buggy code:

    !!str.match(/^[0-9a-f]{24}$/i)

That regex is trying to do three different things: validate the length is 24, validate the string contains only hex digits, and ensure the matching is pinned from start to finish.

I prefer code that makes the validation steps explicit and simpler:

    str.length==24 && str!~/[^0-9a-z]/i
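A small sketch of the explicit two-step check, wrapped in a hypothetical helper. One assumption: the character class is narrowed to a-f to agree with the hex regex quoted as buggy above, whereas the one-liner as written uses a-z.

```ruby
# Hypothetical helper: explicit length check plus a negated character class.
# No anchors are needed because ANY character outside the class, including
# "\n", makes the negated class match and the check fail.
def hex24?(str)
  str.length == 24 && str !~ /[^0-9a-f]/i
end

hex24?("a" * 24)             # => true
hex24?("A1" * 12)            # => true, /i allows upper case
hex24?("a" * 24 + "\n")      # => false, length is 25
hex24?("a\n" + "a" * 22)     # => false, "\n" is outside the class
hex24?("g" * 24)             # => false, "g" is not a hex digit
```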


[deleted]


the [^ part negates the characters inside the []. So it'll match anything that is NOT 0-9a-z, case insensitive /i. The !~ then says that str should not match the regex. so you end up with it saying that str should not match anything other than 0-9a-z case insensitive.


It's a whitelist. It is "verify the string is a 24-character alphanumeric string".


Egor Homakov strikes again. This guy is a machine, just look at the vulnerabilities he's found this year alone:

http://sakurity.com/blog


Oh boy I'd love to watch this guy in action in a twitch.tv stream, taking down a site (to clarify... white hat stuff!). It would be so damn fascinating to see how his mind works and how he makes the leap from 0 to exploit.


For this kind of stuff, I'm not sure it would be that interesting. Lots of staring off into space... scrolling the source code... cursing people who can't write code that other people can understand... oh! That doesn't look right... Only the "oh!" part would be about 20 hours into the stream ;-)

I once had an idea to invite open source developers to remotely pair with me for a day on something (a bug fix, some feature they are working on, whatever)... I would record the session, do an introduction to the problem, and then edit the session down to about an hour or an hour and a half. I think it would be fascinating (if a lot of work).

Maybe some day...


That sounds unfeasibly dull, compared to the nice text explanation. You can see what someone's typing but that really isn't what's in their head.


Is there anything currently like this for other people?


I've been thinking about live coding on twitch but having to think about not revealing any sensitive information would make it a hassle. I will give it a try for some side projects.


Not to mention this classic, which was probably his original claim to fame: https://news.ycombinator.com/item?id=3663197


Yeah, he's impressive. Hats off to his persistence and ingenuity. Would love to collaborate with him sometime.


No, let's keep white hat on...


What exactly are you accusing me of?


I think you mis-parsed the response. You said "hats off" and elcct was making a joke based on that re: white hats. I don't think there was any accusation intended.


Oh, I thought he was calling me whitehat scum.

Thanks for clarifying.


I suspect that many of these same Rubyists who mistakenly assumed ^ and $ have the normal PCRE semantics would still readily claim to know Ruby, and in many cases, even to know it well. I think this type of vulnerability undermines the optimistic belief that any decent programmer can quickly learn a new language, and further, it shows the danger of adopting new languages generally, especially those with extremely complex and not particularly well-defined syntax and semantics, like Ruby.

I also wonder how many vulnerabilities result just from Rubyists favoring cutesy APIs (or "DSLs," as they call them) that while making for great demos, hide the often times unignorable, crucial details of what they do from their users.


Where's the attempt to submit a patch to fix the problem before disclosing?


The 1.x versions of BSON are vulnerable, too, FWIW.


I tested on our app (which uses BSON-ruby 1.9.2) and was surprised to find that the detection code indicated it was not vulnerable. Turned out it was because we also use bson_ext — bson_ext replaces the vulnerable method with a C implementation which doesn't use regexes.


Kinda funny to see a "safe" language saved by C. Just sayin'


Oh, that's a good catch. I checked on JRuby, which doesn't use bson_ext.


Added a patch for Moped::BSON in Rails here

https://gist.github.com/eddanger/9408317d5d508d8e9ba7


This has been fixed. Just update moped:

gem "moped", "~> 2.0.5"


Is this only an issue if you are defining the BSON _id ?



