> 3 years ago I wrote a blog post about broken regular expressions in Ruby, ^$ meaning new lines \n.
Are they actually broken? One of the first quirks I learned writing Ruby is that you use \A and \z instead of ^ and $.
I know better than to blame security vulnerabilities on "bad programmers" rather than usability problems with a language (or plain old PoLA violations), but changing the meaning of these anchors will possibly be a tough migration.
"Broken" is a subjective thing, but quirks that break extremely well-established patterns (like how people validate the beginning and end of a line in regex) are where a lot of security issues tend to crop up.
Considering that, I would venture to say that this is indeed a broken thing, since it's a convention that I've only heard about in Ruby. It's a special snowflake in a component that developers use almost exclusively for validation (regex), with a pretty huge gotcha, especially because it appears to function as intended.
While I tend to agree with you, how many people have tried to downplay C's deficiencies with "good programmers following good practices don't introduce buffer overflows into their code?" Where has that gotten us?
I'm not saying that I have a solution, but I can't help but see the parallels.
Why is everybody talking about ruby's regex implementation? ^, \A, $, \Z, \z are standard across many languages, based on PCRE syntax. This isn't a ruby problem; this is simply developers who are not experts with regular expressions not fully understanding how to write patterns with the tool they are using. There are no problems with regex implementations, only problems with regex use.
Anybody interested should have a read through http://pcre.org/pcre.txt. The syntax presented here is used in perl, php, ruby, python, and many others.
Also, nobody ever uses \A without the /m flag. You use ^, it has the same meaning unless you specifically add the /m flag to allow ^ to match at the beginning of any line rather than at the front of the string only. This distinction will only bite developers who just add flags like /msig for every regex, because again they don't understand exactly what every flag actually does.
> ^, \A, $, \Z, \z are standard across many languages, based on PCRE syntax.
Uhm... No? ^$ have always meant beginning and end of string, not line, unless you turn on a flag. Not by default. You can check PCRE's documentation if you don't believe me. And then, there's Ruby:
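A rough sketch of the Ruby behavior being pointed at, for anyone who wants to try it in irb. The helper names and the sample payload are made up for illustration; only the anchors themselves are standard Ruby.

```ruby
# Hypothetical helpers, named only for this illustration.

# Line-anchored: true if ANY line of the input consists solely of digits.
def line_anchored_digits?(str)
  /^\d+$/.match?(str)
end

# String-anchored: true only if the WHOLE input consists solely of digits.
def string_anchored_digits?(str)
  /\A\d+\z/.match?(str)
end

payload = "anything here\n123"

puts line_anchored_digits?(payload)    # true: the second line is all digits
puts string_anchored_digits?(payload)  # false: the string as a whole is not
puts string_anchored_digits?("123")    # true
puts(/\A\d+\Z/.match?("123\n"))        # true: \Z tolerates one trailing newline
puts(/\A\d+\z/.match?("123\n"))        # false: \z does not
```

No flag is involved in any of these patterns, which is the point: in Ruby, ^ and $ are line anchors by default.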
Eww what the hell. I don't use ruby, and if I did I might very well - as a PCRE half-expert - fall into this trap based purely on the assumption that ruby was using PCRE. I just looked at their Regexp class, and it matches PCRE in most regards. The fact that /m makes . match newline \n is horrible - every PCRE-based implementation uses /s for that, whereas /m only affects ^ and $.
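For reference, a small sketch of the /m difference described above. The helper name and the test string are invented for the example; the flag behavior is Ruby's.

```ruby
# Hypothetical helper: does the pattern match against a two-line string?
def matches_two_lines?(pattern)
  pattern.match?("first\nsecond")
end

puts matches_two_lines?(/first.second/)   # false: "." does not match "\n" by default
puts matches_two_lines?(/first.second/m)  # true: Ruby's /m makes "." match "\n"
puts matches_two_lines?(/^second$/)       # true: "^" and "$" are line anchors without any flag
```

So in Ruby /m plays the role that /s plays in PCRE-style engines, and there is no flag that switches ^ and $ back to whole-string anchors; \A and \z are the way to get that.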
It still falls on the developer to understand the exact flavor of regex available in their language. And yet ruby is doing a disservice to anybody coming to their language with existing PCRE knowledge by having syntax that is almost an exact match to PCRE used in many languages... only to find out someday that it's not. Harsh.
Regular expressions are dangerously complex for validation code; it's too easy to overlook something or miss some context (like multiline mode, see also riffraff's comment).
So the problem is probably not developer knowledge (note that you got it wrong, too!), but rather that regexps are too hard to get right.
Sounds like a pretty successful bidirectional filter to me. (As in: you apparently don't want engineers who don't know this detail, and I wouldn't want to work at a company where that kind of trivia is a litmus test.)
It's not trivia. If someone doesn't know the difference, they're going to allow bad data into our database. Large webapps with poor model validations are a security and maintenance nightmare.
Actually, I am reminded of an error that happened which was similar to this. After I left a past company, an engineer flubbed a validation which allowed a subtle bug to go undetected for 10 days which cost the company $500,000.
No, money wasn't being stolen, but the validation error meant that clients' money was being spent and not being tracked. The company had to eat the costs.
To clarify, this isn't Mongo's BSON; it's Moped's implementation of BSON/Ruby's implementation of BSON (again). The title is fairly misleading, making it sound like it's actually Mongo which is vulnerable. Still interesting stuff though.
The vulnerability is in `bson-ruby`[1] which is written by MongoDB and used by Moped (and thus Mongoid), the official Ruby driver from MongoDB, and Mongo Mapper.
The only thing that _isn't_ vulnerable is Moped's BSON implementation (if reasonably recent), but it was dropped in Moped 2.x.
In reality, if you're using Mongo with Ruby, you're most likely vulnerable, unless you happen to be on Moped 1.x.
> The vulnerability is in `bson-ruby`[1] which is written by MongoDB and used by Moped (and thus Mongoid), the official Ruby driver from MongoDB, and Mongo Mapper.
Then it's in the ruby gem of MongoDB's driver for ruby NOT in MongoDB. The title is still misleading for people who do not code in ruby and therefore are not vulnerable to the apparently ever present ruby BSON bug.
> Mongo BSON Injection
A better title would be Mongo gem BSON Injection
I am not trying to nitpick; I was fairly confused when seeing the title because I don't code in ruby and was 99% sure Mongo's core was C, not ruby.
The article describes how this problem was present in Moped's BSON implementation, then fixed. Then later, Moped replaced its own BSON implementation with BSON-ruby, which had a version of the method which was not vulnerable. Later still, BSON-ruby's method was changed, making it vulnerable. BSON-ruby is, AFAICT, the official BSON library for ruby from Mongo.
As I understand it, the vulnerability is in any ruby application which uses a vulnerable version of the bson gem and which accepts object IDs from user input. You don't have to be using Moped.
One way to help protect against this kind of issue is to be explicit about validation steps.
This is the buggy code:
!!str.match(/^[0-9a-f]{24}$/i)
That regex is trying to do three different things: validate the length is 24, validate the string contains alphanums, and ensure the matching is pinned from start to finish.
I prefer code that makes the validation steps explicit and simpler:
The [^ part negates the characters inside the []. So it'll match anything that is NOT 0-9a-z, case insensitive (/i). The !~ then says that str should not match the regex. So you end up with it saying that str should not match anything other than 0-9a-z, case insensitive.
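A minimal sketch of the explicit style described above, not the commenter's exact code. Both method names are hypothetical, and the character class here is a-f to match the hex range of the one-line check quoted earlier; widen it if your IDs need it.

```ruby
# Hypothetical helper: each validation step is its own line.
def plausible_object_id?(str)
  return false unless str.is_a?(String)
  return false unless str.length == 24  # step 1: exact length
  str !~ /[^0-9a-f]/i                   # step 2: no character outside the hex set
end

# The one-line check quoted above, wrapped in a method for comparison.
def line_anchored_object_id?(str)
  !!str.match(/^[0-9a-f]{24}$/i)
end

good    = "0123456789abcdef01234567"
payload = good + "\nextra"              # valid-looking first line, then more data

puts plausible_object_id?(good)         # true
puts plausible_object_id?(payload)      # false: length is not 24
puts line_anchored_object_id?(payload)  # true: "$" matches before the "\n"
```

Because the negated class needs no anchors at all, there is no ^ or $ to get wrong; the length check carries the "whole string" guarantee on its own.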
Oh boy I'd love to watch this guy in action in a twitch.tv stream, taking down a site (to clarify... white hat stuff!). It would be so damn fascinating to see how his mind works and how he makes the leap from 0 to exploit.
For this kind of stuff, I'm not sure it would be that interesting. Lots of staring off into space... scrolling the source code... cursing people who can't write code that other people can understand... oh! That doesn't look right... Only the "oh!" part would be about 20 hours into the stream ;-)
I once had an idea to invite open source developers to remotely pair with me for a day on something (a bug fix, some feature they are working on, whatever)... I would record the session, do an introduction to the problem, and then edit the session down to about an hour or an hour and a half. I think it would be fascinating (if a lot of work).
I've been thinking about live coding on twitch but having to think about not revealing any sensitive information would make it a hassle. I will give it a try for some side projects.
I think you mis-parsed the response. You said "hats off" and elcct was making a joke based on that re: white hats. I don't think there was any accusation intended.
I suspect that many of these same Rubyists who mistakenly assumed ^ and $ have the normal PCRE semantics would still readily claim to know Ruby, and in many cases, even to know it well. I think this type of vulnerability undermines the optimistic belief that any decent programmer can quickly learn a new language, and further, it shows the danger of adopting new languages generally, especially those with extremely complex and not particularly well-defined syntax and semantics, like Ruby.
I also wonder how many vulnerabilities result just from Rubyists favoring cutesy APIs (or "DSLs," as they call them) that while making for great demos, hide the often times unignorable, crucial details of what they do from their users.
I tested on our app (which uses BSON-ruby 1.9.2) and was surprised to find that the detection code indicated it was not vulnerable. Turned out it was because we also use bson_ext — bson_ext replaces the vulnerable method with a C implementation which doesn't use regexes.