Packet Capturing MySQL with Rust (agildata.com)
102 points by gbuehler on June 3, 2016 | 17 comments



> Enter regex macros! While it is presently slower, and requires a Rust nightly, it has the very appealing property that if your regex is not a correct expression, your program won’t compile!

To be clear, it's not just slower, it's much slower. See the benchmark comparison here: https://gist.github.com/b0f6a17744dd1df60752b6e8ced47afd <-- That's why the `regex!` macro isn't even in the docs any more.

It looks like `regex!` is the only thing preventing your project from compiling on stable Rust, right? FWIW, the Clippy lint tool will check your `Regex::new` calls at compile time for you (assuming it's a string literal, which it is in your case).

Also, I'd recommend not using `*` as a version constraint in your `Cargo.toml`. You do have a `Cargo.lock` so it's not as bad, but with better version constraints, you'll be able to run `cargo update` and get semver compatible updates.
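For example, a caret requirement lets `cargo update` pull in semver-compatible releases while refusing incompatible ones (crate names and versions here are just illustrative):

```toml
[dependencies]
# "0.1" is shorthand for "^0.1": cargo update may move within 0.1.x,
# but never to a semver-incompatible 0.2.
regex = "0.1"

# An explicit range works too:
libc = ">=0.2, <0.3"
```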


Both good suggestions for improvement, Andrew. I saw the performance notes, and admit I was a little torn. Quite simply, the regex! macro was interesting for the reason stated, and I left it in there for the purpose of showcasing something (a little bit) unique in Rust.

Regarding the asterisk for versioning in Cargo.toml, I also agree. When quickly putting things together, I usually start with it just to see if the default version pulled works. The great utility of Cargo.lock, effectively storing the working versions of all the crates, allows scraping the versions out of there at any time, and putting them into the .toml.

I hope you noticed the extensive links in the post, as one of the goals was to bring more people into the Rust ecosystem. The Spyglass utility does work quite well. None of us claimed it has reached a state of absolute perfection, so your comments are appreciated (and pull requests will be as well)!

Thank you.


No worries! And yes, regex! is a pretty cool thing to showcase---it's a pity that it is so slow. :-( Very nice project though! :-)


Is there any fundamental reason that compile-time regular expressions couldn't be just as fast as Regex::new? They could use the same regex implementation.


One approach is to turn regex! into something like lazy_static! but with syntax checking. Since this just reuses Regex::new, I wouldn't call these "compile time regexps."

Another approach is to re-implement everything that has gone into Regex::new, but in a way that works at compile time.

Another approach is to operate more like Ragel and try to get better performance, but that would somehow need to be reconciled with providing the full suite of the regex API.

The first approach isn't that interesting since lazy_static! and Clippy already serve that role. The latter two require a lot of work and will only be available on nightly for the foreseeable future. (And it's not even clear to me how much faster the Ragel-like approach could be.)


I feel kind of uncomfortable with that regular expression for scrubbing data. It seems to be fail-open rather than fail-closed, and it clearly does not cover the full lexical structure. For example, hex numbers and MySQL's disgusting hex-encoded strings (numeric digits included) are not caught by any of those cases, and thus would leak in full. Or there's the possibility of string escaping with backslashes being turned off with a config setting, which would break the escape handling in the regular expression.

Am I missing some subtlety that makes it safe?


I have no doubt that there are some cases which won't match. This particular utility does not need to handle 100% of every possible corner case to produce the desired result. That said, all improvements, whether pull requests or posted suggestions, are much appreciated.

My point regarding the regex was the very high number of cases that are handled correctly with such a small amount of code.

Thank you for your comments. I appreciate it.


VividCortex has an agent that works similarly, which I believe they've written in Go using libpcap. It would be nice if they open-sourced it.

https://www.vividcortex.com/resources/network-analyzer-for-m...


Interesting - I hadn't seen libpnet before. I was recently working on an experimental project doing deep packet inspection in Rust using libpcap, which doesn't have very mature Rust bindings yet - the basics work, but it's a bit rough around the edges. libpnet looks like it has a much nicer Rust interface, and does some more things for you as compared to libpcap, which gives and takes &[u8]s and nothing else.

However, libpnet doesn't have two very useful things, as far as I can see: Reading/writing packet capture files, and the ability to use BPF filters. The first in this case might be useful mainly for testing, but the latter seems like it might simplify a fair amount of their code.


I was just thinking about writing a minimal traffic-analyzer and libpnet looks way more suitable for this task than libpcap.

And adding support for a pcap-like file format doesn't seem that difficult.

The filters are a major pain point. I don't know how libpcap handles this, but it at least says it won't copy packets that don't match the filter from kernel space to user space, avoiding a lot of overhead. Maybe it's possible to introduce some rusty kind of filtering in libpnet, too.

Going to log into GitHub now and see if I can do something.

EDIT: fixed spelling


If you want to avoid libpnet or libpcap, you can use socket and recv directly.

Here's a quick example demonstrating socket & recv capturing all packets on all interfaces.

https://gist.github.com/fkautz/0104084fd79cee5608d8e3fc6e729...


As a very recent libpnet contributor, packet filters are on my personal wishlist. That said, I don't use them in my current project that uses libpnet, so they're definitely in the backlog, although they shouldn't be too difficult to implement.

As for reading and writing pcap files, I just use the pcap crate and use a common buffer. It's a little clumsy but it does work.


> To run Spyglass, you need extra permissions above that of a normal user in order to capture network traffic at the data-link layer, below IP, and without having to alter or interfere with the regular data flow between the client app and database servers. We recommend running it using “sudo.”

Wouldn't it be better to use some kind of privilege separation? I think there is a reason Wireshark does this... And even though Rust is a safe language, that won't save you from programming errors; it just makes them more difficult.


Thank you for the suggestion. Spyglass went from concept to a working product which met the project goals, in a little over 5 weeks.

And, you're correct it won't save you from all programming errors. It does, however, make it far more difficult to accidentally encounter whole classes of them which constitute, on average, quite a high percentage of debug time in other systems languages.


Why are you not encrypting your MySQL connections with SSL? If you're in the cloud, you absolutely should be encrypting. Even if you're in your own colos, you should be encrypting (on the chance of inter-colo queries). Seriously, why aren't you encrypting this traffic? Query intelligence isn't a valid excuse. Turn on query logs instead. Percona has shown that the logging impact is very minimal (even if the link is 7 years old now) [0].

[0] https://www.percona.com/blog/2009/02/10/impact-of-logging-on...
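For reference, the server-side setup being suggested is just configuration (option names from MySQL's manual; file paths here are placeholders):

```ini
# my.cnf
[mysqld]
# Query logging instead of packet capture:
general_log      = ON
general_log_file = /var/log/mysql/query.log
# TLS for client connections:
ssl-ca   = /etc/mysql/certs/ca.pem
ssl-cert = /etc/mysql/certs/server-cert.pem
ssl-key  = /etc/mysql/certs/server-key.pem
```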


I hope to one day understand how a post with only 4 points, by a newly created account, gets promoted to the front page.


Please don't post comments like this. If you're worried about voting on a story, send an email to hn@ycombinator.com and we'll look into it. (In this case, the voting looks largely legit. Rust is popular on HN these days, so that may be why.)

Oh, and nothing is wrong about posts by new accounts making the front page. We welcome new users!



