> Well, first and most obviously, if you are thinking of rolling your own JSON parser, stop and seek medical attention.

As someone who has written his own JSON parser, I must concur. Ahh - are there any doctors here...?

In my defense - I was porting a codebase to a new platform, and needed to replace the existing JSON 'parser'. You see, it was:

  - Single-platform
  - Proprietary
  - Little more than a tokenizer with idiosyncrasies and other warts

Why was it chosen in the first place? Well, it was available as part of the system on the original platform. Not that I would've made the same choice myself. We had wrappers around it - but they didn't really abstract it away in any meaningful manner, so all of its idiosyncrasies had leaked into all the code that used the wrappers.

In the interests of breaking as little existing code as possible, I wrote a bunch of unit tests, and rewrote the wrapper in terms of my own hand-rolled tokenizer. Later - either after the port, or as a side project during the port to help out a coworker (I forget) - I added some saner, higher-level, easier-to-use, less idiosyncratic interfaces, which let us deprecate the old interface and clean it up at our leisure. That left us with a full-blown parser - and it was all my fault.

> Takeaways: Don't parse JSON yourself, and don't let calls to the parsing functions fail silently.

I'd add to this: Fuzz your formats. All of them. Even those that don't receive malicious data will receive corrupt data.
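
To make "fuzz your formats" concrete, here is a minimal sketch of a dumb mutation fuzzer in Python. It is not how any particular fuzzing tool works, just an illustration of the idea: corrupt a known-good sample, feed it to the parser, and treat anything other than a clean rejection as a bug. The names `mutate` and `fuzz` are made up for this example, and it assumes the parser signals bad input by raising `ValueError` (which is what Python's `json.loads` does).

```python
import json
import random


def mutate(data: bytes, rng: random.Random, flips: int = 3) -> bytes:
    """Return a copy of data with a few randomly chosen bytes overwritten."""
    buf = bytearray(data)
    for _ in range(flips):
        if not buf:
            break
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(parse, seed_input: bytes, iterations: int = 1000, seed: int = 0):
    """Feed corrupted copies of seed_input to parse and collect surprises.

    A ValueError counts as a clean rejection. Any other exception is
    recorded together with the input that triggered it.
    """
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    failures = []
    for _ in range(iterations):
        sample = mutate(seed_input, rng)
        try:
            parse(sample)
        except ValueError:
            pass  # the parser noticed the corruption and said so
        except Exception as exc:
            failures.append((sample, exc))
    return failures


if __name__ == "__main__":
    for sample, exc in fuzz(json.loads, b'{"ids": [1, 2, 3], "name": "x"}'):
        print(repr(sample), "->", repr(exc))
```

Point `parse` at your own loading code rather than at the bare parser; the interesting crashes tend to be in what you do with the parsed values.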

Many of the same problems also affect e.g. binary formats. And just because you've parsed valid JSON doesn't mean you're safe. I've spent a decent amount of time using e.g. SDL MiniFuzz - fixing invalid enum values, unchecked array indices, huge array allocations causing OOMs, bad hashmap keys, the works. The OOM case is particularly nasty - you may successfully parse your entire input (because 1.9GB arrays weren't quite enough to OOM your program during parsing), and then later randomly crash anywhere else because you're not handling OOMs throughout the rest of your program. I cap the maximum my parser will allocate to some multiple of the original input size, and cap the original input to "something reasonable" (1MB is massive overkill for most of my JSON API needs, for example, so I use it as a default).
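
A rough Python sketch of the input cap and the post-parse checks described above. This is not my actual parser: `parse_capped`, `checked_index`, `checked_channel` and the `Channel` enum are hypothetical names for illustration. Python's standard `json` module does not expose an allocation budget, so only the input-size half of the cap is shown here.

```python
import json
from enum import Enum

MAX_INPUT_BYTES = 1024 * 1024  # the 1MB default mentioned above


class Channel(Enum):  # hypothetical enum, for illustration only
    LEFT = 0
    RIGHT = 1


def parse_capped(raw: bytes, max_bytes: int = MAX_INPUT_BYTES):
    """Reject oversized input before the parser allocates anything for it."""
    if len(raw) > max_bytes:
        raise ValueError(f"input is {len(raw)} bytes, cap is {max_bytes}")
    return json.loads(raw)  # raises a ValueError subclass on malformed input


def _require_int(value, what: str) -> int:
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{what} must be an integer, got {value!r}")
    return value


def checked_index(value, length: int) -> int:
    """Return value only if it is a valid index into a sequence of `length`."""
    index = _require_int(value, "index")
    if not 0 <= index < length:
        raise ValueError(f"index {index} out of range for length {length}")
    return index


def checked_channel(value) -> Channel:
    """Map a parsed number onto the enum, rejecting unknown values."""
    return Channel(_require_int(value, "channel"))  # ValueError if unknown
```

Valid JSON like `{"channel": 7, "sample": -1}` sails through the parser; it is the `checked_*` calls afterwards that turn it into a loud error instead of a bad lookup three modules away.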