
Any data in your database that can come from external input should be treated as untrusted and validated when it's read back and used, not just when it's inserted. Otherwise, validation bugs or bypasses will result in bad data and exploits that persist beyond the fix. Edit: I’m not arguing against the need to use utf8mb4.



While that’s true, I trust PostgreSQL to store exactly what I’ve asked it to store. At some point, you have to trust something to do its job, or else everything built on top of it is a castle of sand.

Imagine a bug like this in ext4. No one would reasonably contend that the layers on top of it should be validating that the files you write out are the ones you’ll read back in. We write unit tests for all kinds of stuff, but we’re not that thorough.


Agreed. The confusing part here, as I see it, is that validation layer A (correctly) asserts the data is valid UTF-8 and safe, then assumes the database persists exactly what was passed to it, since no error is reported.

Then, subsystem B trusts reading the database field (since it passed validation layer A).

Obviously more validation layers can be added, but at this point validation layer C, called by subsystem B, needs to know what the initial input to layer A was in order to tell it apart from the db value, which was manipulated. That's a rather tricky thing to do sometimes. (I guess you could add a hash to the db to check that the db is storing your strings, but really... come on.)

Upgrading to utf8mb4 is probably safer than hoping that throwing enough validation layers at the problem will solve it.
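
For what it's worth, the hash check mentioned in passing above could look roughly like the sketch below. It is only an illustration: the function names are made up for this example, the "read back" value is simulated with a plain string rather than a real database round trip, and it assumes the digest is stored in a column that is not itself affected by the encoding problem.

```python
import hashlib


def digest(text: str) -> str:
    """Return the SHA-256 hex digest of the string's UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matches_stored_digest(value_read_back: str, stored_digest: str) -> bool:
    """Check that the value read from the db is the one that was hashed at write time."""
    return digest(value_read_back) == stored_digest


# Write side (layer A): hash the validated input and store the digest next to it.
original = "hi \U0001F600 there"
stored_digest = digest(original)

# Read side (subsystem B): simulated here. This pretends the column silently
# cut the value off at the first 4-byte character (an assumption for the demo).
read_back = "hi "

print(matches_stored_digest(original, stored_digest))
print(matches_stored_digest(read_back, stored_digest))
```

This detects that the stored string changed, but it cannot recover the original, which is part of why fixing the column encoding is the more practical route.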


Yes. I’ve been burned by this exact issue in the past, not realizing that utf8 was not really utf8 in mysql-land. That is a major, major WTF IMO.

But I’ve also seen people rely solely on validation at the time of insertion so many times that I wanted to warn against that, too. That's not an argument against the need for utf8mb4.


Sure, but the issue here is that MySQL's "utf8" encoding is not actually UTF-8: it stores at most three bytes per character, so anything that needs four bytes doesn't fit. You can write as many validation layers as you want, but if they assume that utf8 actually means UTF-8, they won't help, and MySQL will potentially screw the data up when it gets stored.
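
To make the mismatch concrete: real UTF-8 uses up to four bytes per character, while MySQL's "utf8" (an alias for utf8mb3) tops out at three, which covers code points up to U+FFFF. A validation layer that wants to know whether a string would survive such a column can check for code points above that limit. The sketch below is a minimal illustration in Python; the function names are hypothetical, and what MySQL actually does with an offending value (reject it or cut it short with a warning) depends on the server's SQL mode, so that part is not modeled here.

```python
def four_byte_chars(text: str) -> list:
    """Return the characters whose UTF-8 encoding needs four bytes (code points above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_in_three_bytes(text: str) -> bool:
    """True if every character encodes to at most three UTF-8 bytes."""
    return not four_byte_chars(text)


samples = ["plain ascii", "h\u00e9llo \u20ac", "emoji \U0001F600"]
for s in samples:
    status = "ok" if fits_in_three_bytes(s) else "needs utf8mb4"
    print(repr(s), status)
```

Both `"plain ascii"` and the string with `é` and `€` pass, since those characters need at most three bytes; the emoji string is flagged. A check like this is a stopgap at best, because it only tells you which inputs a three-byte column cannot hold.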



