
It's a TOCTOU (time-of-check to time-of-use) bug [1], a well-known category of bugs.

[1] https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use




Also, one more argument for “parse, don’t validate”: the code validates a mutable input and assumes the validation still holds afterwards. As it turns out, that assumption is incorrect.


That's good advice, but it can't fully apply to mutable input. Unless a defensive copy is made (at least under the current system), concurrent modification can still occur, because the new type is still reading the caller's underlying data.
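A minimal sketch of "parse, don't validate" with a defensive copy, assuming a hypothetical `Latin1Text` wrapper (not a JDK class): the caller's `char[]` is snapshotted once, and both the check and the conversion run on that private snapshot, so a concurrent writer to the original array cannot make them disagree.

```java
import java.util.Optional;

// Hypothetical "parsed" type, for illustration only: once constructed, it holds
// bytes already known to be Latin-1, in an array no caller can reach.
final class Latin1Text {
    private final byte[] bytes;

    private Latin1Text(byte[] bytes) {
        this.bytes = bytes;
    }

    static Optional<Latin1Text> parse(char[] input) {
        // Defensive copy, taken exactly once. Both passes below read only this
        // snapshot, so later writes to `input` cannot invalidate the check.
        char[] snapshot = input.clone();
        if (!isLatin1(snapshot)) {            // time of check...
            return Optional.empty();
        }
        byte[] out = new byte[snapshot.length];
        for (int i = 0; i < snapshot.length; i++) {
            out[i] = (byte) snapshot[i];      // ...time of use, on the same data
        }
        return Optional.of(new Latin1Text(out));
    }

    private static boolean isLatin1(char[] chars) {
        for (char c : chars) {
            if (c > 0xFF) {
                return false;
            }
        }
        return true;
    }

    int length() {
        return bytes.length;
    }

    char charAt(int index) {
        return (char) (bytes[index] & 0xFF);  // widen the byte back to a Latin-1 char
    }
}
```

The cost is one extra array copy. Since the conversion to `byte[]` is itself a copy, a single pass that reads each input element exactly once can give the same guarantee without the separate snapshot.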


Copies are necessarily being made, since it's converting a `char[]` to a `byte[]`; the problem is that they aren't made correctly.

Currently the code tries to encode the chars as Latin-1, and if that fails, it bails out completely and restarts with a code-unit copy. The bail-and-restart is what opens the window for TOCTOU: the restart re-reads the input, which may have changed since the check.
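To make the window concrete, here is a simplified, hypothetical model of the bail-and-restart shape (not the actual JDK source; `BailAndRestart`, `Encoded` and the `window` hook are made up for illustration). The `window` callback stands in for another thread writing to the array between the failed Latin-1 pass and the restarted UTF-16 copy, which makes the race deterministic for demonstration.

```java
// Hypothetical, simplified model of "try Latin-1, bail, restart from zero".
// NOT the JDK's code; names and byte layout are illustrative only.
final class BailAndRestart {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    static final class Encoded {
        final byte coder;
        final byte[] bytes;

        Encoded(byte coder, byte[] bytes) {
            this.coder = coder;
            this.bytes = bytes;
        }
    }

    // `window` simulates whatever a concurrent writer does between the two reads.
    static Encoded encode(char[] value, Runnable window) {
        byte[] latin1 = new byte[value.length];
        for (int i = 0; i < value.length; i++) {
            char c = value[i];                          // first read: the "check"
            if (c > 0xFF) {
                window.run();                           // race window opens here
                return new Encoded(UTF16, toUtf16(value)); // second read, from zero
            }
            latin1[i] = (byte) c;
        }
        return new Encoded(LATIN1, latin1);
    }

    // Plain code-unit copy; big-endian is an arbitrary choice for the sketch.
    static byte[] toUtf16(char[] value) {
        byte[] out = new byte[value.length * 2];
        for (int i = 0; i < value.length; i++) {
            out[2 * i] = (byte) (value[i] >> 8);
            out[2 * i + 1] = (byte) value[i];
        }
        return out;
    }

    // True if a UTF16-coded result contains only Latin-1 chars, i.e. the
    // situation the "UTF16 implies not just Latin-1" invariant rules out.
    static boolean violatesInvariant(Encoded e) {
        if (e.coder != UTF16) {
            return false;
        }
        for (int i = 0; i < e.bytes.length; i += 2) {
            if (e.bytes[i] != 0) {
                return false;                           // high byte set: char > 0xFF
            }
        }
        return true;
    }
}
```

For example, encoding `{'a', '\u20AC'}` while the window overwrites index 1 with `'b'` yields a UTF16-coded result for plain `"ab"`.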

But if, instead of bailing, it converted the data collected so far to code units, appended the code unit on which it failed, and then switched to a UTF-16 copy loop, the result would necessarily be correct (at least insofar as a UTF-16 string would never contain only Latin-1).

In fact, this would likely be more efficient than the current version: we already know that everything converted so far is valid Latin-1, so we can literally just copy it to every other byte. There is no need to redo that validation and conversion work, which is what currently happens, because `StringUTF16.toBytes` redoes the entire thing from zero.
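A minimal sketch of the single-pass alternative described above, under the same simplifying assumptions as a toy model (hypothetical `SinglePassEncoder`/`Encoded` names, big-endian UTF-16 bytes chosen arbitrarily; the JDK's internal layout and intrinsics may differ). Each element of `value` is read exactly once: on the first non-Latin-1 char, the already-converted prefix is widened by copying it into every other byte, the failing char is written from the value already read, and the rest is a plain code-unit copy.

```java
// Hypothetical single-pass encoder, for illustration only (not JDK code).
final class SinglePassEncoder {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    static final class Encoded {
        final byte coder;
        final byte[] bytes;

        Encoded(byte coder, byte[] bytes) {
            this.coder = coder;
            this.bytes = bytes;
        }
    }

    static Encoded encode(char[] value) {
        int n = value.length;
        byte[] latin1 = new byte[n];
        for (int i = 0; i < n; i++) {
            char c = value[i];                 // each element is read exactly once
            if (c > 0xFF) {
                byte[] utf16 = new byte[n * 2]; // zero-filled, so high bytes start at 0
                // Prefix is already known to be Latin-1: copy it to every other byte,
                // with no re-validation and no re-read of value[0..i).
                for (int j = 0; j < i; j++) {
                    utf16[2 * j + 1] = latin1[j];
                }
                putChar(utf16, i, c);          // the char that failed, as already read
                for (int j = i + 1; j < n; j++) {
                    putChar(utf16, j, value[j]); // plain code-unit copy for the rest
                }
                return new Encoded(UTF16, utf16);
            }
            latin1[i] = (byte) c;
        }
        return new Encoded(LATIN1, latin1);
    }

    // Big-endian code unit at `index`; the byte order is an assumption of the sketch.
    private static void putChar(byte[] buf, int index, char c) {
        buf[2 * index] = (byte) (c >> 8);
        buf[2 * index + 1] = (byte) c;
    }
}
```

Because the switching char is written from the value that was already checked, a UTF16 result always contains at least one non-Latin-1 char, whatever other threads do to `value`. The contents can still be a mix of old and new values, which is inherent to reading a shared array without synchronization.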


Ah, good points. Thank you for the insight!



