Wow! I was working on this issue in our DBMS product today! Fun suggestion, try ...

cerved · on Jan 12, 2022

Null bytes are not valid characters in JSON strings, nor are any other control characters, for that matter.

lifthrasiir · on Jan 13, 2022

They are valid if escaped, as explicitly noted in Section 7 of RFC 7159 [1]. (Annoyingly enough it doesn't explicitly say JSON strings are Unicode strings, it just says that a certain subset of JSON strings is interoperable with Unicode.) GP means that the escaped null byte can still cause issues for C interoperability.

[1] https://datatracker.ietf.org/doc/html/rfc7159#section-7
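To illustrate the C interoperability point, here is a minimal sketch using Python's standard `json` and `ctypes` modules. The string `"ab\u0000cd"` is just a made-up example, and `ctypes` only stands in for "some C code that receives the decoded string"; it is not the DBMS mentioned upthread.

```python
import ctypes
import json

# An escaped NUL (\u0000) is legal JSON, so a conforming parser accepts it
# and produces a 5-character string with an embedded NUL.
decoded = json.loads('"ab\\u0000cd"')
print(repr(decoded))  # 'ab\x00cd'
print(len(decoded))   # 5

# C strings are NUL-terminated, so C code sees only the bytes before the
# first NUL. The .value of a ctypes char buffer behaves the same way.
buf = ctypes.create_string_buffer(decoded.encode("utf-8"))
print(buf.value)      # b'ab'  (the "cd" part is silently lost)
```

The JSON itself is perfectly valid here. The data loss happens only when the decoded value crosses into a NUL-terminated API.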

cerved · on Jan 13, 2022

Exactly, null bytes aren't allowed.

> All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped in this way:

> Any character may be escaped.

I personally find the JSON spec very explicit while still succinct.
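The two quoted rules can be checked against a real parser. This is a small sketch using Python's `json` module, which rejects raw control characters inside strings by default (`strict=True`). The helper name `strict_ok` and the sample strings are made up for illustration.

```python
import json

def strict_ok(text):
    """Return True if json.loads (strict mode, the default) accepts text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

raw_nul = '"a\x00b"'          # raw U+0000 between the quotes: must be escaped
escaped_nul = '"a\\u0000b"'   # the same character written as \u0000
raw_tab = '"a\tb"'            # raw U+0009 is also in U+0000..U+001F
escaped_letter = '"\\u0041"'  # any character may be escaped; this is "A"

print(strict_ok(raw_nul))          # False
print(strict_ok(escaped_nul))      # True
print(strict_ok(raw_tab))          # False
print(json.loads(escaped_letter))  # A
```

Python also offers `json.loads(text, strict=False)`, which lets raw control characters through. That is a lenient parser option, not something the RFC permits.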

lifthrasiir · on Jan 13, 2022

Oh, actually I made a mistake in the GP. The following sentence:

> Annoyingly enough it doesn't explicitly say JSON strings are Unicode strings, [...]

...is false; I completely missed the very first section (I did search for "Unicode", but failed to check the results thoroughly). I have other valid criticisms of the JSON specification, but that is not one of them, so please ignore that part of my rant since it was based on a wrong assumption. Thank you for (implicitly) pointing it out.

cerved · on Jan 13, 2022

No worries. I recently spent a fair bit of time parsing JSON with sed, so the spec is fresh in my mind.