Sj.h: A tiny little JSON parsing library in ~150 lines of C99

lioeters · 2025-09-21T17:26:23 1758475583

What I love about this author's work is that they're usually single-file libraries in ANSI C or Lua with focused scope, easy-to-use interface, and good documentation. And free software license. Aside from the posted project, some I like are:

- log.c - A simple logging library implemented in C99

- microui - A tiny immediate-mode UI library

- fe - A tiny, embeddable language implemented in ANSI C

- microtar - A lightweight tar library written in ANSI C

- cembed - A small utility for embedding files in a C header

- ini - A tiny ANSI C library for loading .ini config files

- json.lua - A lightweight JSON library for Lua

- lite - A lightweight text editor written in Lua

- cmixer - Portable ANSI C audio mixer for games

- uuid4 - A tiny C library for generating uuid4 strings

olivia-banks · 2025-09-22T00:36:43 1758501403

I vendor in log.c all the time for C projects! I had no idea the author was relatively prolific. Would really recommend checking out log.c, it's really easy to hack in what you need to.

rurban · 2025-09-22T06:10:26 1758521426

Ah, there is where we have log.c from. Good to know because I have plenty of tiny fixes for him.

johnisgood · 2025-09-22T10:51:09 1758538269

Speaking of, I personally use https://zolk3ri.name/cgit/libzklog/about/ because I like the way it looks. :D I used his simple logging library in Go, so might as well.

I used "lite" (text editor in Lua) which has been mentioned under this submission. It is cool, too.

01HNNWZ0MV43FF · 2025-09-21T21:43:45 1758491025

Oh yeah, I used their Lume library back when I did games in LOVE2D. I actually ran into them a couple times in the IRC chat (and told them one of their ideas was bad, sorry about that rxi, I checked and it's actually a good idea lol)

https://github.com/rxi/lume

maldonad0 · 2025-09-21T21:02:57 1758488577

It's open source, not free software.

F3nd0 · 2025-09-21T21:25:48 1758489948

‘Free software’ and ‘open source software’ (as respectively defined by the FSF [1] and the OSI [2], which is how they’re usually used in practice) have overlapping definitions. The project in question is released into the public domain via the Unlicense, which qualifies as a free software ‘licence’. Many of the other projects use the MIT/Expat licence, which also qualifies as a free software licence.

[1] https://www.gnu.org/philosophy/free-sw.html [2] https://opensource.org/osd

satvikpendem · 2025-09-21T23:13:07 1758496387

If anyone is curious on FSF's comments about various licenses: https://www.gnu.org/licenses/license-list.en.html

typpilol · 2025-09-21T22:20:26 1758493226

I also use Unlicense. It's literally the most permissive license you can have lol

HighGoldstein · 2025-09-22T11:39:16 1758541156

The caveat with the Unlicense is that it doesn't work in some jurisdictions, and the work may be considered literally unlicensed, as in nobody except the copyright owner can use it. In practical terms, of course, I doubt anyone using the Unlicense plans to come after you for copyright infringement, but it's something to keep in mind. This is why many organizations recommend instead using something like CC0, MIT etc.

rerdavies · 2025-09-21T22:50:17 1758495017

And how exactly does it not qualify as an open source license? Seems to meet the definition as far as I can see.

Cogito · 2025-09-21T23:09:26 1758496166

No claim was made that it is not open source. The contention was over if it was a free license or not:

> not free software

which it is. As F3nd0 said, it's both.

tonypapousek · 2025-09-21T21:07:49 1758488869

The license says otherwise; hard to get freer than public domain.

captbaritone · 2025-09-21T21:21:41 1758489701

I recall hearing that SQLite actually had some significant issues with choosing public domain as their license and somewhat regret the decision. Apparently it’s not a concept which has broad understanding internationally, and there’s less legal precedent in a software context, which has made it harder for some teams to adopt due to concerns from legal departments.

shiomiru · 2025-09-21T23:11:21 1758496281

The Unlicense isn't "just" public domain though, it also has a fallback clause that explicitly lists things you are allowed to do ("copy, modify, publish, use, compile, sell, or distribute"). So I think the intent is, even if PD isn't recognized and line 1 is invalid, you're still granting a license to the same effect.

SQLite on the other hand just says

    The author disclaims copyright to this source code.  In place of a legal
    notice, here is a blessing:

      May you do good and not evil.
      May you find forgiveness for yourself and forgive others.
      May you share freely, never taking more than you give.

which seems less useful once you strike sentence 1.

SoKamil · 2025-09-21T21:18:37 1758489517

What is the stance of Your Average Corp’s security department on public domain software? Do they accept software under such licensing (or lack thereof)?

tonypapousek · 2025-09-21T22:20:50 1758493250

From an American perspective, there’s no mechanical difference between that and the MIT license when it comes to security.

They care more about the package being maintained, bug-free, and their preferred vulnerability database showing no active exploits.

At least in my experience, anyway. Other companies may have stricter requirements.

jen20 · 2025-09-21T21:38:22 1758490702

Who cares? Seriously. Whether a commercial entity who wants to be able to benefit from your work accepts the license you choose for work you do is as much a concern as whether or not the prime minister of Liechtenstein accepts the color you paint the outside of your house in the USA. That is: none.

_puk · 2025-09-21T21:51:37 1758491497

Bad analogy. If they truly care what colour your house is then there are plenty of strings they could pull. I mean, a good number of large U.S. companies' tax and corporate structures depend heavily on Liechtenstein's government’s rules.

jen20 · 2025-09-22T01:11:14 1758503474

Some people have standing for better or mostly worse - HOAs and local councils. The government of Liechtenstein does not.

rerdavies · 2025-09-22T00:50:19 1758502219

Kinda depends on whether you're publishing open source software so that people can use it. And if you're not publishing open source software so that people can use it, why exactly are you doing it? If you don't want people to use it, GPL is the way to go. If you do want people to use it, MIT or BSD is a much better way to go.

zelphirkalt · 2025-09-22T09:21:51 1758532911

As a counterexample: I would rather use GPL or AGPL licensed code on my machine than merely MIT licensed code, because I see the philosophical difference behind it, due to copyleft. Someone who makes some code available under (A)GPL wants it to stay available under a free software license. Someone who releases under MIT is either uninformed, or has a different motivation that does not fully align with keeping things libre for people. It is less safe against being made proprietary in the future. Anyone can come and make a new version that is proprietary and has that one more feature, luring people into using the proprietary version instead of the open source one.

So I have much more trust in (A)GPL licensed projects, and I see them as more for the people than MIT licensed projects.

jen20 · 2025-09-22T01:10:09 1758503409

Linux, Git and the entire GNU system are counterexamples. Meanwhile FreeBSD dies by the day.

People != the legal departments of corporations.

xigoi · 2025-09-22T04:17:30 1758514650

GPL is for when you want people to use it. MIT is for when you want megacorporations to turn it into enshittified proprietary software and profit off of it without giving back to you.

rerdavies · 2025-09-22T16:31:51 1758558711

Sure. Why not?

TZubiri · 2025-09-22T04:34:16 1758515656

>"If you don't want people to use it, GPL is the way to go"

lol

tripplyons · 2025-09-21T21:17:27 1758489447

Open source is a more informative term for this than free software. Not all free software is open source, but all open source software is free.

Edit: I was not aware of the FSF's definition. I was using a definition of free software being software that you can use without having to pay for it.

F3nd0 · 2025-09-21T21:31:22 1758490282

I think you are mistaken; neither is a subset of the other. At the very least, there are licences which are recognised as open source by the OSI, but not as free by the FSF, and vice versa [1]. I think it’s more appropriate to say they are two fundamentally separate definitions with a massive overlap.

[1] https://spdx.org/licenses/

tripplyons · 2025-09-21T22:22:57 1758493377

Thank you for the information! I was not aware of the FSF's definition.

TZubiri · 2025-09-22T04:36:34 1758515794

You have recited a successful incantation to summon the Stallman acolytes.

To add an additional suggestion, gratis can also be used to refer to free as in free beer. It comes from a Latin root and is common in Spanish-speaking countries to refer only to free of charge, and not as in freedom.

xigoi · 2025-09-22T04:20:19 1758514819

> Edit: I was not aware of the FSF's definition. I was using a definition of free software being software that you can use without having to pay for it.

That’s called freeware. Also, open-source software can be paid (with the caveat that if someone buys it, you must allow them to redistribute it for free).

manbash · 2025-09-21T21:31:16 1758490276

> Not all free software is open source

Depends on which "free software" definition you're referring to.

The FSF definition of "free software" requires it to be open source.

tripplyons · 2025-09-21T22:25:06 1758493506

I have clarified which definition I used.

lioeters · 2025-09-21T21:19:50 1758489590

Aside from the posted library sj.h which is in public domain (compatible with the definition of "free software"), the author's other projects mostly use the MIT license.

The MIT license upholds the four essential freedoms of free software: the right to run, copy, distribute, study, change and improve the software.

It is listed under "Expat License" in the list of GPL-compatible Free Software licenses.

https://www.gnu.org/licenses/license-list.html

ramses0 · 2025-09-21T21:35:56 1758490556

"Source Available" and "Open Source" (with an OSI-approved license) are the terms you're looking for. "Free as in speech, or free as in beer?" is your rallying cry.

rerdavies · 2025-09-22T00:54:51 1758502491

Or Free as in Ebola, in the case of GPL-licensed software. Whatever happened to Free as in Air and Sunshine?

a96 · 2025-09-22T06:42:29 1758523349

It was enshittified because there was nothing defending it.

rerdavies · 2025-09-22T16:24:39 1758558279

From what? It's pretty difficult to enshittify something that has an MIT license; whereas there seem to be practically infinite ways to enshittify GPL software.

layer8 · 2025-09-21T17:54:52 1758477292

The library doesn’t check for signed integer overflow here:

https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...

Certain inputs can therefore trigger UB.

hypeatei · 2025-09-21T18:38:14 1758479894

You're not aware of the simplistic, single header C library culture that some developers like to partake in. Tsoding (a streamer) is a prime example of someone who likes developing/using these types of libraries. They acknowledge that these things aren't focused on "security" or "features" and that's okay. Not everything is a super serious business project exposed to thousands of paying customers.

layer8 · 2025-09-21T18:44:53 1758480293

Hobby projects that prove useful have a tendency of starting to be used in production code, and then turning into CVEs down the road.

If there is a conscious intent of disregarding safety as you say, the Readme should have a prominent warning about that.

hypeatei · 2025-09-21T18:47:22 1758480442

> Hobby projects that prove useful have a tendency of starting to be used in production code

Even if that is true, how is that the author's problem? The license clearly states that they're not responsible for damages. If you were developing such a serious project then you need the appropriate vetting process and/or support contracts for your dependencies.

layer8 · 2025-09-21T18:50:56 1758480656

I didn’t say it’s the author’s problem. It’s a problem with the code.

ethanwillis · 2025-09-21T21:01:14 1758488474

Why play all these semantic games? You're saying it's the author's problem. You want them to even edit their readme to include warnings for would be production/business users who don't want to pay for it.

layer8 · 2025-09-21T23:30:03 1758497403

GP is arguing about licences. Yes, formally there is no obligation, and I'm not saying the author has any such obligation.

In the present case, either the missing overflow check in the code is by mistake, and then it's warranted to point out the error, or, as I understood GGGP to be arguing, the author deliberately decided to neglect safety or correctness, and then in my opinion you can't reject the criticism as unwarranted if the project's presentation isn't explicit about that.

I'm not making anything the author's problem here. Rather, I'm defending my criticism of the code, and am giving arguments as to why it is generally good form to make it explicit if a project doesn't care about the code being safe and correct.

stevepotter · 2025-09-22T11:07:40 1758539260

I understand your point and if I were the author I would want either a disclaimer or a fix. File an issue or make a PR. Filing an issue is quicker and more fruitful than dealing with folks here.

madeofpalk · 2025-09-22T13:59:39 1758549579

It is useful to understand the limitations of such hobby programs to know what they are useful for.

president_zippy · 2025-09-22T02:13:56 1758507236

[flagged]

lixtra · 2025-09-22T04:57:03 1758517023

Layer8 DID the thing though, skimmed through the code and thought about security issues.

koolba · 2025-09-21T19:30:27 1758483027

> If there is a conscious intent of disregarding safety as you say, the Readme should have a prominent warning about that.

What do you consider this clause in the LICENSE:

>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

CorrectHorseBat · 2025-09-21T19:37:48 1758483468

A standard clause you can find in every open source license? It doesn't say anything about how seriously the project takes security.

Yeask · 2025-09-22T01:52:57 1758505977

You write only Rust code don't you?

CorrectHorseBat · 2025-09-22T05:19:56 1758518396

I wish ;) You're talking about how Rust code usually uses the MIT license and this is a part of the MIT license?

Every open source license has a very similar clause, including but not limited to BSD, GPL, CDDL, MPL and Apache.

vrighter · 2025-09-21T18:48:40 1758480520

then that is their problem, not the code author's. If you use a hobby project in production, that's on you

w4rh4wk5 · 2025-09-22T08:47:06 1758530826

When such a library is used in production code, that's on the person who chose to use it in production, not on the original author of the library.

You are responsible for the code you ship, doesn't matter whether it's written by you, an LLM, or whether it's a third-party dependency.

zelphirkalt · 2025-09-22T09:33:35 1758533615

While that is certainly true, we could also be nice and reduce the workload of someone reviewing their dependencies and write it down in the readme.

f1shy · 2025-09-21T19:26:01 1758482761

My personal take is: if the code is good enough, it should be trivial to switch to a better library at the point when needed.

taminka · 2025-09-21T23:46:37 1758498397

> They acknowledge that these things aren't focused on "security" or "features" and that's okay.

where? single header is just a way to package software, it has no relation to features, security or anything such...

TZubiri · 2025-09-22T04:39:00 1758515940

Either you are:

- overestimating the gravity of a UB and its security implications

- underestimating the value of a 150 line json parser

- or overestimating the feasibility of having both a short and high quality parser.

It sometimes happens that fixing a bug is quicker than defending the low quality. Not everything is a tradeoff.

nurettin · 2025-09-22T10:03:51 1758535431

I have tsoding fatigue. Took a long time to get him out of the main page. I like the DIY attitude, but it gets old really fast.

bmn__ · 2025-09-22T11:26:50 1758540410

https://news.ycombinator.com/item?id=44345740

No one cares. Stop complaining or GTFO.

nurettin · 2025-09-22T11:47:02 1758541622

Au contraire, I think people do care. Now I will continue complaining and raising awareness with renewed fervor.

zwnow · 2025-09-21T18:41:50 1758480110

So if it's a hobby project designed for just a handful of people, it's suddenly okay to endanger them due to being sloppy?

hypeatei · 2025-09-21T18:44:04 1758480244

This is an open source project that you're not obligated to use nor did you pay for it. Who is it endangering?

The license also makes it clear that the authors aren't liable for any damages.

account42 · 2025-09-23T13:59:58 1758635998

> The license also makes it clear that the authors aren't liable for any damages.

The license disclaims liability but that doesn't mean the author cannot ever be held liable. Ultimately, who is liable is up to a court to decide.

flykespice · 2025-09-21T18:52:39 1758480759

...and what open source software license in the world makes the author liable for damages?

Yeask · 2025-09-22T01:58:05 1758506285

None. That is how RedHat makes money.

k_roy · 2025-09-21T19:30:17 1758483017

Probably more of lack of explicit liability in the license.

flykespice · 2025-09-22T14:37:39 1758551859

Pretty sure the all caps text on the bottom of most open source licenses out there makes it clear

virtue3 · 2025-09-21T20:45:03 1758487503

every OSS license I've ever seen is "use at your own risk" essentially. That's how this whole system works.

You find a vulnerability? patch it, push change to repo maintainer.

https://xkcd.com/2347

nkrisc · 2025-09-21T19:13:14 1758481994

Neither the code nor the author endangers anyone. Whoever uses it inappropriately endangers themselves or others.

Why are you using random, unvetted and unaudited code where safety is important?

Yeask · 2025-09-22T01:55:51 1758506151

Open Source is about sharing knowledge.

They are sharing their knowledge about how to create a tiny JSON parser. Where is the problem again?

zwnow · 2025-09-22T06:12:55 1758521575

Refer to the original comment. Seems like you are incapable of connecting the comment chain.

Yeask · 2025-09-22T11:38:40 1758541120

Have some manners please.

tossaway0 · 2025-09-22T03:19:43 1758511183

Yes, pretty much. It has enough of a warning.

skydhash · 2025-09-21T18:08:09 1758478089

There was a nice article [0] about bloated edge cases libraries (discussion [1]).

Sometimes, it's just not the responsibility of the library. Trying to handle every possible error is a quick way to complexity.

[0]: https://43081j.com/2025/09/bloat-of-edge-case-libraries

[1]: https://news.ycombinator.com/item?id=45319399

klysm · 2025-09-21T18:33:52 1758479632

Strongly disagree here because JSON can come from untrusted sources and this has security implications. It's not the same kind of problem that the bloat article discusses where you just have bad contracts on interfaces.

leptons · 2025-09-21T19:57:54 1758484674

JSON does not necessarily come from untrusted sources if you control the entire system. Not everything needs to be absolutely 100% secure so long as you control the system. If you are opening the system to the public, then sure, you should strive for security, but that isn't always necessary in projects that are not processing public input.

Here's an example - I once coded a limited JSON parser in assembly language. I did not attempt to make it secure in any way. The purpose was to parse control messages sent over a serial port connection to an embedded CPU that controlled a small motor to rotate a camera and snap a photo. There was simply no way for any "untrusted" JSON to enter the system. It worked perfectly and nothing could ever be compromised by having a very simple JSON parser in the embedded device controlling the motor.

interstice · 2025-09-21T20:09:41 1758485381

Massively agree. Remember this thinking being everywhere with databases back in the day, not every text field is hooked up to a Wordpress comment section.

boramalper · 2025-09-21T20:09:10 1758485350

Untrusted doesn’t always mean adversarial IMO, even a bitrot can invalidate your entire input and possibly also trigger undefined behaviour if you aren’t prepared to handle that.

leptons · 2025-09-21T21:15:32 1758489332

I was using a checksum to protect against "bitrot" since this was over a very noisy serial transmission line (over a slip ring). So, no, there was no "undefined behavior" and it's quite easy to avoid.
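For readers who want to see what "verify before parsing" can look like, here is a minimal sketch. The comment above does not say which checksum or frame layout was used, so the XOR checksum, the one-trailing-byte frame layout, and the names `xor_checksum` and `frame_is_intact` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XOR of all payload bytes: a deliberately simple, assumed scheme. */
static uint8_t xor_checksum(const uint8_t *data, size_t len) {
    uint8_t sum = 0;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        sum ^= data[i];
    }
    return sum;
}

/* Assumed frame layout: payload bytes followed by one checksum byte.
 * Returns true only if the trailing byte matches the payload. */
static bool frame_is_intact(const uint8_t *frame, size_t frame_len) {
    if (frame == NULL || frame_len < 2) {
        return false;  /* need at least one payload byte plus the checksum */
    }
    return xor_checksum(frame, frame_len - 1) == frame[frame_len - 1];
}
```

The receiver would drop any frame for which `frame_is_intact` returns false and only hand intact payloads to the parser. A single XOR byte catches any single-bit flip but misses many multi-bit errors, so a CRC would be the usual choice on a really noisy line.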

cwmoore · 2025-09-21T20:19:21 1758485961

UB = "undefined behavior", thanks

johnisgood · 2025-09-22T08:38:51 1758530331

I agree. I knew that the JSON is not going to change, so I wrote a 10 lines long parser for it. It is not a JSON parser by any means, but it parses properly what I need it to.

cozzyd · 2025-09-22T14:08:40 1758550120

sscanf as a parser definitely has its uses

johnisgood · 2025-09-22T14:15:21 1758550521

I used strchr() in a function I named get_pair(). So, all in all, I used strchr() and strcmp() only!

ghurtado · 2025-09-21T21:27:28 1758490048

> Not everything needs to be absolutely 100% secure so long as you control the system.

Isn't that a bit like saying "you don't have to worry about home security as long as you are the only person who has the ability to enter your house"?

ncruces · 2025-09-22T06:43:36 1758523416

Sure. I don't password protect my (Android) TV like I password protect my (Android) phone, despite both of them allowing authorized access to the same Google accounts, because if someone entered my house I have bigger things to worry than them using my TV.

Yeask · 2025-09-22T01:47:08 1758505628

Not at all.

Mogzol · 2025-09-22T03:13:36 1758510816

I mean yeah if you're truly the only person that has the ability to enter your house then why should you worry about home security? Nobody else has the ability to get in.

userbinator · 2025-09-21T20:09:09 1758485349

You probably didn't control the other end, as otherwise you would've used something more sane than JSON?

leptons · 2025-09-21T21:18:33 1758489513

I controlled both ends. There is nothing "insane" about JSON. It's used far and wide for many purposes. The system sending the JSON was based on Nodejs, so it was pretty natural to use JSON. And I did it with JSON just because I wanted to. I'd have had to invent some other protocol to do it anyway, and I didn't feel like reinventing the wheel when it was quite simple to write a basic JSON parser in assembly language, which is what I am comfortable with on the embedded system (been coding assembly for 40 years).

userbinator · 2025-09-21T22:10:43 1758492643

For something that simple I'd choose a custom binary protocol or something like ASN.1 instead of JSON. It's easier to generate from a HLL and parse in a LLL (I've also been writing Asm for a few decades...)

leptons · 2025-09-22T05:06:39 1758517599

I've done plenty of custom binary protocols before. I can't say they were any better or easier to deal with. I also can't say that the "parser" for a binary format was any easier than a simple, limited JSON parser.

For this specific project I chose JSON and it worked perfectly. Sending JSON from the embedded CPU was also really simple. Yes, there was a little overhead on a slow connection, but I wasn't getting anywhere near saturation. I think it was 9600 bps max on a noisy connection with checksums. If even 10% of the JSON "packets" got through it was still plenty for the system to run.

johnisgood · 2025-09-22T08:40:44 1758530444

I would love to use ASN.1 if other programming languages would match up to Erlang's ASN.1. :(

Yeask · 2025-09-22T15:25:04 1758554704

Even if it is not popular here, .NET does support ASN.1, not sure if at the same level as Erlang.

cozzyd · 2025-09-22T15:54:48 1758556488

It depends on the use case. JSON has a lot of tooling making it convenient in a lot of cases

Brian_K_White · 2025-09-21T20:09:21 1758485361

Public facing interfaces are their own special thing, regardless if json or anything else, and not all data is a public facing interface.

If you need it, then you need it. But if you don't need it, then you don't need it. There is a non-trivial value in the smallness and simplicity, and a non-trivial cost in trying to handle infinity problems when you don't have infinity use-case.

klysm · 2025-09-21T21:16:39 1758489399

This is a serialization library. The entire point is to communicate with data that's coming from out of process. It should be safe by default especially if it's adding a quick check to avoid overflow and undefined behavior.

Brian_K_White · 2025-09-21T22:28:30 1758493710

Incorrect assumption.

If you are reading data from a file or stream that only you yourself wrote some other time, then it's true that data could possibly have been corrupted or something, but it's not true that it's automatically worth worrying about enough to justify making the code and thus its bug surface larger.

How likely is the problem, how bad are the consequences if the problem happens, how many edge cases could possibly exist, how much code does it take to handle them all? None of these are questions you or anyone else can say about anyone else's project ahead of time.

If the full featured parser is too big, then the line drawing the scope of the lightweight parser has to go somewhere, and so of course there will be things on the other side of that line no matter where it is except all the way back at full-featured-parser.

"Just this one little check" is not automatically reasonable, because that check isn't automatically more important than any other, and they are all "just one little check"s. The one little check would prevent what? Maybe a problem that never happens or doesn't hurt when it does happen. A value might be misinterpreted? So what? Let it. Maybe it makes more sense to handle that in the application code, the one place it might matter. If it will matter so much, then maybe the application needs the full fat library.

Yeask · 2025-09-22T01:44:59 1758505499

You would use this for parsing data you know is safe.

Using a "tiny library" for parsing untrusted data is where the mistake is. Not in OP code.

president_zippy · 2025-09-22T02:27:06 1758508026

It's too bad this header-only JSON library doesn't meet your requirements. How much did you pay for your license to use it? I'm sure the author will be happy to either ship security fixes or give you a refund. You should reach out to him and request support.

account42 · 2025-09-23T14:04:52 1758636292

There is only a problem with these checks here if you pass along arbitrarily large JSON strings, as all of these counters are advanced at most once per input byte. If you don't limit the input to reasonable sizes you have a potential denial of service problem even without the UB, so you should be checking for reasonable sizes, which depend on your application but are likely much lower than the 2^31-1 bytes the library can safely parse.
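A caller-side size cap of the kind described here could be as small as the sketch below. The 1 MiB limit and the names `MAX_JSON_BYTES` and `json_size_ok` are hypothetical; nothing like this exists in sj.h itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed application limit; pick whatever suits your use case.
 * It only needs to stay well below INT_MAX so that counters advanced
 * at most once per input byte cannot overflow. */
#define MAX_JSON_BYTES ((size_t)1 << 20)  /* 1 MiB */

/* Hypothetical guard: true when the buffer is small enough to hand to
 * a parser that tracks position, line, and depth in plain ints. */
static bool json_size_ok(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    return len <= MAX_JSON_BYTES;
}
```

The application would call `json_size_ok` before creating a reader and reject oversized input up front, which addresses the denial-of-service concern and keeps the parser's counters in range without touching the library.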

layer8 · 2025-09-21T18:17:18 1758478638

The problem in the present case is that the caller is not made aware of the limitation, so can’t be expected to prevent passing unsupported input, and has no way to handle the overflow case after the fact.

skydhash · 2025-09-21T18:25:14 1758479114

Do you not review libraries you add to your project? A quick scan of the issues page if it's on a forge? Or just reading through the code if it's small enough (or select functions)?

Code is the ultimate specification. I don't trust the docs if the behavior is different from what they say (or, more often, fail to mention). And anything that deals with recursive structures (or looping without a clear counter and checks) is one of my first candidates for checks.

> has no way to handle the overflow case after the fact.

Fork/Vendor the code and add your assertions.

layer8 · 2025-09-21T18:27:13 1758479233

Obviously I just did review it, and my conclusion was to not use that code.

In the spirit of the article you linked, I’d rather write my own version.

jama211 · 2025-09-21T19:28:38 1758482918

If it has limitations they should be documented though right? especially if they’re security concerns.

knowitnone2 · 2025-09-22T01:57:18 1758506238

If you review libraries, why do you need to quick scan the issues? You would have already identified all the issues right? Right?

jama211 · 2025-09-22T18:32:07 1758565927

Hahaha well said

FooBarBizBazz · 2025-09-21T19:27:59 1758482879

This might be the right attitude for a max function written in JavaScript, where the calling code has some control over the inputs.

It's the wrong attitude for a JSON parser written in C, unless you like to get owned.

account42 · 2025-09-23T14:15:17 1758636917

It's entirely reasonable for a tiny JSON parser written in C to not deal with JSON files over 0x7fffffff bytes long.

flykespice · 2025-09-21T18:44:45 1758480285

There is no easy way out when you're working with C: either you handle all possible UB cases with exhaustive checks, or you move on to another language.

(TIP: choose the latter)

jeroenhd · 2025-09-21T19:56:43 1758484603

Very few programming languages default to checked increments. Most Rust or Java programmers would make the same mistake.

Writing a function to do a checked addition like in other languages isn't exactly difficult, either.
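As a rough illustration of how little code a checked addition takes in portable C, here is a sketch. The helper name `checked_add_int` is made up for this example and is not part of sj.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: adds b to *a only if the result fits in an int.
 * Returns false and leaves *a untouched if the addition would overflow.
 * The range test happens before the addition, so no signed overflow
 * (which is undefined behavior in C) is ever evaluated. */
static bool checked_add_int(int *a, int b) {
    assert(a != NULL);
    if ((b > 0 && *a > INT_MAX - b) ||
        (b < 0 && *a < INT_MIN - b)) {
        return false;
    }
    *a += b;
    return true;
}
```

A parser could replace a bare `depth++` with `if (!checked_add_int(&depth, 1)) { /* report an error */ }`. GCC and Clang also offer `__builtin_add_overflow`, which does the same job without the manual range test, at the cost of portability.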

koito17 · 2025-09-21T22:20:08 1758493208

> Most Rust or Java programmers would make the same mistake.

Detecting these mistakes in Rust is not too difficult. In debug builds, integer overflow triggers a panic[1]. Additionally, clippy (the official linter of Rust), has a rule[2] to detect this mistake.

[1] https://doc.rust-lang.org/book/ch03-02-data-types.html#integ...

[2] https://rust-lang.github.io/rust-clippy/master/index.html#ar...

account42 · 2025-09-23T14:17:38 1758637058

You can also just use ubsan if you want the runtime checks for overflows in C/C++, no need to switch languages for this.

paulddraper · 2025-09-21T20:37:05 1758487025

Yes but those languages have defined overflow.

cozzyd · 2025-09-21T21:28:48 1758490128

-fwrapv

zelphirkalt · 2025-09-22T09:43:26 1758534206

Many other languages automatically switch to a big integer number type, or have arbitrary size integers anyway.

account42 · 2025-09-23T14:18:39 1758637119

Which is a great way to turn an overflow into a denial of service as suddenly algorithms that were optimized for simple integer arithmetic slow to a crawl.

uecker · 2025-09-21T21:42:39 1758490959

For signed overflow you can just turn on the sanitizer in trapping mode. Exhaustive checks are also not that terrible.

paulddraper · 2025-09-21T20:27:00 1758486420

The amount of untrusted JSON I parse is very high.

UB is bad.

RossBencina · 2025-09-22T01:59:18 1758506358

> Sometimes, it's just not the responsibility of the library.

Sometimes. In this case, where the library is a parser that is written in C, I think it is reasonable to expect the library to handle all possible inputs, even corner cases like this which are unlikely to be encountered in common practice. This is not "bloat", it is correctness.

In C, this kind of bug is capable of being exploited. Sure, many users of this lib won't be using it in exposed cases, but sooner or later the lib will end up in some widely-used internet-facing codebase.

As others have said, the fix could be as simple as bailing once the input size exceeds 1GB. Or it could be fine-grained. Either way, the fix would not "bloat" the codebase.

And yes, I'm well aware of the single-file C library movement. I am a fan.

Someone · 2025-09-22T11:20:09 1758540009

I wouldn’t rate those as very serious issues for this project. They’ll only be triggered if there are over INT_MAX lines or depth levels in the input. Yes, an attacker might be able to do that, but you’d have to put that input in a memory buffer to call this code. On many smaller systems, that will OOM.

Skimming the code, they also are loose in parsing incorrect json, it seems:

    static bool sj__is_number_cont(char c) {
        return (c >= '0' && c <= '9')
            ||  c == 'e' || c == 'E' || c == '.' || c == '-' || c == '+';
    }

    case '-': case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        res.type = SJ_NUMBER;
        while (r->cur != r->end && sj__is_number_cont(*r->cur)) { r->cur++; }
        break;

that seems to imply it treats “00.-E.e-8..7-E7E12” as a valid json number.
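
For comparison, a stricter check is not much code. This is a sketch only, assuming the number arrives as a pointer plus length. `json_number_is_valid` is a hypothetical helper, not something sj.h provides. It follows the RFC 8259 grammar: optional minus, integer part without leading zeros, optional fraction, optional exponent.

```c
#include <stdbool.h>
#include <stddef.h>

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

/* Returns true if the slice [s, s+len) is exactly one RFC 8259 number. */
static bool json_number_is_valid(const char *s, size_t len) {
    size_t i = 0;
    if (i < len && s[i] == '-') i++;              /* optional sign */
    if (i >= len) return false;
    if (s[i] == '0') {
        i++;                                      /* a leading 0 must stand alone */
    } else if (is_digit(s[i])) {
        while (i < len && is_digit(s[i])) i++;    /* 1-9 followed by digits */
    } else {
        return false;
    }
    if (i < len && s[i] == '.') {                 /* optional fraction */
        i++;
        if (i >= len || !is_digit(s[i])) return false;
        while (i < len && is_digit(s[i])) i++;
    }
    if (i < len && (s[i] == 'e' || s[i] == 'E')) { /* optional exponent */
        i++;
        if (i < len && (s[i] == '+' || s[i] == '-')) i++;
        if (i >= len || !is_digit(s[i])) return false;
        while (i < len && is_digit(s[i])) i++;
    }
    return i == len;                              /* nothing may be left over */
}
```

With this, the string quoted above is rejected at its second character, since a leading `0` cannot be followed by another digit.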

    case '}': case ']':
        res.type = SJ_END;
        if (--r->depth < 0) {
            r->error = (*r->cur == '}') ? "stray '}'" : "stray ']'";
            goto top;
        }
        r->cur++;
        break;

I think that means the code finds [1,2} a valid array and {"foo": 42] a valid struct (maybe, it even is happy with [1,2,"foo":42})
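
One way to catch mismatched closers without allocating is to remember one bit per nesting level. The sketch below is an illustration under assumptions, not the library's code: `BracketStack`, `bracket_open`, `bracket_close` and the 64-level cap are all invented here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEPTH 64  /* arbitrary cap chosen for this sketch */

typedef struct {
    uint64_t kinds;  /* bit i is 1 if nesting level i is an object, 0 if an array */
    int depth;       /* number of currently open containers */
} BracketStack;

/* Call on '{' or '['. Returns false if the nesting cap is reached. */
static bool bracket_open(BracketStack *s, char c) {
    if (s->depth >= MAX_DEPTH) return false;
    uint64_t bit = (uint64_t)1 << s->depth;
    if (c == '{') s->kinds |= bit; else s->kinds &= ~bit;
    s->depth++;
    return true;
}

/* Call on '}' or ']'. Returns false for a stray or mismatched closer. */
static bool bracket_close(BracketStack *s, char c) {
    if (s->depth <= 0) return false;               /* nothing is open */
    uint64_t bit = (uint64_t)1 << (s->depth - 1);
    bool is_object = (s->kinds & bit) != 0;
    if (is_object != (c == '}')) return false;     /* '[' closed by '}' or vice versa */
    s->depth--;
    return true;
}
```

This only covers the closer mismatch. Rejecting a `"foo":42` pair inside an array would additionally need the tokenizer to track whether it is expecting a key or a value.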

Those, to me, seem a more likely attack vector. The example code, for example, calls atoi on something parsed by the first piece of code.
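
A caller can defend against this on their side by replacing `atoi` with `strtol` and checking how much of the slice was consumed. The following is a sketch under the assumption that the value is handed over as a start/end pointer pair that is not NUL-terminated. `slice_to_int` is a hypothetical name.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Converts the slice [start, end) to an int.
   Returns false on empty input, trailing junk, or out-of-range values. */
static bool slice_to_int(const char *start, const char *end, int *out) {
    char buf[32];
    size_t len = (size_t)(end - start);
    if (len == 0 || len >= sizeof buf) return false;
    memcpy(buf, start, len);          /* strtol needs a NUL-terminated string */
    buf[len] = '\0';

    char *stop;
    errno = 0;
    long v = strtol(buf, &stop, 10);
    if (stop != buf + len) return false;                 /* e.g. "10eee" */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return false;
    *out = (int)v;
    return true;
}
```

Unlike `atoi`, this reports failure instead of silently returning whatever prefix happened to parse.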

⇒ I only would use this for parsing json config files.

Being tiny is one thing, but the json grammar isn’t that complex. They could easily do a better job at this without adding zillions of lines of code.

habibur · 2025-09-21T22:47:18 1758494838

Will trigger UB if level depth is > 2 billion or in the 2nd case number of lines > 2 billion.

Limit your JSON input to 1 GB. I will have more problems in other portions of the stack if I start to receive a 2 GB JSON file over the web.

And if I still want to make it work for > 2GB, I would change all int in the source to 64 bits. Will still crash if input is > 2^64.

What I won't ever do in my code is check for int overflow.

zelphirkalt · 2025-09-22T09:48:27 1758534507

Crashing without a proper error message, leaving the user wondering what happened, is table stakes in C projects, of course. How do you intend to determine the cause of your crashes and write a meaningful error message for the user in the case of too-long input, when you don't check for overflow?

jcalvinowens · 2025-09-21T23:18:30 1758496710

> What I won't ever do in my code is check for int overflow

Amen. Just build with -fno-strict-overflow, my hot take is that should be the default on Linux anyway.

oguz-ismail · 2025-09-21T20:14:04 1758485644

    diff --git a/sj.h b/sj.h
    index 60bea9e..25f6438 100644
    --- a/sj.h
    +++ b/sj.h
    @@ -85,6 +85,7 @@ top:
             return res;
     
         case '{': case '[':
    +        if (r->depth > 999) { r->error = "can't go deeper"; goto top; }
             res.type = (*r->cur == '{') ? SJ_OBJECT : SJ_ARRAY;
             res.depth = ++r->depth;
             r->cur++;

There, fixed it

meindnoch · 2025-09-21T22:45:24 1758494724

I can only hope this was made by an LLM and not a real human.

ricardobeat · 2025-09-21T18:16:50 1758478610

An int will be 32 bits on any non-ancient platform, so this means, for each of those lines:

- a JSON file with nested values exceeding 2 billion depth

- a file with more than 2 billion lines

- a line with more than 2 billion characters

fizzynut · 2025-09-21T19:29:18 1758482958

The depth is 32 bit, not the index into the file.

If you are nesting 2 billion times in a row (at minimum this means repeating { 2 billion times, followed by a value, before } another 2 billion times), you have messed up.

You have 4GB of "padding"...at minimum.

Your file is going to be petabytes in size for this to make any sense.

You are using a terrible format for whatever you are doing.

You are going to need a completely custom parser because nothing will fit in memory. I don't care how much RAM you have.

Simply accessing an element means traversing a nested object 2 billion times, which in probably any parser in the world is going to take somewhere between minutes and weeks per access.

All that is going to happen in this program is a crash.

I appreciate that people want to have some pointless if(depth > 0) check everywhere, but if your depth is anywhere north of a million in any real world program, something messed up a long long time ago, never mind waiting until it hits 2 billion.

account42 · 2025-09-23T14:23:41 1758637421

> I appreciate that people want to have some pointless if(depth > 0) check everywhere

An after-the-fact check would be the wrong way to deal with UB; you'd need to check for < INT_MAX before the increment in order to avoid it.
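
A small sketch of that ordering, using a hypothetical `DepthCounter` struct as a stand-in for the reader's depth field (the names are assumptions, not sj.h's API):

```c
#include <limits.h>
#include <stdbool.h>

typedef struct { int depth; } DepthCounter;  /* hypothetical stand-in */

/* Increments the depth only if doing so cannot overflow.
   The comparison happens before the increment, so no UB is ever executed. */
static bool depth_enter(DepthCounter *d) {
    if (d->depth == INT_MAX) return false;
    d->depth++;
    return true;
}
```

Testing `depth < 0` after `++depth` would be too late: the overflow has already happened by then, and the compiler is allowed to assume it never does.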

ranger_danger · 2025-09-21T19:29:02 1758482942

What is your definition of non-ancient? There are still embedded systems being produced today that don't have 32-bit integers.

account42 · 2025-09-23T14:24:57 1758637497

And those will need careful review of any code you want to run on them because no one cares about your weird architecture nor should they have to.

ranger_danger · 2025-09-23T14:46:50 1758638810

I wouldn't call 8 or 16-bit microcontrollers (with no concept of a 32-bit int) that are in billions of devices "weird". But ok.

klysm · 2025-09-21T18:34:39 1758479679

2 billion characters seems fairly plausible to hit in the real world

ricardobeat · 2025-09-21T20:22:07 1758486127

In a single line. Still not impossible, but people handling that amount of data will likely not have “header only and <150 lines” as a strong criterion for choosing their JSON parsing library.

naasking · 2025-09-21T19:14:21 1758482061

2GB in a single JSON file is definitely an outlier. A simple caveat when using this header could suffice: ensure inputs are less than 2GB.

layer8 · 2025-09-21T19:28:31 1758482911

Less than INT_MAX, more accurately. But since the library contains a check when decreasing the counter, it might as well have a check when increasing the counter (and line/column numbers).
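
On the caller's side, that caveat can be enforced with a one-line guard before handing the buffer to the parser. A sketch, with `json_len_ok` as an invented name; the strict `<` is an assumption meant to leave slack for counters that start at 1:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a buffer of len bytes is small enough that int-typed
   depth, line and column counters cannot exceed INT_MAX while scanning it. */
static bool json_len_ok(size_t len) {
    return len < (size_t)INT_MAX;
}
```

A caller would check this first and report an "input too large" error instead of calling the parser when it fails.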

EasyMark · 2025-09-21T19:25:23 1758482723

Or fork and make a few modifications to handle it? I have to admit I haven't looked at the code to see if this particular code would allow for that.

jeroenhd · 2025-09-21T20:07:53 1758485273

I've seen much bigger, though technically that wasn't valid json, but rather structured logging with JSON on each line. On the other hand, I've seen exported JSON files that could grow to such sizes without doing anything weird, just nothing exceeding a couple hundred megabytes because I didn't use the software for long enough.

Restricting the input to a reasonable size is an easy workaround for sure, but this limitation isn't indicated everywhere, so anyone deciding to consume this random project into their important code wouldn't know to defend against such a situation.

In a web server scenario, 2GiB of { (which would trigger two overflows) in a compressed request would require a couple hundred kilobytes to two megabytes, depending on how old your server software is.

account42 · 2025-09-23T14:28:56 1758637736

To be fair, anyone who uses a 150 line library without bothering to read it deserves what they get.

And in the spirit of your profile text I'm quite glad for such landmines being out there to trip up those that do blindly ingest all code they can find.

maleldil · 2025-09-21T20:24:42 1758486282

Not really. I deal with this every day. If the library has a limit on the input size, it should mention this.

johnisgood · 2025-09-22T08:43:20 1758530600

It is ~150 lines of code. Submit a PR, or when you git clone it add your checks, or stop complaining because the author does not owe you anything.

naasking · 2025-09-22T00:12:37 1758499957

If you deal with this every day, you're an outlier.

xigoi · 2025-09-22T04:22:01 1758514921

For such big data, you should definitely be using an efficient format, not JSON.

klysm · 2025-09-22T15:48:19 1758556099

I agree, but 2GB json files absolutely exist. It fits in ram easily

layer8 · 2025-09-21T18:18:26 1758478706

All very possible on modern platforms.

Maybe more importantly, I won’t trust the rest of the code if the author doesn’t seem to have the finite range of integer types in mind.

johnisgood · 2025-09-21T18:39:39 1758479979

Personally, all my C code is written with SEI C Coding Standard in mind.

pgen · 2025-09-22T07:47:01 1758527221

The author has kindly provided you with simple, readable, and free code. If you find it incomplete or unsafe, you can always modify it and contribute your changes if you wish to improve it, in accordance with the licence; and thank him while you're at it.

odie5533 · 2025-09-21T18:33:59 1758479639

Can't use this library in production, that's for sure.

robmccoll · 2025-09-21T23:36:24 1758497784

Could just change the input len to an int instead of size_t. Not technically the correct type, but it would make it clear to the user that the input can't be greater than 2^31 in length.

EmilStenstrom · 2025-09-21T18:06:39 1758477999

Submit a PR!

uncircle · 2025-09-22T09:03:37 1758531817

You don't get Hacker News karma if you quietly fix a bug instead of complaining about it, though.

modeless · 2025-09-21T23:04:44 1758495884

I wouldn't expect a library like this to be secure. If you want it to be memory safe, compile it with Fil-C.

layer8 · 2025-09-21T23:09:44 1758496184

This has nothing to do with memory safety.

modeless · 2025-09-21T23:16:19 1758496579

This is an overstatement. Yes, UB does not necessarily cause a violation of memory safety, but triggering UB alone is not the goal of an attacker. UB is a means to an end and the end is usually a violation of memory safety leading to arbitrary code execution.

layer8 · 2025-09-22T00:30:13 1758501013

The primary point was that the code doesn't ensure correct processing (or returning an appropriate error) for all JSON. Even if behavior is defined by the C implementation, the overflow can lead to parser mismatch vulnerabilities, if nothing else. There are likely other "defined" failure modes the overflow can enable here.

UB was a secondary observation, but it also can lead to logic errors in that vein, without involving memory safety.

I'm not sure I agree that UB usually leads to memory safety violations, but in any case, the fact that signed integer overflow is UB isn't what makes the code incorrect and unsafe in the first place.

account42 · 2025-09-23T13:53:39 1758635619

Easy enough to fix:

  -sj_Reader sj_reader(char *data, size_t len) {
  +sj_Reader sj_reader(char *data, int len) {

Not everyone needs to waste cycles on supporting JSON files larger than 2^31-1.

Gibbon1 · 2025-09-22T06:17:05 1758521825

I've been tending to use ssize_t for indexes instead of int. Part of the reason was reading someone's decent argument that

   for(int i=0; blah blah; i++)

Is actually broken and dangerous on 64 bit machines.

oguz-ismail · 2025-09-22T07:22:22 1758525742

How is ssize_t any better? It's not part of standard C and is only guaranteed to be capable of holding values between -1 and SSIZE_MAX (minimum 32767, no relation to SIZE_MAX).

mscrnt · 2025-09-22T03:07:59 1758510479

cut a PR to improve it; that would be nice

LiamPowell · 2025-09-21T21:05:22 1758488722

This is rather lenient. There's not anything wrong with that (although perhaps it should be noted for people that will use it without looking at the code), but it's the main reason this can be so small. Using their demo in the readme:

    {"x",10eee"y"22:5,{[:::,,}]"w"7"h"33
    rect: { 10, 22, 7, 33 }

hackernewds · 2025-09-21T23:15:39 1758496539

so it is wrong?

hacker_homie · 2025-09-22T00:04:14 1758499454

"Parser" implies the input is assumed to be valid; validating is a whole other problem not covered by this library.

I don’t know what else you call a library that just extracts data.

codr7 · 2025-09-21T17:36:20 1758476180

JSON parser libraries in general is a black hole of suffering imo.

They're either written with a different use case in mind, or a complex mess of abstractions; often both.

It's not a very difficult problem to solve if you only write exactly what you need for your specific use case.

mbac32768 · 2025-09-21T18:32:05 1758479525

It's astonishing how involved a fucking modern JSON library becomes.

The once "very simple" C++ single-header JSON library by nlohmann is now

* 13 years old

* is still actively merging PRs (last one 5 hours ago)

* has 122 __million__ unit tests

Despite all this, it's self-admittedly still not the fastest possible way to parse JSON in C++. For that you might want to look into simdjson.

Don't start your own JSON parser library. Just don't. Yes you can whiteboard one that's 90% good enough in 45 minutes but that last 10% takes ten thousand man hours.

kstenerud · 2025-09-22T05:17:34 1758518254

I did write one, but I needed to because the already-written data must be recoverable on a crash (to be able to recover partially written files) since this is in a crash reporter - and also the encoder needs to be async-safe.

https://github.com/kstenerud/KSCrash/blob/master/Sources/KSC...

And yeah, writing a JSON codec sucks.

So I'm in the process of replacing it with a BONJSON codec, which has the same capabilities, is still async-safe and crash resilient, and is 35x faster with less code.

https://github.com/kstenerud/ksbonjson/blob/main/library/src...

codr7 · 2025-09-21T22:08:02 1758492482

Yeah, but as long as I'm not releasing in public, I don't need to support 20 different ways of parsing.

That's the thing with reinventing wheels, a wheel that fits every possible vehicle and runs well in any possible terrain is very difficult to build. But when you know exactly what you need it's a different story.

vovavili · 2025-09-21T22:35:07 1758494107

I am very surprised to hear the unit testing statistic. What kind of unholy edge cases would JSON parsing require to make it necessary to cover 122 million variations?

kstenerud · 2025-09-22T05:22:04 1758518524

The more speed optimizations you put in, the gnarlier the new edge cases that pop up.

modeless · 2025-09-21T23:06:41 1758496001

This may say more about C++ than JSON

0x000xca0xfe · 2025-09-22T00:02:46 1758499366

The best language to handle unusual JSON correctly would probably be Python. It has arbitrary size integers, mpmath for arbitrary precision floats and good Unicode support.

zelphirkalt · 2025-09-22T09:56:17 1758534977

Many of the problems disappear when performance is not critical, because that opens up the options for many much nicer, much safer, and simpler languages than C/C++ to write a correct parser in.

typpilol · 2025-09-21T19:39:02 1758483542

122 million unit tests? What?

flohofwoe · 2025-09-22T09:25:10 1758533110

Most people don't need the remaining 10% but value a small and easy to maintain codebase (which nlohmann definitely isn't).

EasyMark · 2025-09-21T19:26:33 1758482793

Yeah I use this and I think most of my friends do too :)

mbac32768 · 2025-09-23T15:18:49 1758640729

yeah it seems like every other C++ project uses it

fHr · 2025-09-21T20:36:59 1758487019

holy shit

forty · 2025-09-21T18:35:07 1758479707

Parsing JSON is a Minefield (2016)

https://seriot.ch/projects/parsing_json.html

codr7 · 2025-09-21T22:08:32 1758492512

Not if I'm also the producer.

president_zippy · 2025-09-22T02:21:14 1758507674

Finally, I have found someone who understands the purpose of using someone else's tiny header-only C library; someone who sincerely thought about it before looking for an excuse to bitch and complain.

flohofwoe · 2025-09-21T17:51:24 1758477084

You can't get much more 'opinion-less' than this library though. Iterate over keys and array items, identify the value type and return string-slices.

IshKebab · 2025-09-21T17:59:53 1758477593

It also feels like only half the job to me. Reminds me of SAX "parsers" that were barely more than lexers.

flohofwoe · 2025-09-21T18:05:24 1758477924

I mean, what else is there to do when iterating over a JSON file? Delegating number parsing and UNICODE handling to the user can be considered a feature (since I can decide on my own how expensive/robust I want this to be).

skydhash · 2025-09-21T18:12:26 1758478346

That is what I like about Common Lisp libraries. They are mostly about the algorithms, leaving data structures up to the user. So you make sure you got those right before calling the function.

IshKebab · 2025-09-21T20:03:10 1758484990

Extracting the data into objects. Libraries like Serde and Pydantic do this for you. Hell the original eval() JSON loading method did that too.

meindnoch · 2025-09-21T22:58:21 1758495501

Then you lose the ability to do streaming.

IshKebab · 2025-09-22T06:49:14 1758523754

True, but usually you only need that if your data is so large it can't fit in memory and in that case you shouldn't be using JSON anyway. (I was in this situation once where our JSON files grew to gigabytes and we switched to SQLite which worked extremely well.)

meindnoch · 2025-09-22T08:06:14 1758528374

Actually, you'll hit the limits of DOM-style JSON parsers as soon as your data is larger than about half the available memory, since you'd most likely want to build your own model objects from the JSON, so at some point both of them must be present in memory (unless you're able to incrementally destroy those parts of the DOM that you're done with).

Anyhow, IMO a proper JSON library should offer both, in a layered approach. That is, a lower level SAX-style parser, on top of which a DOM-style API is provided as a convenience.

IshKebab · 2025-09-22T12:31:20 1758544280

> since you'd most likely want to build your own model objects from the JSON, so at some point both of them must be present in memory

Not really because the JSON library itself can stream the input. For example if you use `serde_json::from_reader()` it won't load the whole file into memory before parsing it into your objects:

https://docs.rs/serde_json/latest/serde_json/fn.from_reader....

But that's kind of academic; half of all memory and all memory are in the same league.

meindnoch · 2025-09-22T14:29:04 1758551344

That's only true if your model objects are serde structs, which is not desirable for a variety of reasons, most importantly because you don't want to tie your models to a particular on-disk format.

IshKebab · 2025-09-22T21:45:20 1758577520

In the vast majority of cases you can and should just load directly into Serde structs and use those directly. That's kind of the point.

In some minority of cases you might not want to do that (e.g. because you need to support multiple versions of a format), but that is rare and can also be handled in various ways directly in Serde.

TheRealPomax · 2025-09-21T20:46:53 1758487613

Anyone who claims "it's not a very difficult problem" hasn't actually had to solve that problem.

codr7 · 2025-09-21T22:10:28 1758492628

Except I have, several times, with good results.

So in this case you're wrong.

General purpose is a different can of worms compared to solving a specific case.

patrickmay · 2025-09-21T21:11:22 1758489082

> JSON parser libraries in general is a black hole of suffering imo.

Sexprs sitting over here, hoping for some love.

codr7 · 2025-09-22T21:17:09 1758575829

I still mourn the timeline where we got a real Lisp in the browser instead of the current abomination.

nicce · 2025-09-21T18:13:10 1758478390

The project advertises that it has zero allocations with minimal state. I don’t think that is fair, or our problems are very different. A single string (the most used type), and you need an allocation.

EE84M3i · 2025-09-21T17:05:24 1758474324

This is interesting, but how does this do on the conformance tests?

https://github.com/nst/JSONTestSuite

LegionMammal978 · 2025-09-21T18:19:57 1758478797

It doesn't seem to have much in the way of validation, e.g., it will indiscriminately let you use either ']' or '}' to terminate an object or array. Also, it's more lenient than RFC or json.org JSON in allowing '\v' for whitespace. I'd treat it more as a "data extractor for known-correct JSON". But even then, rolling your own string or number parser could get annoying, unless the producer agrees on a subset of JSON syntax.
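
For reference, the whitespace set in RFC 8259 is just four characters. A strict predicate could look like the sketch below; `json_is_space` is a name chosen here, not the library's:

```c
#include <stdbool.h>

/* RFC 8259 whitespace: space, horizontal tab, line feed, carriage return.
   Vertical tab ('\v') and form feed ('\f') are deliberately not included. */
static bool json_is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}
```

Note that the C library's `isspace` is broader than this: in the "C" locale it also accepts `'\v'` and `'\f'`.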

catlifeonmars · 2025-09-21T18:16:09 1758478569

You know what would really be useful is a conformance test based on a particular real implementation.

What I mean by this is a subset (superset?) that exactly matches the parsing behavior of a specific target parsing library. Why is this useful? To avoid the class of vulnerabilities that rely on the same JSON being handled differently by two different parsers (you can exploit this to get around an authorization layer, for example).

Lucas_Marchetti · 2025-09-21T17:12:26 1758474746

Real question, does it manage nested objects ?