The Rise and Rise of JSON (2017) (twobithistory.org)
89 points by mariuz on July 22, 2020 | 167 comments



I wish JSON had 1) trailing commas 2) comments (I'm not sure if this is a good idea or not but every once in a while I impulsively write comments)


> 2) comments

Per Douglas Crockford, the creator of JSON, comments were an anti-feature:

> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.

> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.

* https://web.archive.org/web/20120507093915/https://plus.goog...

Discussion on the post (2012):

* https://news.ycombinator.com/item?id=3912149

Remember: JSON was designed primarily as a data exchange format between computers.


Processing instructions can always be added as a special name/value entry, and that wouldn't be a good reason to get rid of name/value pairs.
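For example, a directive could ride along as an ordinary name/value pair under a reserved key (key names here are invented):

    {
        "//directive": "pretend parser hint, no comment syntax needed",
        "host": "example.com"
    }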


This is what I figured would be the reason to remove comments, because they would hold some other non-human information. It's unfortunate that people do this. The main reason for comments for me would be for config files, which maybe shouldn't be in JSON anyways.


> The main reason for comments for me would be for config files, which maybe shouldn't be in JSON anyways.

I think this is a bit of 'over reach' in what JSON was intended to do, so it's perhaps not surprising that there may be some things 'lacking' for that purpose.


Comments are a great idea... for a configuration format!

They're really bad for a data interchange format though, because people inevitably start putting important data in them and you end up with two different ways to write strings, one of which isn't supported by every parsing library.

Thus each discussion of JSON ends up being two groups talking past each other, the people using it as a configuration format who lament the lack of comments, and the people using it as a data interchange format who celebrate it.

The solution: since it's too late to add comments now, don't use JSON as a configuration format.


Re: "[Comments are a] bad for a data interchange format though, because people inevitably start putting important data in them"

Any feature or tool can be abused and misused. Comments are useful for configuration files, period.


Adding comments to configuration files means you can’t manipulate these files automatically. The advantage of manual configuration and comments has to be pretty great if it has to balance against losing GUI configuration and the other features that depend on automatic manipulation of configuration.


I automatically manipulate configuration files that contain comments all the time. If that's not doable then the tooling is the problem.


It’s great that you made that work for the limited manipulation you want to do on files with comments in a limited structure, but in the general case it’s not possible.


Of course it is, if your tooling is good enough. My use cases are far from limited.


JSONC is a thing and has both of those features. VS Code uses it all over the place.


OP did say that those features are good for a config format, and that's precisely how VS Code uses it.

For data interchange formats, comments do present a major problem.


So Yaml?


YAML's way, way more complicated than JSON. To pick one rough measure, the spec is about 7x as long. It includes a bunch of things that could be seen as negatives, such as nine(!) different ways to write multi-line strings (https://stackoverflow.com/questions/3790454/how-do-i-break-a...).

I don't have a single competitor to recommend, but JSON5, TOML, JSONC (mentioned in a sister comment), or something else along those lines might be better. I'd probably just go with whichever of those is popular in your community.


i really like hashicorp hcl. more than json.

(and yaml is TERRIBLE)


YAML might be terrible for the reasons given elsewhere, but I really want something YAML-like (maybe TOML, maybe something else).

For me, readability is king both for myself and because I sometimes want non-developers to be able to hand-edit files.

Any curly brace format is a non-starter. Too much clutter, too easy to create invalid files that aren't obviously invalid at a glance.

Significant white space does have the advantage of meaning exactly what it appears to mean and being intuitively understandable to most people.


The other day I saw an auto-generated json file with comments and trailing commas and got really excited. A little digging showed that it was using JSON5. Pretty sweet!

https://json5.org/
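For reference, JSON5 accepts roughly this sort of thing (contents invented for illustration):

    {
      // comments are allowed
      unquoted: 'and single quotes too',
      trailing: [1, 2, 3,],   // so are trailing commas
    }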


Cool! JSON5 looks intriguing.

The core project implements a JavaScript module [1] but it looks like there are also experimental JSON5 libraries for Go [2], Python [3], Ruby [4], and Rust [5].

[1]: https://www.npmjs.com/package/json5

[2]: https://github.com/yosuke-furukawa/json5

[3]: https://pypi.org/project/json5/

[4]: https://github.com/bartoszkopinski/json5

[5]: https://docs.rs/json5/

I noticed that JSON5 doesn’t include a Date type. There have been discussions about this in both the JavaScript [6] and spec [7] repos but with no resolution yet.

[6]: https://github.com/json5/json5/issues/3

[7]: https://github.com/json5/json5-spec/issues/4


What would really help JSON5 take off is if it gets bundled with browsers by default. For now it doesn’t seem all that appealing to add

  <script src="//unpkg.com/json5/dist/index.min.js"></script>
to every HTML page as extra JavaScript instead of a native browser implementation.
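In the meantime, library usage looks roughly like this (a sketch assuming the json5 npm package):

    import JSON5 from "json5";

    // Comments, unquoted keys, and trailing commas all parse.
    const config = JSON5.parse(`{
      retries: 3,              // how many times to retry
      endpoints: ['a', 'b',],
    }`);
    console.log(config.retries); // 3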


3) set types, 4) optional schemas, 5) multiline strings, 6) consistent support for numeric types, 7) streaming data

The litany is not short and I'm sure I've overlooked a couple.


JSON is popular because it’s tiny. If you want to trade tiny for a litany of features and data types then just use XML.


You should try XML. You may like it.


That reminds me of my favourite XML quote: "XML is a lot like violence. If it isn't solving your problem, then you aren't using enough of it"


JAXB is at the same time awesome, and completely terrifying.


You can try json schema. Streaming data is also easy with json lines.
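(For context, JSON Lines is just one complete JSON value per line, so a consumer can parse line by line as data arrives; a made-up sample:)

    {"event": "login", "user": "a"}
    {"event": "purchase", "user": "b", "amount": 3}
    {"event": "logout", "user": "a"}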

But if you add too many features, it may end up like XML.


Yeah, you can do everything by imposing more topology on top of JSON, just as happened with XML. The JSON advocates are slowly reinventing everything that their parents built into XML 20 years earlier. My hunch is that it will share XML's path in acquiring bloat.


From what I have seen, the excessive topological frameworks on top of JSON have never really taken off. There have been many schema systems available for 10 years now, but for some reason everyone seems to fall back to accepting basic JSON. There is a place for this stuff, but only in very limited areas. Hopefully it does not go too far, for example like Microsoft did with SOAP. I still have nightmares from work decoding completely wrong XML generated by the native SOAP XML generators in old C# banking apps.


What do you mean by schema? JSON supports schema with libraries included in most popular languages nowadays. https://json-schema.org/

For numeric types, if you handle integers larger than 2 billion, or floating-point numbers where precision is critical (that's where JSON fails), these can and should be represented as strings. I don't think there is any common format that guarantees large numbers and/or arbitrary precision will be encoded and decoded properly across languages and platforms.
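For example, in JavaScript the default number handling silently rounds anything past 2^53, which the string workaround avoids (a quick illustration):

    // 2^53 + 1 cannot be represented exactly as an IEEE double:
    const lossy = JSON.parse('{"id": 9007199254740993}').id;
    // lossy === 9007199254740992, the last digit is gone

    // Encoding the value as a string preserves every digit:
    const exact = BigInt(JSON.parse('{"id": "9007199254740993"}').id);
    // exact === 9007199254740993n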


8) date types


So basically EDN.


Right


> 4) optional schemas

Perhaps CDDL:

* https://tools.ietf.org/html/rfc8610


I just wish it supported all floats (nans and infinities too).


An integer datatype would also be very useful. 53 bits isn't always enough.


The format doesn't have any set limit. Rather, it's implementations that set a limit. Ruby's standard JSON parser returns values of class Integer for integers in the JSON.


Yes, definitely. Every time I work with PHP Arrays, I'm reminded both these things don't exist in JSON and it makes me sad.


There you go, JSON for humans: https://hjson.github.io/

This allows trailing commas and comments. It can also read strict JSON, of course. Libraries are available for all popular languages.


HOCON solves pretty much all the problems JSON has, including the ones you mentioned.


It's funny, I have the opposite feeling. In my case I write a formatter/validator in each project for my JSON that fixes issues like trailing commas and is also where I annotate the data structure, but I can imagine that for those who write JSON by hand those two limitations would be a headache.


JSON won, IMO, because it's human readable and writable. It came at a time when its main competitor in this space - XML - had gotten too complex.

However, JSON is hitting the same limitations and problems that XML faced, and is following in its footsteps (namespaces, schemas, x/jpath, implementation drift across libraries, etc).


Ehhhh, is it hitting those limitations? I've yet to run into a problem which would be solved by adding Enterprise™ silliness.

And is it following in XML's shoes? I've not had to work on or integrate with any systems that did this Enterprise™ silliness either.

Now granted, I'm fairly separated from the world of IBM and SAP type companies, and I'm sure there's all sorts of unholy abuse of JSON going on in a dark server room somewhere, but the answer in most cases is just to choose technology stacks that don't do that to you. These problems are generally self-imposed rather than imposed by the technology.


> Ehhhh, is it hitting those limitations? I've yet to run into a problem which would be solved by adding Enterprise™ silliness.

I think that's why JSON won: it was first and foremost an engineer-driven standard, as opposed to one that incubated inside a bunch of enterprises before being dropped on an unsuspecting world.


SGML, XML's predecessor, was definitely engineer-built (the wiki page is interesting). It just paved the same path that JSON is now traversing. Given that SGML is roughly 34 years old (and GML, SGML's predecessor, is over 50), give JSON just a bit more time to get to the same level of 'blech'.


> Ehhhh, is it hitting those limitations?

Yes. Some folks, even at my relatively small company, are using JSON and also pushing for extensive schema use.

JSON - just like XML - allows you to use its vanilla form. But the additional tools are available if/when you need them, because the need exists.


Just because people are pushing for them doesn't mean they are needed. I have no idea what your needs are, so maybe they are needed at your company. All I can say is, I've never come across a need for them but I have met a lot of people who just want to try things out because they think it might be cool.


Yeah, but they were reinvented in JSON, with the key difference that you didn't have to pay for them if you didn't need them. JSON unbundled XML.


The same was true for XML.


It really wasn't. Implicit remote resource dependencies, validation, namespaces, and parser-settings dependence of all of these were huge footguns. You could turn these features off if you controlled the format and parser, but you might not, and in any case you would have to understand the features to ensure you wouldn't trip over them, which meant you couldn't ignore them at zero cost. They were syntactically unbundled but not conceptually unbundled. XML had zero sympathy for the casual user and the results were 100% predictable, earned, and deserved.


But the same is true for JSON. Of course you're going to be constrained by a legacy system if you're constrained by a legacy system. Neither JSON nor XML forces the implementor to use any of the crazy stuff.


In theory that's true, but in practice out-of-the-box defaults and culture matter enormously and differ greatly between JSON and XML.

With JSON, you have to go looking for this sort of trouble if you want to find it. An out-of-the-box JSON parser is all but guaranteed not to coerce a schema, fetch remote resources, apply namespaces, and so on -- if you want to do those things, you have to go out of your way to do them, and your decision to do so is likely motivated by considering tradeoffs rather than cargo-culting because the decisions are A) explicit and B) made in a cultural context where schema-free and remote-ref-free JSON is common practice rather than a sneered-upon possibility.

With XML, you don't have to go looking for trouble, the trouble finds you. XML's ecosystem absolutely did push everyone in the direction of using schemas and remote-refs. DTDs were both common and encouraged with a heavy hand. All the Big Protocols used them, and instead of emphasizing the differences between Big Protocol use cases and duct-tape use cases, the culture tended to get preachy about DTDs being "the right way" to do things. If you chose differently, you were amateur scum not worthy of the Enterprise. This encouraged cargo-cult application of DTDs in places where they weren't appropriate (e.g. maximally economic / flexible glue) and for purposes that weren't appropriate (e.g. as a substitute for documentation). It also lent "moral license" to the wrong party in validation-related toe-stepping. Party A turns on validation, Party B sees their integration stop working, Party A argues that if Party B hadn't been writing morally degenerate XML they wouldn't have been caught out and can point at an abundance of preachy XML gospel to back them up. Party A gets away with arguing that an API change isn't an API change, or that a new internet/VPN dependency isn't new. Party B gets trod upon and remembers that XML was complicit in it. Sure, the XML ecosystem could have avoided these problems without a single code-change or standards-change by creating a culture of acceptance around schema-free XML, but they didn't.

When JSON promised both cultural acceptance and appropriate defaults for the common low-rent use case, people flocked to it, and rightly so. The rest is history.


JSON won because the web and javascript, which supports JSON, won. It would have won regardless of how ugly or awkward it was for that reason alone.


Except that the x in ajax stands for xml


JSON is already based on javascript's object model so it makes more sense to pass it around instead of a format not natively supported by the language that will be processing it.


It's a real shame that both XML and JSON eclipsed the real data format for the ages: S-expressions. I hope it's only temporary, though: there really is no good reason to prefer JSON other than everyone else using it, and if one gets into a novel enough domain then there isn't an everyone else doing anything to worry about yet.

S-expressions have the advantage of not containing the redundant object type (there's no need for an object, map or dictionary when one has alists or plists), and the even greater advantage of elegantly representing code.

Also they are easier to write parsers for.


Please provide more details, insights, anecdotes, and examples of why you think S-expressions are superior.


There was this: http://people.csail.mit.edu/rivest/Sexp.txt but it was never even accepted as an RFC, it seems.


Very interesting. Did anyone else look at this, and think it was super complicated, at first glance? You’d have to be an expert in the data exchange format itself.

I can see why Lisp style syntax is bad for human comprehension. It’s because of the lack of an explicit delimiter, like a comma. I don’t think human eyes are very good at using a single space character, as a delimiter.

And with this format, you’d have to agree upon the data exchange structure. There would have to be a master key somewhere, maybe as the header. Albeit, this format is far more efficient than XML.

But, XML probably won out because it was explicit, and allowed multiple levels of nesting, but at the expense of overly wordy delimiter tags.

And JSON made it simpler, by forcing it to be a lightweight key-value pair.

It’s too bad something like this didn’t take root and become more popular. This might be a very useful advanced data exchange format for some scenarios. Although the headaches and problems associated with trying to understand, and work with it, might far outweigh its efficiency benefits.


My only problem with json files is that you can't add comments to them in a persistent way (especially if changes in the data are written by a program).

Unless of course you'd put the comments in the data itself, but that's kind of ugly.


You might like JSON5 (https://json5.org), which allows comments and has direct support in various tooling.


There are many json alts, but at the end of the day unless it's supported everywhere, it kinda defeats the point, because what makes JSON great is that it's supported almost everywhere.


90% of when I use JSON is when I control both sides of the interaction. E.g. a web app where I have no expectation of third party clients, or a config file that will only reasonably be read by one program.

JSON is normally "good enough" as a general wire interchange format, and it's human-readable and whoever comes after me will already be familiar with it. But if there's a time when it's not good enough and its failures are in expressiveness, I'd totally consider JSON5 or some other alternative instead.


Even for end-to-end internal projects, as soon as you stop using a built-in format and need to use thirdPartyEncode and thirdPartyDecode on both ends, you might as well just choose the absolute best third party library rather than choose something that slightly enhances the built-in.


Once upon a time (a mere decade ago) JSON support was nowhere.


Yes, thanks, but are the comments stripped out when you read the JSON? Or are they still accessible, so they can be written back when a program modifies the JSON slightly?


That's a software problem moreso than a parsing problem.

Plenty of software that uses regular JSON also strips unused data when writing back to a file it read because internally it transforms the document into a data structure with no space dedicated for unknown data.

It would be up to the software, or a very strict standard or library, to preserve unknown/unused data.


It's true, but I've used patterns like this to good effect:

    {
        "_COMMENT": "default; override with --s3-bucket",
        "s3_bucket": "foo-bar"
    }


No worse than putting data in comments. The CDDB format was notorious for putting track length and some other details in comments, and a fully supported parser then had to be smart enough to extract data from them.


> My only problem with json files is that you can't add comments to them in a persistent way

JSON was first designed as a data exchange format between computers (that was lighter weight than XML (which was lighter weight than SGML)), so the fact that it's being used in persistent fashions is getting away from its primary purpose.


I would argue that any human-readable format that relies on defining blocks by opening/closing symbols is a format that's not suited for config files that are expected to be manually edited by humans with a text editor.


my personal pain is that you can't add a trailing comma (PAIN!!!)


Personal opinion: putting comments in the data is a smaller sin than putting data in comments. More information @ https://news.ycombinator.com/item?id=3912149


Ok, but would you prefer:

    [1, 2, 3, 3.14159, "comment: Pi!!", 4, 5, 6, 7, 8]
over:

    [1, 2, 3, 3.14159 /* Pi!! */, 4, 5, 6, 7, 8]


I'm more interested in the implications of picking these examples than the examples themselves. Hope that makes sense.


How about:

    [1, 2, 3.5 /* faster, use 3 for more stability */, 4]


I don't want all the baggage that comes with accepting that option including people who then put parsing directives inside comments. You seem like an intelligent person. Maybe you could come up with a JSON-compatible way to include comments.


Also: streaming


There’s jsonlines but it is not widely supported. E.g. PowerBI has to be told to treat it as CSV and then convert each row into JSON, rather than handling a native jsonl type.


Can't you implement streaming by just putting everything in a giant JSON list?

Or do you want to send different parts of e.g. a map in separate streams?


How do you parse a JSON list before you get to the end?


First you check for a beginning [

Then you parse elements one by one. While your particular library may not support it, it's not a very hard thing to implement

    ["a", 1, "b
A parser could at this point give us the "a" and the 1, but not the "b" since the string could have more content.

You probably want the parser to give you an object that behaves as a collection you can for-each loop over, and that blocks when there is no more data available.


Presumably similar to how you would stream XML. I guess my question would be, isn't this already possible? You just need a reader that knows that the JSON source is a stream and blocks reads until data is available. Then use a concurrency mechanism to coordinate between the stream reader and the consumer of the stream data to keep things running smoothly (or if you don't mind blocking just use the blocking reader on the same thread/process as the consumer).


How does this work?

If I got multiple items, they're in an array. If such a reader blocks reads on an element until it's finished, I've gained nothing if I have to wait until that array is finished.


As I said, you'd need some measure of concurrency, or perhaps a non-blocking reader: try to get new stuff and, if it's not there, continue on. I just don't like getting "null" responses like that, as it's too close to a busy-loop, which is, well, poor form. I generally write code that interacts with the world in its own thread/process/whatever so it doesn't block the rest of the system unless appropriate.

The utility here is if you can process each element one at a time. If you need everything to get a meaningful result it offers no benefit to you.


You can use a JSON parser for streams. Instead of returning a full document once everything is parsed, it can process elements one by one.


A streaming JSON parser is a parser that emits {"[","]","{","}",string,number,bool,null} without holding anything in memory.

Where do you think it's different from streaming XML (SAX)?


That's essentially what we do in embedded systems.


It'd be nice to know what the graph would look like now, three plus years later. I imagine it kept along the same lines. JSON is everywhere.

I suspect that a lot of the popularity is piggy-backing on the popularity of JavaScript.

But what the heck is .CSV doing on an incline?!


> what the heck is .CSV doing on an incline?!

It is still the most common interchange format for tabular data. Tabular data is too verbose/unreadable when represented in standard JSON. Databases and Data Science are also on the rise; .csv continues to ride this wave.


You can use column-oriented JSON (with an array for each column of data instead of each row), which is roughly the same size as a CSV file.

The downside is that you can’t process a row at a time.
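For illustration (made-up records), compare row-oriented and column-oriented encodings of the same table:

    Row-oriented:     [{"name": "a", "x": 1}, {"name": "b", "x": 2}]
    Column-oriented:  {"name": ["a", "b"], "x": [1, 2]}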


Also, "Why use something else when we can use data.join(',') -- that should be enough!" must also be a contributing factor to why CSV will never die.


Except that doesn't work. CSV requires quite complicated quoting rules. Plus the records are separated by CRLF, not a "newline" as many seem to think.
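For example, under RFC 4180 any field containing a comma, quote, or line break must be quoted and its inner quotes doubled, which a bare join(',') never does:

    Fields:            Smith, John | said "hi" | 42
    Naive join(","):   Smith, John,said "hi",42
    RFC 4180:          "Smith, John","said ""hi""",42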


That was the point of my failed joke.

In other words, people thinking they can just join(",") keeps them from reaching for an actual serializer (JSON, XML, etc), and if they realized they already have to bring in a CSV library anyways, they might consider using another format. Comedy gold, huh? Though I'm only half-joking; I've done it before and we've all consumed "CSV" from people who had the same presumption. :)


CSV generation and parsing is inconsistent based on tooling (see inevitable space/comma/quote problems, for example), while JSON has a strict enough spec that you can unambiguously use it in the same way for CSV-like data as long as nobody needs to literally edit it directly in Excel.


CSV has a specification too - RFC 4180, "Common Format and MIME Type for Comma-Separated Values (CSV) Files" [1]

[1]: https://tools.ietf.org/html/rfc4180#section-2


CSV forces struct-of-arrays and forces each row to have the same structure. Those are sometimes useful constraints. Also, you can obtain it from (legacy app) and open it in excel.


CSV is so unnecessary - ASCII has built-in delimiters for fields outside of the printable characters.
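That is, the C0 separator characters: FS (0x1C), GS (0x1D), RS (0x1E) and US (0x1F). A sketch of what using them could look like:

    const US = "\x1f";  // unit separator: between fields
    const RS = "\x1e";  // record separator: between records

    const rows = [["Smith, John", 'said "hi"'], ["Doe, Jane", "ok"]];
    const payload = rows.map(r => r.join(US)).join(RS);
    // No quoting or escaping needed, as long as the data itself never contains US or RS.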


Text tools have intermittent support for those which would destroy the one big advantage of CSV.

Once you're using a binary format, you might as well use something like parquet or hdf5 (or ASN.1 I suppose) that give you other goodies in addition to airtight data/organization separation.


CSV is for human-readable data. If you don’t need to be human-readable, you might as well go with ASN.1 or something.


Wait, seriously? Interesting. Can you link me to something that demonstrates that was intended rather than a quirk of the simple format?

I always thought it was chosen since it was a seemingly sensible delimiter at first glance and then you only realize it's less useful well after you see others using your spreadsheet app for storing complex strings.


> Can you link me to something that demonstrates that was intended rather than a quirk of the simple format?

It’s hard to find references to the intentions of the creators for a format this old, which evolved over decades of different software having similar input formats, starting in 1972. Most of these early descriptions accept both spaces and commas as separators, hinting at a manual input process.

The closest I can find is a 1983 reference¹ which indicated that the format “…contains no other control characters except the end-of-file character…”, which I take to mean that the format is easy to handle and process with other text-handling tools, but still, no definite reference of direct intent.

https://archive.org/stream/bitsavers_osborneexeutiveRef1983_...


Gotcha. Thanks for taking the time to look!

It does seem like it was made to be easy to visually parse but it still feels like the delimiter choice has been something of historical baggage (to the point where just changing the delimiter itself, such as TSV, makes it easier to parse).


I wondered about this and apparently it dates to 1972: https://blog.sqlizer.io/posts/csv-history/ - in which case it may have been used for EBCDIC conversion! It looks like it pre-dates the spreadsheet (early 80s, Mitch Kapor/Lotus) by a long way.


I might be missing something, but isn't that somewhat obvious?

Sometime between the ages of 8 and 12 I 'invented' CSV when I needed to save data for some simple game I built. It seems like the most obvious solution to come up with: store data in a text file, separate the fields with a comma (or other character) and the 'entries' with a newline.


Note: The spreadsheet program comes from 1979 and VisiCalc, which was so enormously popular that many people bought Apple II computers just to run VisiCalc. Lotus 1-2-3 was written a few years later as a VisiCalc clone which took advantage of the larger screen and memory of the IBM PC.


> no new version of the JSON specification is ever expected to be written.

T̵h̵e̵ ̵l̵a̵t̵e̵s̵t̵ ̵v̵e̵r̵s̵i̵o̵n̵ ̵o̵f̵ ̵J̵S̵O̵N̵ ̵w̵a̵s̵ ̵a̵p̵p̵a̵r̵e̵n̵t̵l̵y̵ ̵r̵e̵l̵e̵a̵s̵e̵d̵ ̵l̵e̵s̵s̵ ̵t̵h̵a̵n̵ ̵a̵ ̵y̵e̵a̵r̵ ̵a̵g̵o̵.̵ [EDIT: I was mistaken: the latest standard was RFC 8259. However, this was still published on 2017-12-13, after the article was written.] There have been three different RFCs alone, all defining JSON. (And let’s not even get into the thing called JSON5.)


There were six changes outlined in RFC 8259, three of which were moving from ECMA-262 to make ECMA-404 a normative reference, one was to explicitly state UTF-8 should be used in a particular section, and one was 'to increase the precision of the description of the security risk that follows from using the ECMAScript "eval()" function':

* https://tools.ietf.org/html/rfc8259#appendix-A

Not very Earth shattering revisions IMHO.


Which specification got an update last year? Google's not turning anything up.


I've tried to find out what the change less than a year ago was, and I couldn't find this information. So, what was the change?


I seem to have misread my reference: the change a year ago was to Javascript, to make it conform to JSON, not the other way around, which was how I originally interpreted it. The latest version of JSON itself seems instead to be the latest RFC from 2017-12-13. Note: this was still after the article was written.


One reason in favor of using JSON over XML for web services I don't see mentioned often is that many XML schema-based deserializers will by default fail if they get an unexpected element, whereas JSON deserializers will ignore it, making it easier to remain backwards-compatible. Example: If you have the following defined in your .xsd

  <complexType name="MyMethodResponse">
   <sequence>
    <element name="A" type="string" />
    <element name="B" type="string" />
   </sequence>
  </complexType>
and you later add a "C", existing clients will fail, so you will have to create a new method alongside the old one to remain compatible.
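Whereas a typical JSON client just picks out the fields it knows about, so the extra "C" never breaks it (a sketch):

    // A client written before "C" existed only reads what it knows:
    const { A, B } = JSON.parse('{"A": "x", "B": "y", "C": "added later"}');
    // The extra "C" is silently ignored; the old client keeps working.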


Following the proper XML spec always led to more problems than it was meant to solve. Sometimes I use XML, but just the syntax, with a more html5 style to it.

Similar to defaulting to erroring on extra tags, I never got the point of making everything an element and adding extra redundant attributes. That's really just the result of XML that's automatically generated.

Like the `<element... type="string">` in your example of classic XML, why not just use `<a>3</a>` or `<a int="3"/>`? XML proper styles still suggest the classic long/autogenerated form. Writing XML in html5 style like `<my-a data-value="3" />` makes it much friendlier. Or as I tend to use it for various internal protocols to wrap multiple CSV data in a file:

    <some-data type="csv" columns="A, B, C">
      1.3, 3.4, 5.6
      1.3, 3.4, 5.6
    </some-data>
Technically that's really just html5 in a file, I guess. :-) But html parsers also don't tend to complain about extra tags either.


I've always wondered why the people who designed these things thought that I wanted an exception to be thrown in that case. It seems to be an "enterprise" feature; Java SQL libraries often blow up if they find a column that isn't predefined.


JSON was clearly built with XGH axioms in mind: ‘errors only appear when you notice them’.


I remember working with SOAP and WSDL. One advantage I kind of miss from that era was the static generators that would create your service classes and the strongly typed DTOs based on the WSDL. I understand some effort has been made to replicate this (e.g. json-schema) but it isn't nearly as widely used as WSDL seemed to be. To be honest, I only kind of miss it since XML was such a major pain to deal with (anyone ever had to deal with XPath or XSLT? What a nightmare those were).

I recently used GraphQL on a project and it had some nice advantages. I love the idea of protocol buffers but have never had the chance to use them in anger. But if I'm honest, the boring option is JSON and it is what I would use for just about any API I had to expose nowadays.



I remember 2010, when I built my first API. People laughed at me for using JSON.

I had to add XML and CSV, because "nobody would integrate with a JSON API"


I think whatever company you were working at was a few years behind the curve, because JSON was already pretty much the standard for SPAs by 2010. For example, that's the year AngularJS came out, with JSON/JSONP as the only built-in serialization format for communicating with backend APIs.


Pretty much, yes.

In 2011 we moved from a PHP app to an SPA, where JSON came in handy.


For small, heterogeneous structs like messages and config it makes sense -- for repeated/homogeneous collections (an array of similar structs - a common case) both JSON and XML are wasteful -- a reflective/inline schema combined with a table would be more readable and space efficient. Also, meta-data needs to be either embedded or provided by special keys, so general meta-data tagging support would be great, where a non-meta-data-aware reader would only get the target data, but any JSON datum could be tagged with another JSON datum as meta-data. (The meta-data tags could implement the inline schema for the table as well.)
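One possible shape of that inline-schema-plus-table idea (key names invented):

    {"@schema": ["name", "x", "y"],
     "@rows": [["a", 1, 2.5],
               ["b", 2, 3.5]]}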


JSON is one of the best things to ever happen to software development.


It really isn't. I'm almost tempted to say it's the opposite, though that would be overstating the case.

It's a format that still bears excessive decoration (what's the purpose of quotes around field names? what are all those commas for?) yet it's limited in the types of data structures that it's able to express (natively). I'm not particularly fond of Clojure specifically but a format like EDN would have been superior in just about every way.


I'd hardly call a few bytes per key and field excessive. Especially compared to something like XML.

The data structure complexity being limited is also a pretty significant key to its success. More complex datatypes mean greater chances for JSON handling libraries to lack compatibility.

The only substantial shortcomings of JSON I see are shortcomings associated with any textual serialization format. Optimizing for human readability in a use case that's 99.99% of the time not read by a human.


JSON looks good when compared to XML.

That's literally how low you have to go.

"The only substantial shortcomings of JSON I see are shortcomings associated with any textual serialization format. Optimizing for human readability in a use case that's not 99.99% of the time not read by a human."

There are a few other shortcomings that are reasonably substantial/significant, but yeah, that's the gist of the problem.


> JSON looks good when compared to XML.

JSON looks good compared to what we'd be using instead of JSON, which is nothing so nice and structured as XML. The competition to JSON is something infinitely more ad-hoc, probably without a distinct parser, such that a generic library to generate or consume it is impossible, and getting usable error messages is equally impossible.

> "The only substantial shortcomings of JSON I see are shortcomings associated with any textual serialization format. Optimizing for human readability in a use case that's not 99.99% of the time not read by a human."

I agree with this and disagree at the same time: Optimizing for human readability means optimizing for the weird case, the 0.01% (but it seems to be more often than that) of the time you need to go beyond the tools you have to fix something. Saying that's rare is true but inapt: Seatbelts are only used in rare cases, too.


If we didn't have JSON we'd have settled on something like MsgPack.


Your 0.01% case is perhaps true with a binary format, but by using JSON, that 0.01% case just became a whole lot larger. ;-)

It's like, "hey, we came up with a simple way to do it, but to make it easier to deal with this little edge condition that creates complexity, let's significantly up the complexity and number of edge conditions so they're endemic to the space, and then we're all good go".

The irony is, I invariably end up needing to use a computer to help me read JSON anyway.


Even if we agree with your made-up numbers, 0.01% of the time that it's read by a human costs orders of magnitude more than the other 99.99%.


If anything 99.99% is probably underestimating it, especially for larger companies. If you send 10 million JSON documents per day, have 100 devs, and devs on average inspect 1 JSON document on the wire once per week, you're looking at closer to 99.9999%. Let's say that a binary format saves on average 10ms over JSON, then those 10m documents represent slightly over a day of overhead.

If you've got good built-in tooling for payload visualization, then you might have minimal overhead compared to debugging a text-like format. Both protobufs and flatbuffers (not to mention BSON) have good tools that spit out JSON equivalents.


Sure, if you make up numbers, you can argue that the sun is going to crash into the earth tomorrow and we're all going to die, so nothing in this conversation matters.

In reality however, there are some cases where protobufs, flatbuffers, or BSON are superior to JSON, but there are a lot of cases where they aren't. You'll have to weigh the pros and cons for each situation. And a lot of the time, there's not time to benchmark everything, so you kind of have to guess how the elements of the system are going to interact.

The one element that every system has is humans, so it's a fairly safe bet that humans will have to read whatever format you use.

I spend probably 15-45 minutes a day just in Postman, testing JSON calls. If something goes wrong, I'm inspecting requests/responses in Chrome. When we integrate a new team member, we don't have to have them install any tools--they're included in the browser they have installed. We don't have to write any schemas. When I start a new project, I don't have to install any libraries: they're included in my language(s). When we integrate with a partner company, we hand them sample requests/responses as text.

How many JSON documents per day do we have to send to get the payoffs you're claiming?

And my company is not unique: in fact, the stack I'm using is one of the most common stacks on the market.


There's a lot made about the format being human readable, but the actual bytes that fly over the wire are not what you see on the screen in Postman. So you're relying on a tool to extract and render them. It turns out that regardless of how they are serialized, you can actually render them the same way.


In theory, sure.

In reality, now, JSON opens in everything from Chrome network inspector to Vim, and protobuffers/flatbuffers don't.


Wait, are you saying the tools you use for JSON don't work for non-JSON data? ;-)

Chrome also decompresses gzip and understands TCP/IP, but it doesn't handle LZMA or SS7.

Sure, thanks to our cult-like following of bad principles, we've made support of a pretty broken stack with lots of terrible consequences ubiquitous. I'd argue that's a bug, not a feature.


So let me get this straight: you think we should all start using tools that may or may not even exist, and if they do exist, would require us to install a bunch of new stuff, write a bunch of schemas, and retrain, all so that we can solve a problem which so far boils down to "if you don't I'm gonna call your tools broken and you cult-like"?

I make a solid income solving problems that people pay me to solve. Why should I abandon that and devote my life to achieving 9% size and 4% availability time increase[1] that no client has ever asked me for?

And to be clear, it's not that I don't care about performance. It's that if my automated tests notice an endpoint loading slowly or if a client complains about performance, I can almost always achieve order-of-magnitude performance gains by optimizing a SQL query or twiddling some cache variables, which almost never happens by switching serialization formats. I have used protobuffers in a few cases, where profiling indicated it as a solution, but this has not been the norm in my experience.

The first optimization is getting it to work, and the second optimization is whatever profiling tells you it is.

[1] https://auth0.com/blog/beating-json-performance-with-protobu...


Yeah, that's really not what I'm saying.

But I think you're right that as long as we stay this course there's going to be more problems and so you'll be able to make more money solving problems that didn't need to exist.

You're absolutely right about the first optimization being to get it to work. You're just discounting the reality that you're making it far more difficult for that to happen.


I think that's rather missing the point, but it's also not really true, because that ratio is substantially smaller if you talk about all the different bits of hardware that are having to decode the JSON. The relative cost there is extraordinary, let alone the complexity cost.

You can make tools that present data in any format in a way that is easy for humans to digest. Letting a very small and trivial part of the problem drive what is a much larger problem space is pretty flawed.


Has anyone calculated the carbon foot print of parsing JSON?


I read and work with JSON all the time: logs, responses, code generated from JSON data. The format suffers from not being readable because of the quotes issue. Especially when what you are putting in there has quotes, the amount of escaping required is ridiculous.

{"time":"2020-07-22T10:59:14.95406-04:00","message":"{\"level\":\"debug\",\"module\":\"system\",\"time\":\"2020-07-22T10:59:14.953909-04:00\",\"message\":\"Running MetricCollector.Flush()\"}"}

This is a very moderate example of what I deal with daily, all because JSON includes quotes around fields.


It gets worse when you want to put it into a c-string and you need to escape the quotes and the slashes again.


Exactly. I am excited about new formats like Amazon's Ion though: https://github.com/amzn/ion-js


They embedded a JSON string within the JSON itself.

I wonder if it would’ve been better for them to Base64 encode their message. Of course, this itself presents other problems.


Yeah, but that 0.01% is developers checking API output and whatnot, and being able to read the output without any tooling (or maybe just a JSON 'prettier' tool) is great.


Quotes allowing any characters, plus the limited set of types, is probably one of the main reasons it's so widely used and implemented. There's no doubt there are vastly superior formats tailored to specific languages and purposes, but I think it's hard to argue that JSON was not a huge net positive for the software industry as a whole.


Listing some specific perceived flaws with JSON isn’t counter to the claim that it’s one of the greatest things to happen in software development.


The lack of an integer type really sucks, but that's an inherited failure from JavaScript.


I still think it's a shame that Amazon's Ion format never got widespread adoption.

https://amzn.github.io/ion-docs/

Superset of JSON with many extra features that people in the comments here desire from their data-language, such as support for comments, timestamps and s-expressions.


Great article. It's strange for me to be old enough to have experienced JSON's entire history as a website maker - I didn't learn anything new in the article and never gave it much thought before - but reading it all compiled in one place makes me appreciate that we are experiencing tomorrow's history right now.


An annoying problem with JSON is the lack of full floating-point number support. Transferring NaN and +Inf/-Inf is a pain.


Crockford believes the One True Number should be decimal instead of floating point. He also believes that comments will be abused and turned into ad-hoc parser directives.

So everyone gets to suffer for his beliefs.


In 10 years we will see JSON with the same mix of mockery and regret as we see XML today.


I remember reading that opinion 10 years ago.


I remember using XML-RPC back in the day to communicate between a desktop client and a web service. It was fantastic. And then the designer went off with a committee somewhere and produced SOAP. The regret with XML is that it continued to evolve into a monstrosity.

Thankfully JSON came along and got back to the simplicity of early XML and XML-RPC and stayed there.


I remember killing some old XML-RPC applications.

Deserialization and remote code execution vulnerabilities all over the place. That was brutal.

Who thought it was a good idea to pass arbitrary function names and arguments for the remote servers to resolve and execute blindly? The regular vulnerabilities in the XML parsing libraries themselves were the nice cherry on top.


> for the remote servers to resolve and execute blindly

I'm not sure who would do that, but I certainly didn't. Ultimately XML-RPC is no different from REST/JSON except that it's in a different format. What you did with that format is a totally different issue.


Just take the first example from wikipedia:

    <?xml version="1.0"?>
    <methodCall>
      <methodName>examples.getStateName</methodName>
      <params>
        <param>
          <value><i4>40</i4></value>
        </param>
      </params>
    </methodCall>
The thing is meant to call arbitrary functions with arbitrary arguments. It doesn't take long until there is a straight-up exec function exposed or some accidental command injection.

It's strange to look at it 20 years later. The adoption of JSON really got developers to stop shipping RCE vulnerabilities every other week. Yet nobody must have thought of that when deciding what to use.


What do you think happens with REST and JSON? Or SOAP? This is exactly identical to:

    {
        "methodName": "examples.getStateName",
        "params": [40]
    }
Although you'd probably instead have a REST endpoint contain the method name and the entire JSON body be the parameters. But the difference is minor. There's no more reason this allows arbitrary execution than anything else does.

Methods directly exposed to the web are how 99% of all MVC frameworks work.


> I remember using XML-RPC back in the day to communicate between a desktop client and a web service. It was fantastic.

My favorite quote about XML-RPC (from the plan9 people):

Some part of me desperately wants to believe that XML-RPC is some kind of elaborate joke, like a cross between Discordianism and IP Over Avian Carriers

I have exactly the same feeling regarding the JSON "protocols" and whatnot.


If you think about it, it is truly crazy. You're using HTTP over TCP/IP to send JSON to do RPC. Some form of RPC has effectively been around forever. But despite the layered complexity there is something to doing RPC this way that just works. Maybe it's that you can leverage several decades of work on load balancing and firewalling HTTP connections. Maybe it's that binary RPC protocols are too brittle and change is too difficult. I don't know.


Just curious-- have you actually read the specifications for both? The difference in length and complexity is like this comment vs. War and Peace.


If another standard were to dethrone JSON, my guess would be YAML.


Considering that the YAML spec rivals XML in complexity, I doubt it. Between that complexity, incompatible parsers, and intentionally loose types... what do you really gain?


YAML will never be a sane format for machines to talk to each other. There are just too many ways to interpret it. It's not even really a good human-readable format.

Maybe something like TOML? It's very human-readable and simple. But it's not a very good serialization language.


Although it is easier to see the benefits of using JSON as an interchange format because it is lightweight, I still believe that XML is more elegant and verbose than JSON. One of the complaints about JSON has been the lack of schemas, although there are some ways around that in projects such as Apache Avro https://avro.apache.org/, schema registries and all that jazz.


This is a great bit of internet history, if for nothing else than this Douglas Crockford quote in response to a flamebait-y argument about XML being superior to JSON:

“The good thing about reinventing the wheel is that you can get a round one.”


I'm quite a big fan of json.org: it outlines the JSON specification in a handful of diagrams and a small amount of text. It's so simple and elegant.


This is such a painful contingent historical accident that we have json as the standard instead of hson


Amen

XML is a mess and a chore to work with (at least in any language that isn't Java I guess, but even then).

Yes, it has some rough edges. Yes it could be better.

But overall it's good. Not too complicated and not too hard. Works fine for most stuff.



