
A lot of people dislike the decision not to include comments in JSON, but I think, while shocking, it was and is totally correct.

In a programming language comments are essentially free, because they're erased before the program runs; we usually render comments in grey text because they can't change the meaning of the program.

In a data language you have no such luxury. There's no comment erasure happening between the producer and the consumer, so comments are dangerous: they would without doubt evolve into a system of annotations -- an additional layer of communication which would not be standardized at all, and which would grow into a wild west of nonstandard features and compatibility workarounds.



I don't dislike the decision at all, FWIW! For data interchange it's totally reasonable. But it does make JSON ill-suited for a bunch of applications that JSON has been forcefully and unfortunately applied to.


> In a data language there's no comment erasure happening between the producer and the consumer, so comments are just dangerous as they would without doubt evolve into a system of annotations -- an additional layer of communication which would then not be standardized at all and which then would grow into a wild west of nonstandard features and compatibility workarounds.

But there's nothing stopping you from commenting your JSON now. There's no obligation to use every field. There can't be, because the transfer format is independent of the use to which the transferred data is put after transfer.

And an unused field is a comment.

    {
      "customerUUID": "3",
      "comment": "it has to be called a 'UUID' for historical reasons"
    }
If this would 'without doubt' evolve into a system of annotations, JSON would already have a system of annotations.


> so comments are just dangerous as they would without doubt evolve into a system of annotations -- an additional layer of communication which would then not be standardized at all and which then would grow into a wild west of nonstandard features and compatibility workarounds

IIRC Douglas Crockford explicitly stated that he removed comments after seeing people use them as ad hoc parsing directives.


> that decision not to include comments in JSON, but I think while shocking it was and is totally correct.

YAML is fugly, but it emerged largely because JSON doesn't support comments. Now we're stuck with two languages for configuring infrastructure: a beautiful one that's unusable because it has no comments, and another where I can never format a list correctly on the first try, but comments are OK.


YAML also expanded, via a pile of bolt-on capabilities, into a serialisation language that's Turing-complete -- or at least one that routinely embeds Turing-complete scripting -- everything from:

  command:
    - /bin/sh    
    - -c
    - rm -rf $HOME
to:

  state: >
    {% set foo = states('...') %}
    {% set bar = states('...') %}
    {% if foo == FOO and bar == BAZ %} 
    ...
This makes it damn annoying to work with, because everyone's way of doing it is different, and since scripting isn't a first-class element you have to contort everything you want to do into strange patterns that fit how YAML does things.


This scripting is not a part of YAML. It could be done in JSON as well:

  {"command": [
    "/bin/sh",
    "-c",
    "rm -rf $HOME"
  ]}
In fact, this is completely equivalent to your YAML.
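For what it's worth, the two snippets really do decode to the identical structure. A quick check in Python (the YAML half is guarded, since PyYAML is a third-party package):

```python
import json

# The JSON form of the "command" list from the parent comment.
json_doc = '{"command": ["/bin/sh", "-c", "rm -rf $HOME"]}'
# The YAML form, as a plain string (parsed below only if PyYAML is present).
yaml_doc = "command:\n  - /bin/sh\n  - -c\n  - rm -rf $HOME\n"

expected = {"command": ["/bin/sh", "-c", "rm -rf $HOME"]}
assert json.loads(json_doc) == expected

try:
    import yaml  # PyYAML, third-party; skip the check if it's absent
    assert yaml.safe_load(yaml_doc) == expected
except ImportError:
    pass
```

Neither serialization gives the strings any power to run; that's entirely up to whatever consumes the data.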


The difference is that in YAML it's kind of expected (the second pseudocode example is from Home Assistant where almost everything nontrivial requires embedding scripting inside your YAML) while I've never seen it done in JSON.


The use cases for YAML that don't involve any sort of scripting vastly outnumber the use cases for YAML that involve embedding scripts into a document; so it's a little unfair and inaccurate to say that "in YAML it's kind of expected".

It is more fair to say that if your document needs to contain scripting, YAML is a better choice than JSON; for the singular reason that YAML allows for unquoted multiline strings, which means you can easily copy/paste scripts in and out of a YAML document without needing to worry about escaping and unescaping quotes and newline characters when editing the document.
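The escaping cost is easy to see (a sketch, not tied to any particular tool): embedding even a short shell script in JSON forces every newline and quote through an escape, while a YAML block scalar (`state: |`) would carry the same text verbatim:

```python
import json

# A three-line shell script, as it would sit in memory.
script = 'set -e\necho "backing up"\ntar -czf backup.tgz "$1"\n'

# JSON must flatten it to one line, escaping every newline and quote:
print(json.dumps({"state": script}))

# A YAML block scalar ("state: |") would hold the script verbatim,
# so it copies in and out of the document unchanged.
```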


Jupyter notebooks are a form of scripting in JSON. Anyway, all this is the fault of specific tools, not of YAML. This is like saying that laundry pods are bad because people eat them.


JSON is obviously perfectly usable, given how widely it's used. Even Douglas Crockford suggested just using a JSON interpreter that strips out comments, if you need them.

And if you want something like JSON that allows comments, and you aren't working on the web, Lua tables are fine.
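Crockford's suggestion is easy to sketch. Here's a deliberately minimal Python pass that drops `//` line comments outside of string literals before handing the text to a real parser -- not a full JSONC implementation, just the idea:

```python
import json

def strip_comments(text):
    """Drop // line comments that appear outside of string literals."""
    out, i, in_string = [], 0, False
    while i < len(text):
        c = text[i]
        if in_string:
            out.append(c)
            if c == "\\" and i + 1 < len(text):
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 1
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
            out.append(c)
        elif text[i:i + 2] == "//":
            while i < len(text) and text[i] != "\n":  # skip to end of line
                i += 1
            continue
        else:
            out.append(c)
        i += 1
    return "".join(out)

doc = '{\n  "port": 8080  // keep in sync with the proxy config\n}'
assert json.loads(strip_comments(doc)) == {"port": 8080}
```

Note this deliberately leaves `//` inside strings (like URLs) untouched.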


Many years ago I worked for a company that did EDI software. When XML was introduced they had to add support for it -- just the primitive early XML of the time, with none of the modern complexities. With the same backend code, just switching the parsing, they found either a 100x slowdown in parsing and a 10x increase in memory use, or the other way around (so 10x slower, 100x the memory). The functionality was identical; all they did was switch the frontend from EDI to XML.

Since EDI is meant for processing large numbers of transactions as quickly as possible, I hate to think what the move to XML did to that. I moved on years ago, so I don't know whether they just threw more hardware at the problem to achieve what EDI already gave them, but now with angle brackets, or whether the industry gave up on XML because of its poor performance.

Come to think of it I'm pretty sure they would have tried blockchain when that got trendy as well.


No, it was obviously and flagrantly incorrect, as evidenced by the success of interchange formats that do allow for comments, including many real world systems that pragmatically allow comments even when JSON says they shouldn't. This is Stockholm Syndrome.

But what can we expect from a spec that somehow deems comments bad but can't define what a number is?


How do you feel numbers are ill-defined in JSON? The syntactical definition is clear and seems to yield a unique and obvious interpretation of JSON numbers as mathematical rational numbers.

A given programming language may not have a built in representation for rational numbers in general. That isn't the fault of json.


I can't really tell what you're trying to say; JSON also has no representation for rational numbers in general. The only numeric format it allows is the standard floating point "2.01e+25" format. Try representing 1/3 that way.

The usual complaint about numbers not being well-defined in JSON is that you have to provide large numbers as strings; 13682916732413492 is ill-advised JSON, but "13682916732413492" is fine. That isn't technically a problem in JSON; it's a problem in JavaScript, but JSON parsers that handle literals the way JavaScript does turned out to be common.
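The round-trip problem is easy to demonstrate in Python, using `float()` as a stand-in for what a JavaScript-style, double-based parser does to number literals:

```python
import json

# Python's json module keeps integers exact (Python ints are unbounded):
n = json.loads("13682916732413492")
assert n == 13682916732413492

# A JavaScript-style parser stores every number in a 64-bit float.
# Above 2**53 a double can only represent even integers, so an odd
# value silently rounds to a neighbour under that treatment:
m = 13682916732413493              # odd, and greater than 2**53
assert int(float(m)) != m
```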

Your "defense", on the other hand, actually is a lack in JSON itself. There is no way to represent rational numbers numerically.


I didn't say that json can represent all rational numbers. I said that all json numbers have an obvious interpretation as a rational number.

So far you haven't really shown an example of a json number which has an ambiguous or ill defined interpretation.

Maybe you mean that json numbers may not fit into 32 bit integers or double floats. That's certainly true but I don't see it as a deficiency in the standard. There is no limit on the size of strings in json, so why have a limit on numbers?


>> A given programming language may not have a built in representation for rational numbers in general.

Why did you say this?


As long as they stay comments there's no harm. As soon as they become struct tags and stripping comments affects the document's meaning you lose the plot.


Could you imagine hitting a rest api and like 25% of the bytes are comments? lol


Worse than that - people will start tagging "this value is a Date" via comments, and you'll need to parse ad-hoc tags in the comments to decode the data. People already do tagging in-band, but at least it's in-band and you don't have to write a custom parser.


See also: PostScript. The document structuring conventions being comments always bothered me. I mean surely, surely in a Turing-complete language there is somewhere to fit document structure information. Adobe: nah, we'll jam it in the comments.

https://dn790008.ca.archive.org/0/items/ps-doc-struc-conv-3/...


Not sure it's a fair comparison. The spec says:

"Use of the document structuring conventions... allows PostScript language programs to communicate their document structure and printing requirements to document managers in a way that does not affect the PostScript language page description"

The idea being that those document managers did not themselves have to be PostScript interpreters in order to do useful things with PostScript documents given to them. Much simpler.

For example, a page imposition program, which extracts pages from a document and places them effectively on a much larger sheet, arranged in the way they need to be for printing 8- or 16- or 32-up on a commercial printing press, can operate strictly on the basis of the DSC comments.

To it, each page of PostScript is essentially an opaque blob that it does not need to interpret or understand in the least. It is just a chunk of text between %%BeginPage and %%EndPage comments.

This is tremendously useful. A smaller scale of two-up printing is explicitly mentioned as an example on p. 9 of the spec.
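A toy version of that imposition-style processing is easy to sketch: split on the DSC comments and never look inside a page. (A hypothetical helper for illustration; real DSC has many more structuring comments than `%%Page:` and `%%Trailer`.)

```python
def split_pages(ps_text):
    """Split a DSC-conforming PostScript file into per-page opaque blobs."""
    pages, current = [], None
    for line in ps_text.splitlines(keepends=True):
        if line.startswith("%%Page:"):
            if current is not None:
                pages.append("".join(current))
            current = [line]          # start a new page blob
        elif line.startswith("%%Trailer"):
            break                     # document body is over
        elif current is not None:
            current.append(line)      # opaque content; never interpreted
    if current is not None:
        pages.append("".join(current))
    return pages

doc = "%!PS-Adobe-3.0\n%%Page: 1 1\n0 0 moveto\n%%Page: 2 2\nshowpage\n%%Trailer\n"
assert len(split_pages(doc)) == 2
```

The splitter needs no PostScript interpreter at all, which is exactly the point of the conventions.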


Reminds me how old versions of .net used to serialize dates as "\/Date(1198908717056)\/".
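For the curious, the payload is milliseconds since the Unix epoch, so it can be decoded with a regex and no schema support at all. A hedged Python sketch (the `created` field name is invented for illustration):

```python
import json
import re
from datetime import datetime, timezone

# "\/" is a legal JSON escape for "/", so the wire form below decodes
# to the string "/Date(1198908717056)/".
raw = '{"created": "\\/Date(1198908717056)\\/"}'
doc = json.loads(raw)

m = re.fullmatch(r"/Date\((\d+)\)/", doc["created"])
created = datetime.fromtimestamp(int(m.group(1)) / 1000, tz=timezone.utc)
print(created.isoformat())  # a timestamp from late December 2007
```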


> Could you imagine hitting a rest api and like 25% of the bytes are comments? lol

That's pretty much what already happens. Getting a numeric value like "120" by serializing it through JSON takes three bytes. Getting the same value through a less flagrantly wasteful format would take one.

I guess that's more than 25%. In the abstract ASCII integers are about 50% waste. ASCII labels for the values you're transferring are 100% waste; those labels literally are comments.

If you're worried about wasting bandwidth on comments, JSON shouldn't be a format you ever consider, for any purpose.

lol


HTML and JS both have comments, I don't see the problem


And both are poor interchange formats. When things stay in their lane, there is no "problem." When you try to make an interchange format using a language with too many features, or comments that people abuse to add parsable information (e.g. "type information") then there is a BIG problem.


« HTML is a poor interchange format. » - quote of the century -


It caused all kinds of problems, though those tend to be more directly traceable to the "be liberal in what you accept" ethos than to the format per se.


> In a programming language it's usually free to have comments because the comment is erased before the program runs

That's inherent to the language specification, but it isn't inherent to the document. You have to have a system with rules that require that erasure.

Nothing prevents one from mandating a system that strips those comments out of JSON. You could even "compile" JSON to, I don't know, BSON or msgpack or something.

Just as nothing prevents one from creating tooling to, say, extract type annotations from comments in a dynamically typed language.


> while shocking it was and is totally correct

Agreed -- consider how comments have been abused in HTML, XML, and RSS.

Any solution or technology that can be abused will be abused if there are no constraints.



