
JSON is basically perfect if it allowed trailing commas and comments. TOML is not a replacement for JSON because of how badly it chokes on nested lists of objects (being both hard to read and hard to write), due to a misguided attempt to avoid becoming JSON-like[1].

[1] https://github.com/toml-lang/toml/issues/516



That's already fixed; in the upcoming TOML 1.1 you can write:

  tbl = {
     hello = "world",
  }
All the examples in the issue you linked should work.

https://github.com/toml-lang/toml/pull/904


published data standards having versions is so wrong.

now if you see "config in yaml" you know nothing, zero, nada, about the format, because all the versions are so different and everyone implemented whatever version was current at the time without bothering to mention which one. not to mention you can use a dozen syntaxes for yaml/toml and each application may not understand them all.

all this is so silly. we will stick with json and informal-ini forever one way or another.


It's not ideal, I agree, but it solves real problems for people, so there's that.

Many commonly-used standards today weren't created by a bunch of wise men in a room thinking how to bestow their wisdom upon the rest of us; they often originated in real-world applications, were refined over a period of time based on real-world experience, and then became a standard.

TOML is the same. I hope it will become an RFC some day. We just need to fix a few outstanding issues first.

Most commonly used TOML parsers support 1.0; adding 1.1 support should be pretty easy as the changes aren't that large (I did it in the parser I maintain, and it's 10 lines of code or so that had to be changed, most of them quite trivial).


My opinion is that having a version is fine, but only if a version field is built into the spec.

Without that, your config file is a booby trap; with it, everything is fine and your parsing libraries can even be backwards compatible.


Since it's impossible to design anything great in tech on first attempt, what is the road to improvement without versions?

(and mentioning version could be a requirement in a future version)


And if you have to version it, Kelvin versioning is more appropriate for standards.


How do you know which version you are using though?


You don't; you check the library or application's documentation.

TOML went through some substantial changes in the past with 0.4, 0.5, and 1.0. As I mentioned in my other comment[1], it's not ideal, but it is what it is.

I wouldn't be surprised if 1.1 would be the last version. Maybe there will be a 1.2 to clarify some things, but I wouldn't expect any further major changes.

[1]: https://news.ycombinator.com/item?id=36020654


> You don't

Great. Very obvious.


As much as it pains me to say so, this is probably fine for configuration languages so long as they’re backwards compatible. Eg toml is used by rust’s cargo tool. Cargo can just say “hey Cargo.toml is parsed in toml version 1.1 format”.


How does your IDE and linter learn that?

In fairness it's probably not too bad as long as everyone actually migrates to the newest version eventually... But that isn't guaranteed - look at YAML. Or even JSONC. VSCode has a hard-coded lists of which `.json` files are actually JSONC. Gross.


YAML is way more complex and JSONC is not JSON 1.1


This is great news, thank you for your work on this.


> JSON is basically perfect

Until you realize you can't actually store real integers because every number in js is a float...


JSON’s numbers are not IEEE-754. They’re numbers with an optionally infinite number of decimal places. It’s up to a parser to handle it. Python can parse these into integers if there isn’t a decimal place.

It’s in the name, but be careful not to get confused with JSON being JavaScript.
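Python's stdlib parser illustrates the point: the same number grammar comes back as different native types depending on the literal, entirely the parser's choice:

```python
import json

# same JSON number production, different native types:
n = json.loads("42")    # no decimal point -> int (arbitrary precision)
f = json.loads("42.0")  # decimal point -> float
assert type(n) is int and n == 42
assert type(f) is float
```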


You wrote this as if it’s a defense but honestly I feel even more terrified of JSON numbers now than I was before entering this thread, and before reading your comment.

Not following a set standard is undefined behaviour; leaving things up to the implementation is a large problem in other areas of computer science, such as C compilers.


Yes but this is a necessary limitation for all human readable numbers. The context decides what to deserialize into and different contexts/languages will choose bigint vs i64 vs u64 vs i32 vs double vs quad vs float, whatever is convenient for them.

Heck, some of them will even choose different endian-ness and sometimes it will matter.

I still remember the first time I dealt with a Java developer who was trying to send us a 64-bit ID and trying to explain to him that JavaScript only has 53-bit integers and how his eyes widened in such earnest disbelief that anybody would ever accept something so ridiculous. (The top bits were not discardable, they redundantly differentiated between environments that the objects lived in... so all of our dev testing had been fine because the top bits were zero for the dev server in Europe but then you put us on this cluster in your Canadian datacenter and now the top bits are not all zero. Something like a shard of the database or so.) We have bigints now but JSON.parse() can't ever ever support 'em! "Please, it's an ID, why are you even sending it as a number anyway, just make it a string." But they had other customers who they didn't want to break. It was an early powerful argument for UUIDs, hah!


It also means you can use JSON for incredibly high precision cases by making your parser parse them into a Decimal format. You couldn’t do this if you specified these limitations into the language.

Edit: Omg that story. Eep. I guess if someone provided too-large numbers in a JSON format, you could use a custom parser to accept them as strings or bigints. Still, that must have not been a fun time.
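For the record, Python's `json.loads` has a `parse_int` hook that makes the "accept them as strings" workaround a one-liner (the ID below is made up for illustration):

```python
import json

raw = '{"id": 10376293541461622847}'  # > 2**53: a double can't hold every digit

# a consumer that forces doubles (like JS Number) would corrupt the value:
assert int(float(10376293541461622847)) != 10376293541461622847

# parse_int routes every integer literal through str, keeping all digits
doc = json.loads(raw, parse_int=str)
assert doc["id"] == "10376293541461622847"
```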


Yeah I believe I hand patched Crockford’s json2 parser? It was something like that.


JSON isn’t intended to narrow all details. That’s up to the producer and consumer. If you use JSON you will specify these details in your API. JSON isn’t an API.

I wonder how many times this gets violated though, and how many times this “I dunno… you decide” approach causes problems.


If you want something stricter, specify it in the JSON Schema and use that[1].

You could declare your own "int32" type[2] for example, and use that. Then validate the input JSON against the schema before parsing it further.

[1]: https://datatracker.ietf.org/doc/html/draft-bhutton-json-sch...

[2]: https://json-schema.org/draft/2020-12/json-schema-core.html#...
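If you don't want to pull in a full schema validator, the check an "int32" format boils down to is small enough to sketch in stdlib Python (names here are illustrative, not from any spec):

```python
import json

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def is_int32(value):
    # bool subclasses int in Python, so rule it out explicitly
    return (isinstance(value, int)
            and not isinstance(value, bool)
            and INT32_MIN <= value <= INT32_MAX)

doc = json.loads('{"port": 8080, "big": 9999999999}')
assert is_int32(doc["port"])
assert not is_int32(doc["big"])
```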


You could invent a language that represents data that is very explicit about having integers, the implementation in javascript would still spit out floating values, because that's all the language has.

So either you don't target javascript (which would be a bit silly in the case of JSON), or you go the other way and forbid integers, even in languages that do support them. Which is also kind of silly.

Ultimately the real issue is that javascript doesn't have integers and if you're interacting with it, you need to be aware of that, JSON or not.


Doesn't matter.

The baseline is anything written in C and C++, which don't have bignum or decimal types and so more or less always parse JSON numbers to either int64 or double, at best.


JSON allows you to store arbitrarily large integers/floats. It's only in JS this is a problem, not if you use JSON in languages that support larger (than 54-bit) integers.


That's the freedom of unspecified behavior.


As long as the same person is on both sides of a communication channel, he has total freedom on what to say and will understand it flawlessly!

That's what standards are for, isn't it?


Annoyingly, it also doesn't support BigInt, which would alleviate this problem in JS as well


A number in JSON can have an arbitrary number of digits, i.e. it can represent any BigInt value.


> A number in JSON can have an arbitrary number of digits, i.e. it can represent any BigInt value.

In my experience, violating type constraints causes problems in downstream systems (usually with parsing or trying to operate on invalid values).

Number, as defined by the JSON Schema spec. A 32-bit signed integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647

BigInt is defined by various (MSFT, MySQL, etc): -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Most systems use a JSON String for large numbers, out of necessity, not JSON Number.


In context, BigInt refers to arbitrary precision integers [1], rather than any particular size of integer, hence "arbitrary number of digits".

[1] https://v8.dev/features/bigint


If both sides are using a language with integer types, this is a non-issue. JSON does not prescribe the number types in use, so the implementations may just say that the field contains 64-bit integers, and just parse them to and from the usual int64 type of their language. It is also legal for JSON parsers to parse numeric literals into an arbitrary-precision decimal type instead of IEEE 754 floats.
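Python's `json.loads` shows the arbitrary-precision route: a `parse_float` hook can send every float literal straight into `decimal.Decimal`, so nothing is ever rounded to binary floating point:

```python
import json
from decimal import Decimal

# every float literal goes through Decimal, never through a binary float
doc = json.loads('{"price": 0.1000000000000000055}', parse_float=Decimal)
assert doc["price"] == Decimal("0.1000000000000000055")
```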


You can store arbitrary precision numbers in JSON. The spec explicitly doesn't lock you into floats or any other specific number format.


Only if you are both sides of the transmission. If you're sending JSON to code you didn't write you will eventually get bitten by software lossy re-encoding. Lots of places use strings for this reason.

It's like API's that mess up the semantics of PUT/GET so implementing idempotency is extra annoying.


But JSON is strings. So 123.4 is essentially "123.4" but with the indication that it is supposed to be semantically a numerical value.


Right. I want to see an indication of what sort of numerical value it is. Big integers interpreted as floats lose precision. And floats decoded as integers truncate anything after the decimal place. JSON makes it way too easy to get this stuff wrong when decoding.


If it has a decimal point then it is a decimal. And if it doesn't (or if it only has zeros after the point) then it's an integer. JSON is absolutely unambiguous as to the actual numerical value - how badly that gets translated into the decoding language is entirely on that language.


This isn't right. JSON can also store exponential numbers (eg {"google": 1e+100}). You could decode this to an arbitrary-sized BigInt, but I can make you waste an arbitrary number of bytes in RAM if you do that. And even then, "look for a decimal point" doesn't give you enough information to tell whether the number is an integer. Eg, 1.1e+100 is an integer, and 1e-100 is not an integer.

One of JSON's biggest benefits is that you don't need to know the shape of the data when you parse. JSON's syntax tells you the type of all of its fields. Unfortunately, that stops being true with numbers as soon as double precision float isn't appropriate. If you use more digits in a JSON number, you can't decode your JSON without knowing what precision you need to decode your data.

Even javascript has this problem if you need BigInts, since there's no obvious or easy way to decode a bigint from JSON without losing precision. In the wild, I've seen bigints awkwardly embedded in a JSON string. Gross.

Putting responsibility for knowing the number precision into the language you're using to decode JSON misses the point. Everywhere else, JSON tells you the type of your data as you decode, without needing a schema. Requiring a schema for numbers is a bad design.
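The e-notation cases are easy to check with exact decimal arithmetic, e.g. in Python:

```python
from decimal import Decimal

# "has a decimal point" does not mean "not an integer":
assert Decimal("1.1e+100") == 11 * 10**99            # an integer, despite the point
assert Decimal("1e-100") != int(Decimal("1e-100"))   # no point, yet not an integer
```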


Ok so it allows e notation, but still the actual numerical value is unambiguous. You could parse that into a data structure that (for example) stores the mantissa and exponent as (arbitrarily large) integers. Again, that most languages try to shoehorn decimals into floats or whatever is on those languages.


In court you would be right, in practice it's on JSON. Requiring an arbitrary precision math library to correctly parse JSON is just not going to happen. The only language I know that even does this out of the box is Python with their automagic best numeric type. Even Ruby which is dynamic to a fault only gives arbitrary precision for integers and parses JSON numbers with decimals as floats.


True. But the spec also doesn’t provide a way to tell if a stored number should be decoded as a float or an integer - which makes it a right pain to use correctly in most programming languages. I’d love it if json natively supported:

- Separate int / float types

- A binary blob type

- Dates

- Maps with non string keys.

Even javascript supports all this stuff now at a language level; it’s just JSON that hasn’t caught up.


Eh, I'm happy with json as being explicitly unicode. Blobs can be base64-encoded. And date parsing invites weird timezone stuff—I'm happy to stick dates in strings and let the programmer handle that. I suppose a good json parser could insist number literals are ints unless you append a ".0", obviating the need for explicit integers, but that feels a bit kludgey. And I agree about the numeric map keys.


Except JSON can't even serialize all js numbers when it comes to NaN or infinity


You can store the first 2^53 integers with either sign, and if you need accurate integer values beyond that size you can stringify them and parse as big ints.

It’s not ideal, but 2^64 integers is also finite.
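A round-trip sketch of the stringify-and-reparse approach, in Python for illustration:

```python
import json

big = 2**63 - 1                      # too big for a double to hold exactly
assert float(big) != big             # a double-based parser would corrupt it

wire = json.dumps({"id": str(big)})  # so send it as a string instead
assert int(json.loads(wire)["id"]) == big
```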


53 bit integers should be enough for anyone

(in practice for config it usually is but enforcing it is horribly patchy)


> Until you realize you can't actually store real integers because every number in js is a float

JSON !== JS


> JSON is basically perfect if it allowed trailing commas and comments.

I agree, especially in regards to the comments, because sometimes the data itself isn't enough and additional human-readable context can be really useful!

In that regard, JSON5 is a wonderful idea, even if sadly it isn't widespread: https://json5.org/

It also supports the trailing commas and overall just feels like what JSON should be, to make it better without overcomplicating it.


I like the idea of JSON5, but it allows a bit too much in my opinion. For example, why add the single quote? Why hexadecimal identifiers?



JSON is trash and should never be used in any human-interfacing context, so I'm super skeptical that there's any utility in trying to fix it; that would just delay its demise, to the detriment of humankind.

But if you did want to fix JSON, then yes, trailing commas and comments are the absolute minimum bar, but single quotes are probably the third absolute must-fix.

The reason is just that so much JavaScript tooling is now configured to autoformat code (including the JSON bits) to swap " to ', thanks in large part to Prettier (which also should almost never be used, sigh, but that's a topic for another HN bikeshed...)

hexadecimal identifiers, yeah, nah


Prettier defaults to double quotes.

I'm curious what you have against Prettier.


I kinda agree, but I also think then you want unquoted keys

   {name: "value"}
And then maybe 123_456 syntax

It's a slippery slope ... pretty soon it's hard to write a JSON parser, and there are more bugs

The trailing comma one is trivial, I'll grant that


Trailing commas are highly impactful to users, and trivial for the language writers.

The 123_456 syntax comes close to that, but is much less impactful.

None of those will make the language hard to parse.


As much as I think it's annoying, I think no trailing commas enforces good coding hygiene and basically forces you to use a real encoder rather than ad hoc string manipulation.


This is such a bad take


Care to like, elaborate? Stopping people from doing brittle stuff like

    print("{")
    for elem in list:
        print('"key": "value",')
    print("}")
seems like immediate worth.


That's not a very big hurdle to overcome, though:

    print("{")
    contents = ""
    for elem in list:
      contents += '"key":  "value",'
    print(contents[:-1])
    print("}")
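Which is exactly the kind of fragile string surgery a real encoder makes unnecessary; e.g. in Python:

```python
import json

# build the structure natively, then let the encoder worry about commas
items = {f"key{i}": "value" for i in range(3)}
encoded = json.dumps(items)
assert json.loads(encoded) == items  # always well-formed, no comma trimming
```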


You are talking about data transfer encoding. Configuration languages are a completely different beast.


Multi-line strings are another weakness. Sometimes you don’t\nwant\nto\nwrite\nthis\nway.


I thought the triple quotes let you avoid that, but I assumed, I haven't checked.


Yes, that's correct, but I think the previous poster was talking about JSON.


Are you talking about https://github.com/toml-lang/toml/pull/904 which is merged for 1.1, or something else?


The problem now becomes incompatibility over what a TOML document is.


Good thing TOML was never intended to be a JSON replacement.


Everything is perfect except for the reasons it isn't.


I kind of like the text property list format from NeXTSTEP, used in GNUstep and (formerly) in macOS.

Apple's XML plist format seems like a mistake, though maybe the newer JSON format is OK.

> JSON is basically perfect if it allowed trailing commas and comments

Apparently Apple actually supports JSON5, an extended JSON based on ES5 that allows trailing commas, comments, and (my favorite!) unquoted key names, among other convenient features.


I think this was partly what the pytoml author was alluding to when he slated the format:

https://github.com/avakar/pytoml/issues/15#issuecomment-2177...

Datetimes were clearly a mistake to include too. It took the M out of TOML.


I find that YAML can be, if stripped down to the nice parts, a delightful config file structure.

Here's the toml.io example in perfectly legal YAML but using only YAML's nicer, JSON-like, compatibility grammar:

    title: "TOML Example"

    owner: {
        name: "Tom Preston-Werner",
        dob: 1979-05-27 07:32:00 -08:00
    }

    database: {
        enabled: true,
        ports: [8000, 8001, 8002],
        data: [
          ["delta", "phi"],
          [3.14]
        ],
        temp_targets: { cpu: 79.5, case: 72.0 }
    }

    servers: {
        alpha: { ip: "10.0.0.1", role: "frontend" },
        beta: { ip: "10.0.0.2", role: "backend" },
    }
If I were on the YAML board (?) I would push this or a similar subset (JAML? DUML? DUMBL?) to be implemented by parsers in every language: yaml.parse(yamlString, { jamlMode: true }). But it already works today anyway if you stick to the format. And that's what I use for my apps.

Multi-line strings in YAML are also very similar to TOML and you can ignore all different character mixups and stick to '|' (w/ newlines) and '"' (no newlines).

    # same as " hello world! "
    str1: "
        hello
        world!
      "

    # same as "apples\noranges\n"
    str2: |
      apples
      oranges
Most numeric TOML examples work too, minus a few bells and whistles like numeric separators '_':

    # integers
    int1: +99
    int2: 42
    int3: 0
    int4: -17

    # hexadecimal with prefix `0x`
    hex1: 0xDEADBEEF
    hex2: 0xdeadbeef

    # octal with prefix `0o`
    oct1: 0o01234567
    oct2: 0o755


    # fractional
    float1: +1.0
    float2: 3.1415
    float3: -0.01

    # exponent
    float4: 5e+22
    float5: 1e06
    float6: -2E-2

    # both
    float7: 6.626e-34

    # infinity
    infinite1: .inf # positive infinity
    infinite2: +.inf # positive infinity
    infinite3: -.inf # negative infinity

    # not a number
    not1: .nan


It kills me just how close JSON was to getting it right. If JSON had been JSON5 [1] instead, YAML and TOML probably wouldn't exist.

[1]: https://json5.org/


I think JSON syntax is more prone to user syntax errors. And we are talking about syntax errors by the kind of user who knows neither what a "syntax error" is nor what "JSON" is.

Hence the "O" in "TOML" ("Obvious"). And this is the use case for TOML, simple user facing configuration that they are very likely to just get right.

JSON is fine for more intricate data structures or very complex configuration, but if you just need them to enter a few numbers and booleans it is overkill.


Maybe you could even skip the commas entirely (or treat newlines as commas if possible?) That, along with unquoted keys, would make JSON perfect for me.

In such a hypothetical format, eliminating nulls entirely should also be considered. The difference between a missing key (undefined) and a null value is significant in JS, but other (particularly statically-typed) languages struggle to differentiate those two cases, and this leads to `serialize(deserialize(a))` representing a different document than `a`.
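The round-trip loss is easy to demonstrate: any consumer that reads fields with a defaulting accessor collapses "explicit null" and "missing key" into the same native value (sketch in Python, names illustrative):

```python
import json

def load_config(s):
    d = json.loads(s)
    return {"retries": d.get("retries")}  # missing key and null both -> None

# the two source documents are now indistinguishable, so faithfully
# re-serializing the original is impossible
assert load_config('{"retries": null}') == load_config('{}')
```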


The problem with comments in configuration files is that they don't survive a `load -> native object -> save` round-trip.


That sounds more like a problem with deserializing configuration instead of parsing.


Should a configuration language be a serialization language?

I'm unconvinced.


They could if you parsed it into a syntax tree with some methods to access keys, instead of parsing into a native struct. I think I saw a YAML parser doing it...


Augeas solves this for many config file formats: http://augeas.net/


I played around with "format preserving" edits of a few text formats, including nested formats (a JSON inside a TOML inside a YAML etc)

https://github.com/mkmik/knot8


Comments lie, I think it would make sense to destroy them in load to native.


If you parse JSON as YAML you can add comments!

YAML is a proper superset of JSON.



haha, this exact thing is my biggest gripe with toml

edit: just found https://github.com/toml-lang/toml/discussions/915#discussion..., super happy to be able to test this out early


And if you fixed numbers so you could use 64 bit integers.


still won't be perfect, all those extra quotes are bad for a heavily used human-readable config format



