The I-JSON Message Format (tbray.org)
132 points by pjvds on March 23, 2015 | 45 comments



The lack of a native date/time format in JSON has been the biggest pain point for me. Most (all?) serializers convert Dates to an ISO string, but that almost always requires manual conversion back to a Date during deserialization before you can do anything useful with it.

Using epoch for dates makes simple math & before/after comparison easier but requires explicit conversion during serialization.

Unfortunately, from what I can tell, I-JSON doesn't appear to solve this problem (or does it?). One nice thing about BSON is that it makes Date types first-class citizens of the format.
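
To make the asymmetry concrete, a minimal sketch in plain JavaScript:

    JSON.stringify({created: new Date(0)});
    // '{"created":"1970-01-01T00:00:00.000Z"}' -- the Date serializes fine...
    typeof JSON.parse('{"created":"1970-01-01T00:00:00.000Z"}').created;
    // 'string' -- ...but it comes back as a plain string, not a Date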


Dunno, I always liked storing date/time as epoch. Every language under the sun seems to have a native method for working with it. Yeah, I need to deal with de/serialization, but it is a small price to pay, no?


Epochs don't tell you what time zone you're working with and they aren't very easy to read/debug at a glance.


Fair enough @pimlottc. But most of the time I am far more concerned with accurately capturing a moment in time than I am with making it instantly readable. I have also helped companies that ran into serious datetime management issues when they worked with strings: some engineers assumed one timezone, others assumed another, and chaos ensued.

Epochs may not be instantly readable, but they do force everyone onto the same page.

In any case, I tend to view a timezone as a separate piece of data from the actual moment in time (but I know others have a different paradigm): datetime = accurate moment in time; timezone = the timezone in which this should be viewed, or was captured, etc.
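
In JSON, that paradigm might look something like this (field names purely illustrative):

    {"moment": 1427068800, "displayTz": "Asia/Tokyo"}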


So by epoch, obviously you mean the number of seconds since midnight (00:00) on Jan 1 1970, right? Ah... but then... UTC? Or TAI? There's a 35-second difference, after all. There's a school of thought that the UNIX epoch counts from 1970-01-01 00:00:10 TAI...


POSIX specifies the Epoch to be UTC and has since at least 2001. People may have other opinions on how it should be specified, but if you're going to follow POSIX as it exists, you're not left with a choice in the matter.

http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_...


But POSIX also claims there are 86400 seconds in a day, which is not always true for UTC. There are two ways of dealing with that: POSIX says that the correct way is to just count seconds, but reset every UTC midnight to (number of days since 1970-01-01) * 86400, which means that when leap seconds occur some epoch numbers are ambiguous (or, in a leap-second deletion, are skipped). NTP ignores POSIX and says that the way to deal with this is to vary the length of a second during a day which contains a leap second.
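
A worked example of that POSIX rule, using the leap second at the end of 2012-06-30 (the arithmetic is mine, not from the spec):

    const midnight = 15521 * 86400;      // 1341014400 = 2012-06-30T00:00:00Z
    const lastSecond = midnight + 86399; // 1341100799 = 2012-06-30T23:59:59Z
    const nextDay = 15522 * 86400;       // 1341100800 = 2012-07-01T00:00:00Z
    // Counting straight through 23:59:60Z would also reach 1341100800,
    // so after the midnight reset that one epoch value names two seconds.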

And we're talking about JSON here, so isn't the ECMA-262 standard for dates more relevant than the POSIX standard? ECMAScript has some very fuzzy ideas about dates.


POSIX specifies that there are 86400 seconds in a day. POSIX is not making claims about reality, it is specifying its own reality. That's what standards do.

ECMA-262 isn't really relevant at all, since it's not the (or even an) authority on JSON. JSON was simply derived from it -- in an incompatible way at that. It's doubly irrelevant since you were talking about Unix, so that's what I was addressing.

By the way, your phrasing is odd/confusing. You're talking about "epochs" in a strange way. In Unix/POSIX land, there is one epoch, "The time zero hours, zero minutes, zero seconds, on January 1, 1970 Coordinated Universal Time (UTC).". Unix timestamps are derived from the epoch, they do not define it.


I had not thought of that. Ha!

I would go for UTC, since that is what 99.99% of people think of (and so few even know about TAI, except for the smart ones like @jameshart! :-) )


Another annoyance is seconds since epoch (traditional Unix) vs milliseconds since epoch (e.g. Java).


And JavaScript is milliseconds as well. I think either would be fine, as long as it is an agreed standard. Personally, I use ms, because most programming langs can instantly convert without any additional math. But it probably is wasteful. Then again, a factor of 1000 is just a few extra bits on each value...
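
For illustration, the usual conversion dance in JavaScript (1427068800 happens to be 2015-03-23T00:00:00Z):

    const seconds = 1427068800;         // Unix time in seconds
    const d = new Date(seconds * 1000); // JS Dates take milliseconds
    const backToSeconds = Math.floor(d.getTime() / 1000);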


> I-JSON doesn't appear to solve this problem

It's not within the scope of what I-JSON is intended to address. This is a more formally specified and slightly more constrained variation of the existing JSON spec. Adding new datatypes would mean that it would no longer be JSON.


I'll agree there aren't many ways this could be solved without breaking compatibility with JSON. Certainly something like {"created": Date("2013-12-01T12:00:00Z")} a la BSON seems elegant, but is incompatible with JSON.

Maybe something like "String values that match (some computationally inexpensive ISO-8601-matching regex) shall be converted to Date instances by the JSON parser" could be possible without huge compatibility issues.
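
You can get a taste of that behavior today, client-side, with a JSON.parse reviver; a minimal sketch (the regex is illustrative, not exhaustive):

    const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
    const parsed = JSON.parse('{"created": "2013-12-01T12:00:00Z"}',
      (key, value) =>
        typeof value === "string" && ISO_DATE.test(value)
          ? new Date(value) // revive matching strings as Dates
          : value);
    parsed.created instanceof Date; // true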

I guess I was mostly using this topic to voice what I'd imagine is a common point of frustration in an otherwise great data interchange format.


No, definitely don't do that. You're just trading one set of small annoyances for a huge set of unpredictable problems.


One might wish for a strict requirement for date formats for better interoperability, but at least I-JSON recommends something: ISO 8601 [per RFC3339] (with additional restrictions - see section 4.3 which I've quoted elsewhere on this page).



I love it. When I first started in IT (1994, don't ask...) I was working with London and Tokyo and San Fran and Singapore, and everyone wrote dates differently. I just started writing YYYY-MM-DD everywhere, and all of the questions went away.


Looks like ISO-8601 to me! We've standardized on using this format (extended to include time where necessary) whenever our JSON objects include a date or date/time. We've also standardized on UTC. Since our system clocks are already synchronized that way, it's easy for us, and we simply i18n/l10n them on entry and/or display.


It was, I just didn't know it at the time! I was simply looking for some way to write emails and spec docs in a way that everyone would have a common frame of reference with zero extra work.

Yeah, in storage (as above), I use epoch all the way through and convert as needed. But as the thread above shows, not everyone likes this path...


Yes, conformance to a single string representation of a datetime is nice. Unfortunately that still doesn't help the issue when deserializing: you're just going to get a string that needs to be manually converted to a Date object.


See also RFC 7049, "Concise Binary Object Representation" (CBOR), a 'binary JSON': http://tools.ietf.org/html/rfc7049 http://cbor.io/

Faster, smaller, pretty sure it will parse on the other end.


The fun part is that this RFC was edited by Carsten Bormann.


Very cool, I hadn't heard of this before. It looks like it resolves a lot of the issues I have with JSON.


It's great to see this! At Snowplow we've been bitten by JSON allowing arbitrarily large numbers. This is what a JSON Schema trying to enforce sensible (int64) numeric limits ends up looking like: https://github.com/snowplow/iglu-central/blob/master/schemas...
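
The core of such a schema boils down to explicit bounds, something like this (a simplified sketch with int64 limits, not the linked schema verbatim):

    {
      "type": "integer",
      "minimum": -9223372036854775808,
      "maximum": 9223372036854775807
    }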


I'm surprised to see nothing about formatting dates, which in my experience is the worst interop issue with JSON. I'm glad people are working on this, though. Thank you!


Dates and times are covered:

   4.3.  Time and Date Handling

   Protocols often contain data items that are designed to contain
   timestamps or time durations.  It is RECOMMENDED that all such data
   items be expressed as string values in ISO 8601 format, as specified
   in [RFC3339], with the additional restrictions that uppercase rather
   than lowercase letters be used, that the timezone be included not
   defaulted, and that optional trailing seconds be included even when
   their value is "00".  It is also RECOMMENDED that all data items
   containing time durations conform to the "duration" production in
   Appendix A of RFC 3339, with the same additional restrictions.
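
So a conforming timestamp (uppercase T and Z, explicit timezone, seconds present even when zero) and a conforming duration would look something like this (my own example, with made-up field names):

    {"created": "2015-03-23T17:04:00Z", "ttl": "PT3600S"}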


For what it's worth, Salesforce even conforms to ISO 8601 internally. So, you know, if you're trying to develop agile tools for translating between JSON and Salesforce API records, then, um, things should just work...

takes a brief moment to contemplate the reality of his existence

The flip side of this is that if you're not using ISO 8601 in your JSON, you're doing worse than Salesforce. That's about as good an incentive as I can give this community to try to standardize their organizations around this standard!


I'm really surprised there are still a lot of people out there who don't use ISO 8601; I haven't used anything else for nearly a decade :/


Twilio was the most shocking example. They use that braindead[1] "RFC" style that includes the day of week and month name in English. I remember feeling it was the ugliest part of their API; there's no reason for this silliness. (Maybe they've changed it since I looked a few years ago.)

1: It made sense in like the early 70s when people read and wrote all headers by hand. It's been stupid for a much longer time than it made sense though.


Oh that's great! It looks like the HTML version of his link doesn't have any subsections under section 4:

http://rfc7159.net/rfc7159

but the plaintext version does:

http://www.rfc-editor.org/rfc/rfc7493.txt

Maybe the document is still in flux and the text version represents a later addition?

EDIT: Oops, my first link is to the original JSON spec. Sorry for the confusion!


But it's only "recommended", which means an implementation can skip this and still be compliant. It seems a bit pointless to make a new spec and not enforce one of the main things that trips people up.


If the author of this RFC is still doing edits, I think it might be worth mentioning in section 4.2 "Must-Ignore Policy" something along the lines of,

    An I-JSON implementation supporting a "Must-Ignore" policy SHOULD pass any such new protocol elements on, untouched, to any downstream consumers of the message, because those downstream consumers may understand the new elements.
It is another one of those things that are obvious to many people, but I could imagine somebody reading the last sentence of the section, "members whose names are unrecognized MUST be ignored", and thinking that they should omit the unrecognized elements before passing the message on.
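
To make the distinction concrete, a sketch (handle and forward are hypothetical stand-ins):

    const handle = (id: number) => { /* hypothetical message handler */ };
    const forward = (body: string) => { /* hypothetical downstream send */ };

    const msg = JSON.parse('{"id": 1, "newField": true}');
    handle(msg.id);               // act only on the members you understand...
    forward(JSON.stringify(msg)); // ...but pass the whole message on intact
    // The misreading to avoid: rebuilding from known members only,
    // which silently drops "newField":
    forward(JSON.stringify({id: msg.id}));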


Most JSON libraries should already follow a lot of these tips (don't repeat keys, use ISO8601 dates). So as long as you aren't hand-generating your JSON, you'd need to do very little work to conform to these guidelines.


More specifically, RFC3339 dates, which are ISO8601 with a lot of extraneous cruft (week of year? really?) removed.
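
For example (2015-03-23 happens to fall in ISO week 13):

    2015-03-23    calendar date: valid in both ISO 8601 and RFC 3339
    2015-W13-1    ISO 8601 week date: not valid RFC 3339
    20150323      ISO 8601 basic format: not valid RFC 3339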


Week numbers are a very common way of talking about time in some locales.

For instance in my native Sweden, it's very common when e.g. scheduling things at work to talk about which week something will happen.

Sites like http://vecka.nu make it easy to check the current week number (the domain name translates as "week.now") whenever you have internet access, paper calendars include it, and so on. It's really very common.


I'd love to see someone fork https://github.com/arc90/jsonlintdotcom and add an option for I-JSON validation.

[edit] Actually, it looks like the guts of this tool live here: https://github.com/zaach/jsonlint


Kind of a bummer that nothing was suggested wrt tightening the number grammar. "99" and "99e0" are both valid serializations of 99, and "0e7" and "0" of zero. JSON also allows arbitrary trailing and leading 0s in the fraction and exponent, respectively. Personally I don't know why, for instance, "0.0" can't be the only valid serialization of a double 0 and "0" the only valid serialization of an integer 0. In fact, why can't all floats that are ambiguous wrt an integer interpretation have a ".0" tail?
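
You can see the slack in any ECMAScript console:

    JSON.parse("99");    // 99
    JSON.parse("99e0");  // 99
    JSON.parse("0e7");   // 0
    JSON.parse("0.000"); // 0
    // Round-tripping normalizes, so the wire form isn't preserved:
    JSON.stringify(JSON.parse("99e0")); // "99"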

Also, if you're going to stash binary data in to quoted JSON strings then why base64url encoding and not Z85[0]? It's more efficient and easier to decode.

Using schema-less formats generally sucks.

[0] http://rfc.zeromq.org/spec:32


"In fact, why can't all floats that are ambiguous wrt an integer interpretation have a ".0" tail?"

In JavaScript Object Notation, everything's a float. There are no integers, just floats that happen not to have any fractional part.

Numbers are the weakest part of the JSON spec in general because Javascript has very weak numbers. A spec merely intending to tidy up certain questionable corners hasn't really got license to "fix" the numbers problem.

(And just to make it clear one more time, I fully agree that the numbers are really problematic in JSON... I just don't think that problem can be fixed here. It would take a JSON 2.)


Schemas are anything but a free lunch.

It has been said that, in the average large-scale-XML-using enterprise environment, the ratio of (creating, grokking, and interacting with the hundreds to thousands of XML schemas/documents that invariably creep up) to (doing actual work) increases without bound as t goes to infinity.


I mainly know of Tim Bray because of XML.


He also authored one of the two competing standards for JSON: http://tools.ietf.org/html/rfc7159


They're really not competing at this point. 7159 is the one spec to rule them all. ECMA-404 (which was kind of silly to start with[0]) isn't really relevant to most developers anymore.

ECMA-262 even explicitly uses 7159's predecessor (4627) with two exceptions[1], one of which is the top-level compatibility headache 7159 fixed, and the other just requires the API to disregard the "MAY" in section 4.

[0] https://www.tbray.org/ongoing/When/201x/2014/03/05/RFC7159-J...

[1] http://www.ecma-international.org/ecma-262/5.1/#sec-15.12


I've run into the date formatting issue that others are mentioning, and just this weekend I discovered the hard way that the V8 JSON parser does not correctly parse the following hierarchy:

object --> array --> object --> array

I must have spent a few hours trying to figure out why Angular wasn't iterating through the first array, only to discover that it was being parsed like so:

object --> object --> object --> object


That does not seem plausible. That would break a million different things. Are you sure you're interpreting your results correctly? Do you have a test case you can share?


I couldn't repro this. What am I missing?

http://jsfiddle.net/greggman/bkfxefgL/
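
For reference, the minimal check (a plain JSON.parse snippet of my own, not greggman's fiddle verbatim):

    const o = JSON.parse('{"a": [{"b": [1, 2]}]}'); // object->array->object->array
    Array.isArray(o.a);      // true
    Array.isArray(o.a[0].b); // true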



