
"Self-describing formats like JSON Lines are big... but when you compress them, they go back to being small."

For CSV file replacements, I'd expect something like "one JSON array per line, all values must be JSON scalars". In that case, it's not much larger than a CSV, especially one using quotes already for string values.

But this demonstrates the problem with JSON for CSV, I suppose. Is each line an object? Is it wrapped in a top-level array or not? If it is objects, do the objects have to be one line? If it is an object, where do we put field order? The whole problem we're trying to solve with CSV is that it's not a format, it's a family of formats, but without some authority coming in and declaring a specialized JSON format we end up with a family of JSON formats to replace CSV as well. I'd still say it's a step up; at least the family of JSON formats is unambiguously parseable and the correct string values will pop out. But it's less of a full solution than I'd like.
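To make the ambiguity concrete, here's a quick Python sketch (row data invented for illustration) showing three mutually incompatible ways the same little table comes out as "JSON":

```python
import json

# The same two-row table, serialized three ways -- all valid JSON, all incompatible.
rows = [{"name": "Ada", "year": 1815}, {"name": "Alan", "year": 1912}]

as_objects = "\n".join(json.dumps(r) for r in rows)                 # one object per line
as_arrays = "\n".join(json.dumps(list(r.values())) for r in rows)   # one array per line
as_doc = json.dumps(rows)                                           # single top-level array

# Each variant parses fine on its own, but a reader built for one
# chokes on (or misreads) the others -- and only the object form
# carries field names; only the array forms preserve column order.
```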

(It wouldn't even have to be that much of an authority necessarily, but certainly more than "The HN user named jerf declares it to be thus." Though I suppose if I registered "csvjson.org" or some obvious variant and put up a suitably professional-looking page that might just do the trick. I know of a few other "standards" that don't seem to be much more than that. Technically, even JSON itself wasn't much more than that for a lot of its run, though it is an IETF standard now.)



Indeed, this has already been done: http://ndjson.org/

To be fair, it's not an objectionable format. Using line breaks to separate objects makes it streamable, and you don't need to enclose the whole thing in an array to make it a valid JSON document.


That is not quite a CSV replacement. I use it for things with nested objects and such all the time. To be a CSV replacement you really need to add that each line must be a JSON array, and that it can only contain scalars (no sub-arrays or objects). That would be a decent enough replacement for CSV itself. Not perfect, but the CSV "standard" is already a nightmare at the edges anyhow, and honestly a lot of it can't be fixed anyway, so this is probably as good as it gets.
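For what it's worth, a strict reader for that convention only takes a few lines. This is just a sketch of the rules I described, not any standard (the function name is mine):

```python
import json

SCALARS = (str, int, float, bool, type(None))

def parse_csv_like_jsonl(text):
    """Parse text where each non-blank line must be a JSON array of scalars only."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue
        value = json.loads(line)
        if not isinstance(value, list):
            raise ValueError(f"line {lineno}: not a JSON array")
        if not all(isinstance(v, SCALARS) for v in value):
            raise ValueError(f"line {lineno}: nested arrays/objects not allowed")
        rows.append(value)
    return rows
```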


> that it can only have scalars in it (no sub-arrays or objects)

I see CSV files that contain JSON arrays/objects in their fields all the time. Mainly from exporting Postgres tables that contain json/jsonb-typed columns. Are you saying that these aren't valid CSVs?


They're saying that a CSV equivalent should be strictly 2-dimensional, with "flat" values.

Such a format could contain arbitrary JSON in a "cell", but simply as text, in the same way as putting the same data in a CSV.


These are strings containing JSON


> But this demonstrates the problem with JSON for CSV, I suppose. Is each line an object?

How is that not a problem with every data serialization format? It does me no real good to have an XML schema and a corresponding file; if I don't know what those elements and attributes represent, I'm not really any better off.

It's not like JSON or XML can meaningfully be marshaled back into objects generically, without knowledge of what is represented. There are generic JSON and XML readers that let you parse the data elements, sure, but so do generic CSV readers like C#'s CsvHelper or Python's csv. In all cases you have to know what the object turns into in the application before the serialized data is useful.
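To illustrate with Python's stdlib: a generic CSV reader hands back untyped strings, a generic JSON parser at least recovers types, and neither tells you what the fields mean:

```python
import csv
import io
import json

raw = "name,year\nAda,1815\n"

# A generic CSV reader gives strings only; "1815" stays text.
csv_rows = list(csv.reader(io.StringIO(raw)))

# A generic JSON parser recovers the types...
json_row = json.loads('{"name": "Ada", "year": 1815}')

# ...but neither format says what "year" *means*; that knowledge
# lives in the application, not in the serialization.
```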

And, yes, CSV has slightly differing formats, but so does JSON. Date formats are conventionally ISO 8601, but that's not in the spec. That's why Microsoft got away with proprietary date formats in System.Text.Json. XML isn't really any better.


> That's why Microsoft got away with proprietary date formats in System.Text.Json.

What's proprietary in it? It follows ISO 8601-1:2019 and RFC 3339 according to the docs.


Sorry, that should be System.Runtime.Serialization.Json. System.Text.Json is the newer class that replaced it.

In .Net Framework 4.6 and earlier, the only built-in JSON serializer was System.Runtime.Serialization.Json.DataContractJsonSerializer.

You can still see it. If you're on Windows 10, open Windows Powershell v5.1 and run:

  Get-Item C:\Windows\System32\notepad.exe | Select-Object -Property Name, LastWriteTime | ConvertTo-Json
You'll see this output:

  {
    "Name":  "notepad.exe",
    "LastWriteTime":  "\/Date(1626957326200)\/"
  }
Microsoft didn't fix their weird JSON serialization until quite late. They may have backported it to the .Net Framework, but they've deleted that documentation. Powershell v6 and v7 include the newer classes that are properly behaved. This is why Json.NET used to be so popular and ubiquitous for C# and ASP applications: it generated JSON the way most web applications do, not the way Microsoft's wonky class did. Indeed, I believe it may be what System.Text.Json is based on.
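If you ever have to consume that legacy output, the token (once JSON-unescaped to /Date(...)/) is just milliseconds since the Unix epoch. A minimal decoder sketch in Python, ignoring the optional +hhmm offset suffix the format allows:

```python
import re
from datetime import datetime, timezone

def parse_ms_date(s):
    """Decode the legacy DataContractJsonSerializer date token,
    e.g. "/Date(1626957326200)/" -> a UTC datetime.
    (Discards the optional [+-]hhmm offset suffix, if present.)"""
    m = re.fullmatch(r"/Date\((-?\d+)(?:[+-]\d{4})?\)/", s)
    if not m:
        raise ValueError(f"not a /Date(...)/ token: {s!r}")
    ms = int(m.group(1))
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```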


Oh that one - yeah I've always steered clear of DataContractJsonSerializer. Never understood why they did it so weird.

To be fair, RFC 3339 wasn't even published back when this class was implemented (in .NET 3.5) so I guess they just went with whatever worked for their needs. ¯\_(ツ)_/¯


I'd be quicker to believe that it's because 2007 was still in the middle of Steve Ballmer's Microsoft, when embrace-extend-extinguish was their de facto practice.


I have wondered about a file format where a parser could be specified at the start of each line. You could even have different JSON parsers with different well-characterized limits and relative speeds. Formats could change over time on a line-by-line basis, without being locked into a full-file IDL or similar.
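Something like this, maybe (everything here is invented for illustration): a tag at the start of each line selects the parser for that line's payload:

```python
import json

# Hypothetical "parser tag per line" format: tag, a tab, then the payload.
# Readers dispatch on the tag; new tags can be added without breaking old lines.
PARSERS = {
    "json": json.loads,
    "raw": lambda s: s,  # leave the payload as an uninterpreted string
}

def read_tagged_lines(text):
    for line in text.splitlines():
        tag, _, payload = line.partition("\t")
        yield PARSERS[tag](payload)
```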


JSON Lines is a specified format that answers those questions. https://jsonlines.org/ Seems like it qualifies to the level of authority you're requiring.



