Reminds me of https://github.com/johnkerl/miller which is also a Go-based tool that clones features from tools like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON.
I'm a big fan of Miller (mlr) -- it's the tool I landed on when I needed to "graduate" from awk for looking at CSV data. When I read "Go-based" in your comment, I thought "nope, it's written in C". But no! It was ported to Go -- very interesting!
I remember looking through XPath and wondering why no one had converted as much of it as possible to a generic path language for any tree-based data format, which JSON, TOML, YAML, and XML are.
XPath was the best of the XML standards. It helps that the language itself wasn't XML, unlike XSLT and others.
Regarding XQuery: we added JSON querying on top in Brackit[1] / SirixDB[2].
Brackit is a retargetable query compiler that does a lot of optimization at compile time, for instance optimizing joins and aggregations. It is usable as an in-memory processor or as the query processor of a database system.
The Ph.D. thesis of Sebastian:
Separating Key Concerns in Query Processing - Set Orientation, Physical Data Independence, and Parallelism
I also use Jupyter for this, but my go-to is tablib. For anyone who hasn't used it: it's super easy to switch between tabular data formats. You create an instance of their Dataset class, assign your data to the appropriate property, and all of the other properties expose your data in the respective formats.
for instance:
from tablib import Dataset
json_array_of_objects = '[{"header": "data1"}, {"header": "data2"}]'
ds = Dataset()
ds.json = json_array_of_objects
ds.csv # data formatted as a csv
ds.xlsx # Excel; only useful for a binary read or write
ds.dict # list of dictionaries
ds.json # list of dictionaries converted to json
ds.jira # table formatted in Jira's markup
ds.html # html table
# and more
They used to vendor dependencies, so everything worked out of the box, but now some features need to be installed separately, or via pip install "tablib[all]", which is kind of annoying. I suspect they started doing it when they added support for pandas DataFrames, because they didn't want to vendor all of pandas or force it to install as a requirement.
I mostly go from CSV to CSV, but nushell handles tabular data in files really well [1]. I've done a little bit of JSON manipulation with it, but not a ton, so I can't say it's a silver bullet.
It's maintained by Open Data Services Co-op, where we use it as a component in several of our web & data pipeline tools for working with data published against a data standard.
let $array := [{"foo":0,"bar":"tztz"},{"foo":"hello","bar":null},{"foo":true,"bar":"yes"}]
let $value :=
  for $object in $array
  return
    let $fields := bit:fields($object)
    let $len := bit:len($fields)
    for $field at $pos in $fields
    return
      if ($pos < $len) then (
        $object=>$field || ","
      ) else (
        $object=>$field || "\n"
      )
return string-join($value, "")
You could do something like this in pure Python, without the JSON-loading boilerplate, with jello[0]. An interactive TUI for jello called jellex[1] is also available. (I am the author)
Of course, you can simply wrap something like the above snippet in a module as a function for reuse. Or I could add a built-in function for flattening, either like this or as in pandas.
The "delete" example on the readme is a bit odd, since it seems to suggest that you need to specify the value to delete as well as the path. The equivalent in the documentation does not have this problem.
That's a fascinating concept. Is there anything similar for INI? Such a tool could help automate changes to configuration files for software that is most often configured via INI files.
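For scripted edits at least, Python's stdlib configparser already covers the read-modify-write loop (a minimal sketch; the section and key names are made up):

```python
import configparser
import io

# Parse an INI document (read_string here; config.read("app.ini") for a file).
config = configparser.ConfigParser()
config.read_string("""
[server]
host = localhost
port = 8080
""")

# Change a value and add a new key, then serialize back out.
config["server"]["port"] = "9090"
config["server"]["debug"] = "true"

buf = io.StringIO()
config.write(buf)
print(buf.getvalue())
```

The main caveat is that configparser rewrites the whole file, so comments and formatting are not preserved. On the CLI side, crudini offers jq-style get/set/del operations for INI files.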