Select, put and delete data from JSON, TOML, YAML, XML and CSV files (github.com/tomwright)
200 points by wyuenho on March 7, 2022 | 31 comments


Reminds me of https://github.com/johnkerl/miller which is also a Go-based tool cloning features from tools like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON


I'm a big fan of miller (mlr) -- it's the tool I landed on when I needed to "graduate" from awk to look at CSV data. But when I read "Go-based" in your comment, I thought "nope, it's written in C". But no! It was ported to Go -- very interesting!

The developer wrote a comprehensive document explaining the rationale behind the porting that answered all my questions and a lot more: https://github.com/johnkerl/miller/blob/main/README-go-port.....

Thought other miller/mlr fans (that don't follow its development) might find this interesting as well.

(The dasel tool looks very cool, too -- looks like a good complement to mlr and similar tools!)


It is nice, but I wonder if it's also a full-fledged query compiler?

Also IMHO modules with user-defined (higher order?) functions are a big plus of a query language.


I think I prefer yq's[1] syntax for manipulating and updating fields.

Combined with yj[2] you can also get TOML and HCL support, but not CSV.

It's certainly not because there's a lack of these tools :p

[1]: https://github.com/mikefarah/yq

[2]: https://github.com/sclevine/yj


I remember looking through XPath and wondering why no one had converted as much of it as possible to a generic path language for any tree-based data format, which JSON, TOML, YAML, and XML are.

XPath was the best of the XML standards. Well, it helps that the language itself wasn't XML, unlike XSLT and others.
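A small taste of that already exists in Python's standard library: xml.etree.ElementTree implements a limited XPath subset (descendant paths, attribute predicates). A minimal sketch, with a made-up document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, just for illustration.
doc = """
<config>
  <servers>
    <server name="web1"><port>80</port></server>
    <server name="web2"><port>8080</port></server>
  </servers>
</config>
"""

root = ET.fromstring(doc)

# ".//server" is XPath-style descendant search;
# "[@name='web2']" is an attribute predicate.
ports = [s.findtext("port") for s in root.findall(".//server")]
web2 = root.find(".//server[@name='web2']")

print(ports)                  # ['80', '8080']
print(web2.findtext("port"))  # 8080
```

That path-style addressing of a tree is essentially what tools like dasel and yq generalize to JSON/YAML/TOML.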


Regarding XQuery we just added JSON querying on top in Brackit[1] / SirixDB[2].

Brackit is a retargetable query compiler that performs a lot of optimizations at compile time, for instance optimizing joins and aggregations. It is usable as an in-memory processor or as the query processor of a database system.

The Ph.D. thesis of Sebastian:

Separating Key Concerns in Query Processing - Set Orientation, Physical Data Independence, and Parallelism

http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publicatio...

[1] http://brackit.io

[2] https://sirix.io


XPath was a great system. However, I feel like most of its features required an XSLT implementation to support them.


Can this or anything else flatten arrays of JSON objects to CSV?


I tend to use Python (usually in a Jupyter notebook) and pandas, with lots of experimentation with pandas' json_normalize() function. https://towardsdatascience.com/all-pandas-json-normalize-you...

Then just call to_csv() on the resulting dataframe. (Edited to add the note on exporting to CSV, as per the original question.)
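For the flat case (an array of non-nested objects), the standard library alone is also enough; a minimal sketch with invented sample data:

```python
import csv
import io
import json

# Hypothetical input: a flat JSON array of objects.
records = json.loads('[{"name": "Tom", "age": 27}, {"name": "Jim", "age": 41}]')

# Use the first record's keys as the CSV header.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows(records)

print(buf.getvalue())
# name,age
# Tom,27
# Jim,41
```

json_normalize earns its keep once the objects are nested; for flat records the two approaches produce the same table.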


I also use Jupyter for this, but my go-to is tablib. For anyone who hasn't used it: it's super easy to switch between tabular data formats. You create an instance of their Dataset class, assign your data to the appropriate property, and all of the other properties are your data in the respective format.

for instance:

  from tablib import Dataset

  json_array_of_objects = '[{"header": "data1"}, {"header": "data2"}]'
  ds = Dataset()
  ds.json = json_array_of_objects
  
  ds.csv # data formatted as a csv
  ds.xlsx # excel, only useful on a binary read or write
  ds.dict # list of dictionaries
  ds.json # list of dictionaries converted to json
  ds.jira # table formatted for Jira's markup
  ds.html # html table
  # and more
They used to vendor dependencies, so everything worked out of the box, but now some features need to be installed separately (or via pip install tablib[all]), which is kind of annoying. I suspect they started doing this when they added support for pandas dataframes, because they didn't want to vendor all of pandas or force it to install as a requirement.


In addition to the already mentioned jq, there's https://github.com/jehiah/json2csv


Try this: https://flatterer.opendata.coop/

There is no binary yet, but there is a Python CLI and library, even though it is written in Rust.

It is the only tool that I know of that deals with nested JSON and converts it into relational tables.

Here is a notebook of the python library usage.

https://deepnote.com/@david-raznick/Flatterer-Demo-FWeGccp_Q...
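To illustrate what "nested JSON to relational tables" means (this is just a sketch of the idea with invented data, not flatterer's actual output): each nested array becomes its own table, with a foreign key back to the parent row.

```python
import json

# Hypothetical nested input.
doc = json.loads("""
[
  {"id": 1, "name": "alice", "pets": [{"kind": "cat"}, {"kind": "dog"}]},
  {"id": 2, "name": "bob", "pets": [{"kind": "fish"}]}
]
""")

people, pets = [], []
for person in doc:
    # Split each child array off into its own table,
    # keyed back to the parent by its id.
    for pet in person.pop("pets", []):
        pets.append({"person_id": person["id"], **pet})
    people.append(person)

print(people)  # [{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}]
print(pets)    # [{'person_id': 1, 'kind': 'cat'}, {'person_id': 1, 'kind': 'dog'}, {'person_id': 2, 'kind': 'fish'}]
```

Both lists are now flat and can be written out as two CSVs (or loaded into two SQL tables) directly.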



I mostly go from CSV to CSV, but nushell handles tabular data in files really well [1]. I've done a little bit of JSON manipulation with it, but not a ton, so I can't say it's a silver bullet.

1. https://www.nushell.sh/book/


Here's a Python library with a CLI:

Pypi: https://pypi.org/project/flattentool/

Docs: https://flatten-tool.readthedocs.io/en/latest/

It's maintained by Open Data Services Coop, where we use it as a component in several of our web & data pipeline tools for working with data that is published in a Data Standard.


https://github.com/stedolan/jq can manipulate arrays of objects and format the result as CSV


Of course you can do it with http://brackit.io.

What exactly is the "transformation" you envision?


Something like this (very simplified):

  let $array := [{"foo":0,"bar":"tztz"},{"foo":"hello","bar":null},{"foo":true,"bar":"yes"}]
  let $value := for $object in $array
                return
                  let $fields := bit:fields($object)
                  let $len := bit:len($fields)
  
                  for $field at $pos in $fields
                  return if ($pos < $len) then (
                    $object=>$field || ","
                  ) else (
                    $object=>$field || "\n"
                  )
  return string-join($value,"")
will output:

  0,tztz
  hello,null
  true,yes


You could do something like this in pure Python, without the JSON loading boilerplate, with jello[0]. An interactive TUI for jello, called jellex[1], is also available. (I am the author.)

[0] https://github.com/kellyjonbrazil/jello

[1] https://github.com/kellyjonbrazil/jellex


JSON loading boilerplate? Is it also a compiler with sophisticated optimizations during compile time? :)


Of course, you can simply create a module with something like the above snippet as a function for reuse. Or I could add a built-in function for flattening like this, or as in pandas.


If you use JavaScript: Papa Parse. Or, as others have pointed out, it's straightforward to do in Python, bash, Perl, etc…


Excel can do this.


jq


The "delete" example on the readme is a bit odd, since it seems to suggest that you need to specify the value to delete as well as the path. The equivalent in the documentation does not have this problem.


Looks to be a mistake in the readme. The example in the docs doesn't pass a value in.

> echo '{"name": "Tom"}' | dasel -p json '.name'

https://daseldocs.tomwright.me/usage/delete


Can confirm this was a mistake I made when mass-compiling the docs


That's a fascinating concept. Is there anything similar for INI? Such a tool could help automate changes to configuration files for software that is most often configured via INI files.


I haven't used it, but augeas seems to have support for INI files [1].

1: https://augeas.net/docs/references/1.4.0/lenses/files/inifil...


Not a generic tool as such, but Winter CMS has an INI parser / renderer built into the core: https://wintercms.com/docs/services/parser#ini-parser


Hey, author of dasel here. I have an issue open to add INI support - just haven't gotten round to it yet. Life has been busy!



