
I recommend using clickhouse-local[1] for these tasks.

It does SQL; it supports all imaginable data formats, streaming processing, and connecting to external data sources. It also outperforms every other tool in this benchmark[2].

[1] https://clickhouse.com/blog/extracting-converting-querying-l...

[2] https://colab.research.google.com/github/dcmoura/spyql/blob/...




> curl https://clickhouse.com/ | sh

Jesus, this is disgusting. I'm not that picky and don't usually complain about "... | sh", but I at least took it for granted that I could always look at the script in the browser, assume that it has no actual evil intentions, and trust that it doesn't rely on some fucking client-header magic to be modified on the fly.


Why be so cantankerous?

    curl https://clickhouse.com/ > install.sh
    cat install.sh


User-agent "sniffing" (I mean, it's right there. It's not exactly subtle.) has been going on since before the IE6 days. That it now extends to making things easier for us as command-line users is... kinda convenient? Another site where I've seen it done to good effect is http://ifconfig.me . Hit that with a web browser and you get a page of accompanying information. Hit it with curl and you get back your IP in ASCII; not even an extra newline character is returned!
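To make the mechanism concrete, here is a minimal standard-library sketch of a server that picks its response from the User-Agent header. The script text, the page, and the rule that anything starting with `curl/` or `Wget/` counts as a command-line client are all assumptions for illustration; this is not how clickhouse.com or ifconfig.me actually implement it. It also shows why saving the script with curl, rather than viewing the URL in a browser, is the way to see what `sh` would actually receive.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content; not what clickhouse.com actually serves.
INSTALL_SCRIPT = "#!/bin/sh\necho 'the installer would run here'\n"
LANDING_PAGE = "<!doctype html>\n<title>Example</title>\n<p>Download page</p>\n"

# Assumption: User-Agents with these prefixes are treated as command-line clients.
CLI_AGENT_PREFIXES = ("curl/", "Wget/")


def choose_response(user_agent: str) -> tuple[str, str]:
    """Return (content_type, body) for the given User-Agent header value."""
    if user_agent.startswith(CLI_AGENT_PREFIXES):
        return "text/plain; charset=utf-8", INSTALL_SCRIPT
    return "text/html; charset=utf-8", LANDING_PAGE


class SniffingHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        content_type, body = choose_response(self.headers.get("User-Agent", ""))
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        # The response differs per User-Agent, so caches must key on it.
        self.send_header("Vary", "User-Agent")
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Serve on localhost until interrupted (Ctrl-C)."""
    HTTPServer(("127.0.0.1", port), SniffingHandler).serve_forever()
```

After calling `serve()`, compare `curl http://127.0.0.1:8000/` with the same URL in a browser. Because `curl -A 'Mozilla/5.0' ...` overrides the User-Agent, that request gets the HTML instead; the header is a hint the client chooses to send, not a security boundary.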

The underlying question is: do you trust clickhouse.com or not? You don't have to; I've never met the team or talked to them, and I can't make that decision for you. But whether you go to the site, laboriously find the download page, right-click, download a binary, install the deb/rpm, ask your package manager which files just got installed, then find and run the clickhouse binary, or just let your computer do it for you via a shell script, the end result is the same: code from clickhouse (and we're sure it was from clickhouse because of TLS) was downloaded to the target machine and then run. Do things have to be difficult and annoying in order for you to like them? (Psychology studies say yes, actually.)


We provide .deb, .rpm, .tgz, Docker images, or a single binary, for x86-64 and AArch64, on Linux, Mac, and FreeBSD.



pipe it through less


or sqlite for CSV/TSV; I haven't tried it for JSON.


DuckDB, and being able to write:

    select a,b,c from '*.jsonl.gz'

has been a huge improvement to my workflows.
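For comparison, a rough standard-library equivalent of that one-liner shows how much it saves: glob expansion, gzip decompression, line-by-line JSON parsing, and column projection. This only illustrates the work involved and is not how DuckDB executes the query (no SQL, no type inference, no parallelism). `select_columns` is a hypothetical helper, and it assumes every non-blank line is a JSON object.

```python
import glob
import gzip
import json
from typing import Iterator


def select_columns(pattern: str, columns: list[str]) -> Iterator[tuple]:
    """Yield one tuple per JSON line from every gzipped file matching pattern."""
    for path in sorted(glob.glob(pattern)):
        # "rt" decompresses and decodes to text in one step.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                # In this sketch, a missing key becomes None.
                yield tuple(record.get(col) for col in columns)
```

Usage: `for row in select_columns("*.jsonl.gz", ["a", "b", "c"]): print(row)`.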


How does it look from the command line for streaming processing of CSV/TSV?


you can pipe to it


Yep:

  $ cat foo.tsv

  name    foo     bar
  Alice   10      8888
  Bob     20      9999

  $ cat foo.tsv | sqlite3 -batch \
    -cmd ".mode tabs" \
    -cmd ".import /dev/stdin x" \
    -cmd "select foo from x where cast(bar as integer) > 9000;"

  20
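The same pipeline can be sketched with Python's built-in `csv` and `sqlite3` modules, which helps when the query is part of a larger script. `query_tsv` is a hypothetical helper and the table name `x` just mirrors the one-liner. The sketch declares every column TEXT, which, as far as I know, is also what the sqlite3 CLI's `.import` does when it creates a new table. With TEXT columns a bare `bar > 9000` compares strings, so `'10000' > '9000'` is false; casting to INTEGER makes the comparison numeric.

```python
import csv
import sqlite3
from typing import TextIO


def query_tsv(stream: TextIO, sql: str) -> list[tuple]:
    """Load a TSV stream whose first row is a header into table "x", then run sql."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        # Quote each column name and declare it TEXT.
        cols = ", ".join('"' + name.replace('"', '""') + '" TEXT' for name in header)
        conn.execute(f"CREATE TABLE x ({cols})")
        placeholders = ", ".join("?" for _ in header)
        # Rows are inserted as they are read; blank lines are skipped.
        conn.executemany(
            f"INSERT INTO x VALUES ({placeholders})",
            (row for row in reader if row),
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

Calling `query_tsv(sys.stdin, "SELECT foo FROM x WHERE CAST(bar AS INTEGER) > 9000")` from a script reads piped input the same way. Rows are inserted as they arrive, but the whole table still ends up in the in-memory database, so this is not constant-memory streaming.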



