How is it better than SQL for these tasks on tabular data? That's the first ques...

coldtea · on March 16, 2023

>How is it better than SQL for these tasks on tabular data?

It has far better handling of a CSV/TSV file directly on the command line and is composable in shell pipelines.

>My point, if one wants cat, sort, sed, join on tabular data, SQL is exactly that.

SQL is a language for data in the form of tables in relational databases.

While it can do sorting or joining or some changes, it is meant for a different domain than these tools, with other constraints, other concerns, and other patterns of use...

You don't need to load anything to a db, for starters.

You also normally don't want to involve a DB just for a shell script, or for quick shell exploration.

You also can't mix SQL and regular unix userland in a pipeline (well, with enough effort you can, but it's not something people do or need to do).
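To make the "load it into a db" step concrete, here is a minimal sketch of what querying a CSV with SQL involves, using Python's standard `sqlite3` and `csv` modules. The sample data, the table name `t`, and the helper `query_csv_with_sql` are all made up for illustration; this is not how any particular CSV-to-SQL tool works internally.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = "name,qty\napple,3\npear,10\nfig,7\n"


def query_csv_with_sql(csv_text, sql):
    """Load CSV text into an in-memory SQLite table named `t`, then run `sql`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # Step 1: declare a table whose columns mirror the CSV header.
        cols = ", ".join(f'"{c}"' for c in header)
        conn.execute(f"CREATE TABLE t ({cols})")
        # Step 2: the load step that a plain text tool never needs.
        marks = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO t VALUES ({marks})", data)
        # Step 3: only now can the query run.
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# CSV fields arrive as text, so the numeric sort needs an explicit cast.
result = query_csv_with_sql(
    CSV_TEXT, "SELECT name FROM t ORDER BY CAST(qty AS INTEGER) DESC"
)
print(result)
```

Even in this small form there is a schema step, a load step, and a type cast before the actual question gets asked, which is the overhead being described above.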

holy_diver · on March 16, 2023

DuckDB is actually pretty good at this kind of thing.

Doesn't need to load anything to DB

Can be used in shell

Can read from stdin and write to stdout

avodonosov · on March 16, 2023

csvsql is the first google result for "sql in command line for csv"

https://towardsdatascience.com/analyze-csvs-with-sql-in-comm...

coldtea · on March 16, 2023

Yes, I know. It's not that there aren't several ways to do it, it's that it's not really a good fit for the command line, except in the "I want to reuse SQL that I already know" sense.

The problem isn't in having a way to use SQL to query the data from the command line, it's that SQL is long-winded and its syntax doesn't really fit in a traditional shell pipeline.
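For contrast, the shape that does fit a pipeline is a filter: read records from one stream, write records to another, keep the format the same. A rough sketch of that pattern in Python follows; the function `sort_csv` and its parameters are hypothetical and only illustrate the idea, they are not taken from Miller or any other tool.

```python
import csv
import io


def sort_csv(in_stream, out_stream, column, numeric=False):
    """Read CSV from in_stream, sort rows by `column`, write CSV to out_stream."""
    reader = csv.DictReader(in_stream)
    rows = list(reader)
    if reader.fieldnames is None:
        # Empty input: nothing to emit.
        return
    if numeric:
        rows.sort(key=lambda r: float(r[column]))
    else:
        rows.sort(key=lambda r: r[column])
    writer = csv.DictWriter(
        out_stream, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# In-memory streams stand in for stdin and stdout here.
src = io.StringIO("name,qty\napple,3\npear,10\nfig,7\n")
dst = io.StringIO()
sort_csv(src, dst, "qty", numeric=True)
output = dst.getvalue()
print(output, end="")
```

Passing `sys.stdin` and `sys.stdout` instead of the `StringIO` objects would turn this into something that can sit between two other commands, since the output is still headered CSV that the next stage can consume.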

avodonosov · on March 16, 2023

I submitted a ticket: https://github.com/johnkerl/miller/issues/1235

garciasn · on March 16, 2023

1. Because it doesn't use SQL syntax.

2. Because it's closer to an amalgamation of the standard shell scripting tools (cut, sort, jq, etc) than it is to a SQL variant.

kbouck · on March 17, 2023

and miller also has a powerful (and intuitive imo.) dsl should you need to go beyond what the simple command line switches offer.

so, it remains simple and concise for the easy problems, yet can scale up to address the trickier ones as well.

i've used csvkit, sqlite, jq, etc. miller is my favorite tool for data-fu (with sqlite being a close 2nd)

LMMojo · on March 16, 2023

Because SQL (Structured Query Language) is the language used to access/manipulate the data, not the name for that kind of data. There have been, and are, many databases that essentially use tabular data but are not SQL databases.