
How is it better than SQL for these tasks on tabular data?

That's the first question I have after reading the title. Haven't read the article.

Edited the first sentence. Originally it was "Why is it not called SQL if it works on tabular data?".

My point: if one wants cat, sort, sed, join on tabular data, SQL is exactly that. Awk is too powerful, not sure about it.




>How is it better than SQL for these tasks on tabular data?

It has far better handling of a CSV/TSV file directly on the command line and is composable in shell pipelines.

>My point, if one wants cat, sort, sed, join on tabular data, SQL is exactly that.

SQL is a language for data in the form of tables in relational databases.

While it can do sorting or joining or some changes, it is meant for a different domain than these tools, with other constraints, other concerns, and other patterns of use...

You don't need to load anything to a db, for starters.

You also normally don't want to involve a DB just to use it in a shell script, or for quick shell exploration.

You also can't mix SQL and regular unix userland in a pipeline (well, with enough effort you can, but it's not something people do or need to do).
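To make the "load into a db" point concrete, here is a minimal Python sketch (standard library only; the sample data and function names are made up for illustration, and it is not how any of the tools discussed here work internally). It answers the same question two ways: through SQL, which needs a table created and populated first, and by sorting the rows directly as they are read.

```python
import csv
import io
import sqlite3

# Hypothetical sample data, standing in for a small CSV file.
SAMPLE = "name,qty\npear,3\napple,10\nfig,7\n"


def top_by_qty_sql(text, n):
    """SQL route: create a table, load every row, then query it."""
    rows = list(csv.DictReader(io.StringIO(text)))
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (name TEXT, qty INTEGER)")
    con.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(r["name"], int(r["qty"])) for r in rows],
    )
    cur = con.execute("SELECT name FROM t ORDER BY qty DESC LIMIT ?", (n,))
    names = [name for (name,) in cur]
    con.close()
    return names


def top_by_qty_direct(text, n):
    """Direct route: parse the rows and sort them, no database involved."""
    rows = csv.DictReader(io.StringIO(text))
    ranked = sorted(rows, key=lambda r: int(r["qty"]), reverse=True)
    return [r["name"] for r in ranked[:n]]
```

Both functions return the same names for the sample data; the difference is only the schema and load step the SQL route needs before it can answer anything.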


DuckDB is actually pretty good at this kind of thing.

Doesn't need to load anything into a DB

Can be used in shell

Can read from stdin and write to stdout
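For anyone wondering what "reads from stdin, writes to stdout" buys you, here is a rough Python sketch of a pipeline-friendly CSV filter. This is not DuckDB and says nothing about its interface; `filter_rows`, the `qty` column, and the threshold are all hypothetical names chosen for illustration.

```python
import csv


def filter_rows(src, dst, column, minimum):
    """Copy CSV rows from src to dst, keeping rows where column >= minimum.

    src and dst are any text streams, so the same function works on
    sys.stdin/sys.stdout in a pipeline or on in-memory buffers in a test.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()  # keep the header so downstream tools still see it
    for row in reader:
        if int(row[column]) >= minimum:
            writer.writerow(row)
```

Wrapped in a small script that calls `filter_rows(sys.stdin, sys.stdout, "qty", 5)`, this could sit between `cat` and `sort` like any other Unix filter, which is the property being described above.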


csvsql is the first Google result for "sql in command line for csv"

https://towardsdatascience.com/analyze-csvs-with-sql-in-comm...


Yes, I know. It's not that there aren't several ways to do it, it's that it's not really a good fit for the command line, except in the "I want to reuse SQL that I already know" case.

The problem isn't in having a way to use SQL to query the data from the command line, it's that SQL is long-winded, with a syntax that doesn't really fit in a traditional shell pipeline.



1. Because it doesn't use SQL syntax.

2. Because it's closer to an amalgamation of the standard shell scripting tools (cut, sort, jq, etc.) than it is to a SQL variant.


and miller also has a powerful (and intuitive, imo) dsl should you need to go beyond what the simple command line switches offer.

so, it remains simple and concise for the easy problems, yet can scale up to address the trickier ones as well.

i've used csvkit, sqlite, jq, etc. miller is my favorite tool for data-fu (with sqlite being a close 2nd)


Because SQL (Structured Query Language) is the language used to access/manipulate the data, not the name for that kind of data. There have been, and are, many databases that essentially use tabular data but are not SQL databases.



