Hacker News new | past | comments | ask | show | jobs | submit login

BTW you can see a version of what an industrial strength query optimizer / execution engine looks like in Rust https://arrow.apache.org/datafusion/

(can also use it in your own projects)

It is quite similar to what is described in this post




Somewhat similar, see https://substrait.io/

So for example using DuckDB with the Substrait extension, if you create a table

    create table t(a int);
and then query it as in the article, you can see something similar to what is described in the article

    CALL get_substrait_json('select * from t');

    {"relations":[{"root":{"input":{"project":{"input":{"read":{"baseSchema":{"names":["a"],"struct":{"types":[{"i32":{"nullability":"NULLABILITY_NULLABLE"}}],...
DuckDB extension doesn't seem to cover any DDL operations though.

https://duckdb.org/docs/extensions/substrait

Some other related discussions and links that i've collected over the years

https://news.ycombinator.com/item?id=37415494

https://news.ycombinator.com/item?id=34233697

https://news.ycombinator.com/item?id=31981568

https://datastation.multiprocess.io/blog/2022-04-11-sql-pars...

https://tomassetti.me/parsing-sql/


After a quick look, I'm not sure if I would call this “industrial strength”. In particular, the join optimizer (typically the heart of a large-scale SQL optimizer) looks very rudimentary? And the statistics it uses have zero idea about correlation, no histograms beyond min/max…


I was wondering about the same claim. However, I believe that JOIN's are a common weakness among OLAP database engines, and DataFusion is built on top of a columnar storage format - Apache Arrow.


By being columnar, I guess you could say DataFusion has a good executor, but no, not a good optimizer.


Not that I was trying to make any of those claims but just trying to correlate the domain with what appears to be a common problem in it.


There's also this pretty detailed article on StarRocks' query optimizer. (StarRocks is open source - focused on OLAP)

https://medium.com/starrocks-engineering/starrocks-inside-sc...





Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: