> I don't understand why SQL on LLVM?

SQL is mostly "do the same thing a billion times", which aligns very closely with generating code for the inner loop, particularly when the inner loop can be generated without branches.
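To make that concrete, here is a minimal sketch in Python (not LLVM, and not taken from any of the engines listed here) of the difference between interpreting an expression tree per row and generating one specialized loop per query. The tuple-based expression format and the names `interpret`, `to_source` and `compile_sum` are made up for illustration. Python will not show the machine-level benefit of a branch-free loop; it only shows the shape of the idea: the tree is walked once to emit code, and the hot loop contains no per-row dispatch and no `if`.

```python
import operator

# Hypothetical mini expression tree: ("col", name), ("const", value), (op, lhs, rhs)
OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt, "<": operator.lt}

def interpret(expr, row):
    """Tree-walking evaluation: re-dispatches on node type for every row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))

def to_source(expr):
    """Emit a Python expression string for the tree (done once per query)."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"

def compile_sum(value_expr, filter_expr):
    """Generate and compile a loop specialized to this one query."""
    src = (
        "def run(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        "        # the boolean predicate multiplies the value: no 'if' in the loop\n"
        f"        total += {to_source(filter_expr)} * {to_source(value_expr)}\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(src, "<generated query>", "exec"), namespace)
    return namespace["run"]

# Roughly: SELECT SUM(price * qty) FROM rows WHERE qty > 10
rows = [{"price": 2, "qty": 5}, {"price": 3, "qty": 20}, {"price": 1, "qty": 11}]
value = ("*", ("col", "price"), ("col", "qty"))
pred = (">", ("col", "qty"), ("const", 10))

interpreted = sum(interpret(value, r) for r in rows if interpret(pred, r))
generated = compile_sum(value, pred)(rows)
```

Both paths give the same answer; the generated one pays the tree walk once per query instead of once per row, which is the part a real engine hands to LLVM or a Java/C++ compiler.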

SQL codegen is basically an industry standard for large-scale SQL query processing on top of a large-scale column store. I work on one of the few engines at this scale which does not do query-stage codegen and instead uses vectorized primitives that are code-generated at source build time rather than at query time (Apache Hive, specifically; the only other one in the same style is Google's Supersonic engine).
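For contrast, a rough sketch of the build-time style, again in Python and again with made-up names (`TEMPLATE`, `PRIMITIVES`, `run_filters` are illustrative, not Hive's or Supersonic's actual classes). The assumption here is that a template is expanded once, ahead of time, into a fixed family of batch primitives, and that at query time the engine only looks primitives up and chains them over a column batch with a selection vector.

```python
# Build time: expand one template into a fixed family of batch primitives.
TEMPLATE = '''
def col_{name}_scalar(column, scalar, selected):
    """Keep the selected row indexes where column[i] {op} scalar."""
    return [i for i in selected if column[i] {op} scalar]
'''

PRIMITIVES = {}
for name, op in [("gt", ">"), ("lt", "<"), ("eq", "==")]:
    exec(TEMPLATE.format(name=name, op=op), PRIMITIVES)

# Query time: no code generation, only lookup and chaining over a batch.
def run_filters(batch, filters):
    """Apply (column_name, primitive_name, scalar) filters to one batch."""
    selected = list(range(batch["size"]))  # selection vector: all rows at first
    for column_name, name, scalar in filters:
        primitive = PRIMITIVES[f"col_{name}_scalar"]
        selected = primitive(batch["columns"][column_name], scalar, selected)
    return selected

batch = {"size": 4, "columns": {"qty": [5, 20, 11, 30], "price": [2, 3, 1, 9]}}
# Roughly: WHERE qty > 10 AND price < 5
surviving = run_filters(batch, [("qty", "gt", 10), ("price", "lt", 5)])
```

The dispatch cost is paid once per batch rather than once per row, which is how this style stays competitive without compiling anything per query.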

Apache Impala - SQL to LLVM IR (C++)

https://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloud... [PDF]

Gandiva - SQL to LLVM IR (Java)

https://github.com/dremio/gandiva

MemSQL - SQL to MPL to LLVM IR

http://highscalability.com/blog/2016/9/7/code-generation-the...

Greenplum - SQL to LLVM IR

http://engineering.pivotal.io/post/codegen-gpdb-qx/

Postgres 9.4 Vitesse - SQL to LLVM IR

https://www.postgresql.org/message-id/CAJNt7%3DZ6w5%2BwyeTKK...

Postgres 11 LLVM - SQL to LLVM IR

https://www.postgresql.org/docs/11/jit-reason.html

SparkSQL - SQL to Java source (Janino + javac)

https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark...

And as far as I know, Redshift's horrible query compilation performance comes from using C++ codegen which forks gcc inside (not sure why this is so slow though; it almost feels like untuned gcc flags or something).

https://docs.aws.amazon.com/redshift/latest/dg/c-query-plann...