SQL is mostly "do the same thing a billion times", which aligns very closely with generating code for the inner loop, particularly when the inner loop can be generated without branches.
SQL codegen is basically an industry standard for large-scale SQL query processing on top of a large-scale column store. I work on one of the few engines at this scale which does not do query-stage codegen and instead uses vectorized primitives that are code-generated at source build time rather than at query time (Apache Hive, specifically; the only other one in the same style is Google's Supersonic engine).
Apache Impala - SQL to LLVM IR (C++)
https://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloud... [PDF]
Gandiva - SQL expressions to LLVM IR (C++ core, called from Java)
https://github.com/dremio/gandiva
MemSQL - SQL to MPL to LLVM IR
http://highscalability.com/blog/2016/9/7/code-generation-the...
Greenplum - SQL to LLVM IR
http://engineering.pivotal.io/post/codegen-gpdb-qx/
Postgres 9.4 Vitesse - SQL to LLVM IR
https://www.postgresql.org/message-id/CAJNt7%3DZ6w5%2BwyeTKK...
Postgres 11 LLVM - SQL to LLVM IR
https://www.postgresql.org/docs/11/jit-reason.html
SparkSQL - SQL to Java source (Janino + javac)
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark...
And as far as I know, Redshift's horrible query compilation performance comes from its C++ codegen, which forks gcc internally (not sure why this is so slow though; it almost feels like untuned gcc flags or something).
https://docs.aws.amazon.com/redshift/latest/dg/c-query-plann...
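
To make the difference between the two styles concrete, here is a toy sketch in Python. It is only an illustration under my own assumptions: the names (`PRIMITIVES`, `eval_vectorized`, `compile_query`) and the tuple-based expression tree are hypothetical and not taken from Hive or any engine listed above, and Python's `exec` stands in for what a real engine would do with LLVM IR, JVM bytecode or a C++ compiler. The first half composes prebuilt column-at-a-time primitives; the second half generates and compiles one fused loop for the specific query.

```python
import operator

# Toy expression trees: ("col", name) or (op, left, right). Hypothetical format.

# --- Style 1: vectorized primitives that exist before any query arrives ---

def _make_primitive(op):
    """Build one column-at-a-time primitive with a tight loop and no per-row dispatch."""
    def primitive(left, right):
        return [op(x, y) for x, y in zip(left, right)]
    return primitive

# In a real engine these would be stamped out from templates at build time.
PRIMITIVES = {
    "+": _make_primitive(operator.add),
    "*": _make_primitive(operator.mul),
}

def eval_vectorized(expr, batch):
    """Walk the tree once per batch (not once per row), calling prebuilt primitives."""
    if expr[0] == "col":
        return batch[expr[1]]
    op, left, right = expr
    return PRIMITIVES[op](eval_vectorized(left, batch), eval_vectorized(right, batch))

# --- Style 2: query-time codegen, one fused loop per query ---

def _to_source(expr, slots):
    """Render the tree as a Python expression over row index i."""
    if expr[0] == "col":
        # Map each column name to a positional parameter c0, c1, ...
        slot = slots.setdefault(expr[1], f"c{len(slots)}")
        return f"{slot}[i]"
    op, left, right = expr
    if op not in PRIMITIVES:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"({_to_source(left, slots)} {op} {_to_source(right, slots)})"

def compile_query(expr):
    """Generate source for this specific expression, compile it, return a runner."""
    slots = {}
    body = _to_source(expr, slots)
    params = ", ".join(slots.values())
    source = (
        f"def kernel({params}, n):\n"
        f"    return [{body} for i in range(n)]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)  # compile cost paid here
    kernel = namespace["kernel"]
    names = list(slots)

    def run(batch):
        cols = [batch[name] for name in names]
        return kernel(*cols, len(cols[0]))

    return run, source

# Example: (a + b) * c over a three-row batch.
expr = ("*", ("+", ("col", "a"), ("col", "b")), ("col", "c"))
batch = {"a": [1, 2, 3], "b": [10, 20, 30], "c": [2, 2, 2]}
run, source = compile_query(expr)
print(eval_vectorized(expr, batch))
print(run(batch))
print(source)
```

Both paths compute the same column. The vectorized path materializes an intermediate list for `a + b` before multiplying, but pays nothing at query time beyond the tree walk. The generated kernel does everything in one pass with no intermediate, but pays a compile step per query, which is the cost that hurts when the compiler is something as heavy as a forked gcc.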