Actually this is quite a bit different. This is more a columnar-store version of SQLite, basically an embedded OLAP database, which is pretty cool. I'm not aware of other column stores in this niche; most are distributed systems meant for big data and so are much more complicated to set up and manage.
Since LLVM is so slow, you really have to validate whether JIT actually helps your queries (obviously, like for anything... duh). In my case I managed to slow the DB to a crawl with queries that were estimated to be super expensive, but 95% of the plan was never actually executed.
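The failure mode described above follows from how the JIT decision is made: PostgreSQL triggers compilation based on the planner's *estimated* total plan cost (the `jit_above_cost` setting, default 100000), before anything executes. A rough sketch of that threshold check, with the cost number being the only real PostgreSQL value here:

```python
# Sketch of PostgreSQL's JIT trigger: the decision is driven purely by the
# planner's estimated total cost, made before execution starts. A plan whose
# expensive branches are mostly never reached still pays the full
# compilation price up front.

JIT_ABOVE_COST = 100_000  # PostgreSQL's default for jit_above_cost

def should_jit(estimated_total_cost: float) -> bool:
    """Estimate-driven threshold check; actual runtime behavior is ignored."""
    return estimated_total_cost >= JIT_ABOVE_COST

# A plan estimated at 500k cost units gets every expression compiled, even
# if at runtime most of the plan is short-circuited and never executed.
print(should_jit(500_000))  # True: compile everything
print(should_jit(50_000))   # False: stay interpreted
```

This is why an estimate that is wildly above the work actually performed turns JIT from a win into pure overhead.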
Yea, we really need to improve the handling of those cases. I think there are four major angles:
1) I'd hoped to get caching for JITed queries into 13 (or at least the major prerequisite), but that looks like it might miss the mark (job changes are disruptive, even if they end up allowing for more development time). The nicest bit is that the necessary changes also result in significantly better generated code.
2) Background JIT compilation. Right now the JIT compilation happens in the foreground. We really ought to only do the IR generation in the foreground, then do the compilation in the background while continuing with interpreted execution. Only once codegen is done would we redirect to the JITed program (there'd be a bit more overhead during the interpreted phase, rechecking whether to redirect yet, but not much).
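The scheme in point 2 can be sketched with a background thread and a cheap per-call recheck. All names here (`AdaptiveExecutor`, `compile_fn`, `interpret_fn`) are invented for illustration; this is not PostgreSQL code, just the shape of the idea:

```python
# Hypothetical sketch: start interpreted execution immediately, compile in a
# background thread, and switch to the compiled function once it's ready.
import threading

class AdaptiveExecutor:
    def __init__(self, compile_fn, interpret_fn):
        self._interpret = interpret_fn
        self._compiled = None  # set by the background thread when codegen finishes
        # Kick off expensive codegen without blocking query startup.
        self._worker = threading.Thread(target=self._compile, args=(compile_fn,))
        self._worker.start()

    def _compile(self, compile_fn):
        # Stands in for the expensive LLVM optimization + codegen phase.
        self._compiled = compile_fn()

    def evaluate(self, row):
        # The cheap recheck mentioned above: one attribute read per call
        # (in a real engine you'd check once per batch, not per row).
        fn = self._compiled
        return fn(row) if fn is not None else self._interpret(row)

# Usage: rows are processed from the start; early rows may take the
# interpreted path, later rows the JITed one, with identical results.
ex = AdaptiveExecutor(compile_fn=lambda: (lambda r: r * 2),
                      interpret_fn=lambda r: r + r)
print(ex.evaluate(21))  # 42 on either path; which path ran depends on timing
```

The key property is that compilation latency never blocks the first row, which directly addresses the "plan was never executed" case: if the query finishes before codegen does, the compiled code is simply discarded.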
3) Improve costing logic. E.g. we don't take the size of the necessary generated code into account at the moment, and we should. The parallel worker count isn't taken into account either.
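To make point 3 concrete, a cost model along these lines would weigh expected execution savings against compilation work, where compilation work scales with the amount of code to generate and multiplies with the worker count (each parallel worker JIT-compiles its copy of the plan independently). Every constant here is a made-up illustrative number, not PostgreSQL's:

```python
# Hypothetical JIT cost model: compare estimated savings from compiled
# execution against total compilation cost across all parallel workers.

def jit_is_worthwhile(est_exec_cost: float,
                      n_expressions: int,
                      n_workers: int,
                      per_expr_compile_cost: float = 50.0,
                      speedup_fraction: float = 0.2) -> bool:
    # Each worker compiles the plan's expressions independently, so the
    # compile bill grows linearly with both code size and worker count.
    compile_cost = n_expressions * per_expr_compile_cost * max(1, n_workers)
    savings = est_exec_cost * speedup_fraction
    return savings > compile_cost

# The same query can be worth JITing serially but not with 8 workers,
# since all 8 would each compile the same 200 expressions.
print(jit_is_worthwhile(100_000, n_expressions=200, n_workers=1))  # True
print(jit_is_worthwhile(100_000, n_expressions=200, n_workers=8))  # False
```

A threshold on estimated total cost alone, as used today, captures neither of those two terms.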
4) Improve optimization pipeline. There are plenty of cases where we don't run beneficial and fairly cheap optimization passes, and plenty of cases where we run really expensive passes that are unlikely to be helpful.
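One way to read point 4 is that the pipeline should be selected per function rather than fixed. A toy sketch of that idea, where the pass names mirror common LLVM passes but the selection heuristic is entirely invented for illustration:

```python
# Hypothetical pass selection: always run cheap, almost-always-profitable
# passes; only run expensive ones when the function is big enough and loopy
# enough for them to plausibly pay off.

CHEAP_PASSES = ["mem2reg", "instcombine", "simplifycfg"]  # cheap, broadly useful
EXPENSIVE_PASSES = ["gvn", "licm", "loop-vectorize"]      # costly, need hot loops

def select_passes(ir_instruction_count: int, has_loops: bool) -> list:
    passes = list(CHEAP_PASSES)
    # Tiny straight-line functions skip the expensive passes entirely;
    # the 500-instruction cutoff is an arbitrary illustrative threshold.
    if has_loops and ir_instruction_count > 500:
        passes += EXPENSIVE_PASSES
    return passes

print(select_passes(50, has_loops=False))
# ['mem2reg', 'instcombine', 'simplifycfg']
print(select_passes(2_000, has_loops=True))
# ['mem2reg', 'instcombine', 'simplifycfg', 'gvn', 'licm', 'loop-vectorize']
```

The point isn't this particular heuristic; it's that a one-size-fits-all pipeline guarantees both kinds of mistakes described above.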