> I'm saying SQL is a bad API. You can probably make some api language that's mo...

crimsonalucard · on Sept 4, 2019

You can enforce patterns via types. You can also have default optimizations. The problem with SQL is to choose optimization you have to do it by poking the syntax.

For example:

   query1 = select(table, columns, optimization_method="auto")

   query3 = join(query1, query2, joinOptimizationMethod="auto"...)

Obviously the above example is trivial, wrong and probably won't work. But I hope it paints a fuzzy picture of my point. That is that the optimization or 'plan' should be explicit. You choose auto or you dive in and you change it yourself. No language or syntax hacks.

james_s_tayler · on Sept 4, 2019

No thanks. The benefit is not having to specify the plan. The optimizers themselves are fairly good. A well designed table with the right indexes should perform well.

Raw SQL is great at what it does.

ORMs are typically good at what they do.

Depending on the use case both have their pain points. Typically I see a mixture of ORM for most CRUD based stuff and raw SQL for cases where the ORM isn't suitable.

No need to throw the baby out with the bathwater.

crimsonalucard · on Sept 4, 2019

In my example code. Do you notice the string "auto"?

It means you can literally have the API do exactly what SQL does. Automatically specify the query plan. But you explicitly do it.

Meaning that sometimes the auto planner screws up. When it screws up, in those cases you explicitly make it use another algorithm rather then modify SQL to be slightly different and hope that it compiles into something more reasonable.

The baby is rosemary's baby.

james_s_tayler · on Sept 4, 2019

I'm not convinced on the UX of that. Developers are lazy. I think what would more than likely up happening is they explicitly set it to "auto" and modify the query to get a more performant plan. After a certain query complexity It would become exceedingly hard to start to be able to piece together plans by hand.

A bunch of performance problems aren't even solved by tweaking the query either. They're solved by changing the database structure. Adding the appropriate indexes and such. Making sure datatypes match between joined data and no implicit conversions are happening. Right-sizing columns.

No amount of specifying your own query plan will do anything to affect those kinds of issues.

In practice I find mostly it comes down to cardinality estimate issues as a very common source of problems as the database either over or under provisions enough memory for the query. If it estimates it's going to get back a lot more data than it actually does and it grants too much memory that will reduce parallelism because that memory can't be used by other queries. If it under-estimates it doesn't grant enough memory and when it gets back more rows that will fit into memory it has to write them temporarily to disk taking a massive hit in I/O performance.

How does your scheme work with figuring out how much memory the database should grant to a query when specifying a plan by hand?

What's more is plans change over time as statistics change. Leaving it up to the query optimizer means it's adaptive. Having to specify yourself means you have to know the optimizer isn't giving you the best plan at design time. There are some cases where you know that. There are some cases where you don't.

You can already specify query hints etc. I think SQL just has this covered already. I have no qualms if the query changes slightly.

taffer · on Sept 4, 2019

It seems that what you want is not an imperative API, but a more explicit one. I'm afraid the problem is that specifying the access methods or optimizations to use in the database is very product- or even version-specific, on the other hand you can already tell a database what to do using optimizer hints.

crimsonalucard · on Sept 4, 2019

I want both. An imperative language makes it more able to be isomorphic to a web app api. You can have corresponding types and functions without something totally different.

Optimizer hints is a good idea. As far as I know this isn't a well known feature (if it even is a feature) for postgresql, which is the DB I have the most experience with.