Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

We had a case where a single OR was a massive performance problem on MSSQL, but not at all on Sybase SQLAnywhere we're migrating away from. Which one might consider slightly ironic given the origins of MSSQL...

Anyway, the solution was to manually rewrite the query as a UNION ALL of the two cases which was fast on both. I'm still annoyed though by the fact that MSSQL couldn't just have done that for me.



I am very grateful for databases but I have so many stories of having to manhandle them into doing what seems like it should be obvious to a reasonable query optimizer. Writing one must be very hard.


If an optimiser was as smart as a human it would take potentially minutes to come up with a (reasonably good) SQL execution plan for any non-trivial query :)


> it would take potentially minutes

This is a key part of the problem, and something that people don't realise about query planners. The goal of the planner is not to find the best query plan no matter what, or even to find the best plan at all, it is instead to try to find a good enough plan quickly.

The QPs two constraints (find something good enough, do so very quickly) are often diametrically opposed. It must be an interesting bit of code to work on, especially as new features are slowly added to the query language.


Yeah, exactly. You need to optimise for the overall query duration, including the optimiser itself, and obviously everyone's workload is different, so the right balance may as well not exist at all


This highlights that there is space for an offline automated query omptimizer/rewriter, to compliment the online query planning?


MS are tinkering with things a little in that direction with recent versions of SQL Server (usually in AzureSQL first then pushed to on-prem releases), though this is mostly around avoiding over-caching. A significant problem with query planners is that to make them better you make them slower, so you introduce caching, but that means you might end up using a bad plan when inputs change.


One thing to be careful of with the UNION ALL method, is that if you have some rows that match more than one of the clauses in your set of ORs then you will have duplicate results to screen out. This won't happen if you are checking for multiple values in one field, obviously, but is something to be wary of when using this method to optimise kitchen sink queries more generally.

Slapping a DISTINCT in isn't the answer, if it was then you'd just use UNION instead of UNION ALL, because that often makes the query planner call for a lot of excess work to be done. I once found that wrapping and main kitchen sink in a CTE and applying DISTINCT in the SELECT that calls it has the desired effect, but that is risky as it relies on undefined behaviour that may change at a later date and tank your performance. If the number of rows being returned is never going to be large, and your situation allows multiple statements, a safer option is to pull the data into a temporary table and SELECT from that with DISTINCT. Or you could de-dupe in the next layer instead of the DB, or just accept dupes if exact cardinality isn't important and your users aren't going to care about the occasional double-row (i.e. a simple quick-search feature).

And, of course, sometimes you want the duplicates, again maybe in the case of a quick search feature (where you split the results into categories and each duplicate of a row is likely in a different category, especially if the categories align with the search clauses that are being ORed).


Good point. In the described case the OR terms were guaranteed to be disjoint, in a way the query planner could easily figure out. Which Sybase's planner did.

However for cases like described in the article, you'd need to handle that.

While I like CTEs, I've had more consistent luck with subqueries. They also compose more easily.


SQL Anywhere doesn't really have "Sybase" roots.

It started out as Watcom SQL, then after Watcom was acquired by PowerSoft, it was renamed to "SQL Anywhere".

After Sybase acquired PowerSoft it was later renamed to "Adaptive Server Anywhere". I think SAP renamed it back to "SQL Anywhere" after they acquired Sybase.


Ah that makes sense, thanks for the correction. I first heard it from my colleague, and probably mixed it up due to the anywhere name when checking out Wikipedia later.


Postgres sometimes handles this for you, but I'm not sure exactly when it's able to do that, so I do UNION ALL.


disjunctions and stats could be a pretty nasty combination in mssql. i think it got a bit better ca. 2016 with the CE updates, but i’ve had quite a few occurrences where the solutions were the union all approach




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: