Hacker News new | past | comments | ask | show | jobs | submit login

(nothing to do with DuckDB but..) SQL is complex enough, and allowing this (and acyclically as mentioned below) would do my $%^& nut implementing it.

But I know a user requirement when I hear one, so can you give me an large, real example of where allowing this would make things easier? That would be mega helpful, ta




SQL is complex enough

No, it is not. I mean it is, but not in parts where that could be seen as useful and/or convenient. [A]cyclic graph traversal/etc is one of the basic tests in a modern interview at any CRUD studio. How come it could do $%^& to any part of yours?


> How come it could do $%^& to any part of yours?

Because just implementing the standard stuff nearly did my $^&% nut. Also I know about graphs & posets, and it's potentially a little more complex than it seems. The variables

    select x * x as y, 1 as x
is meh, but what about

    select 
        (select tbl.z from tbl where tbl.y = y) as subq, 
        x * yy as y,
        xx + 1 as x,
        subq + yy as zzz
    from ( 
        select xx, yy
        from ... )
I just don't fancy supporting that.


Note you can already reference select list items in GROUP, HAVING, and ORDER BY so it's not that big of an extension.

I've implemented the ability to reference select-list aliases before; it's not that hard to do if implemented basically like a macro expansion. The main problem is user confusion due to ambiguous references, e.g.

    select 2 as x, x as `which x?`
    from (select 1 as x) t;
we ended up adding a warning for the case where a select list alias shadowed a table column, suggesting using a fully-qualified table name if they actually wanted the table column (t.x in the above example).

IMO only allowing references to previous select list items is a perfectly reasonable restriction; loosening it isn't worth the implementation headache or user confusion. Though we did allow using aliases in the WHERE clause.


> Note you can already reference select list items in GROUP, HAVING, and ORDER BY so it's not that big of an extension.

You're just looking for symbols in the symbol table, I think it's a big difference!

> IMO only allowing references to previous select list items is a perfectly reasonable...

agreed, see my other post where I say the same.

> Though we did allow using aliases in the WHERE clause

And the SQL standards people didn't go for this, and I'm sure they were very far from stupid. And nobody's asking why they didn't allow this, which really bothers me.


Oh, was your objection specifically to allowing references to following (not just preceding) select list items? Then we're in violent agreement. That would be complicated to implement and confuse users. Definitely not worth it.


I'm against doing anything without checking beforehand whether it's actually going to be worth the effort.

But yes, I'd be far happier doing left-to-right dependencies only, which I can believe (though I still need evidence) it would be of some value.


> suggesting using a fully-qualified table name if they actually wanted the table column (t.x in the above example).

I just realised why this was bothering me. That means 't' and 't.x' are actually different variables. In standard SQL it's always the case (right?) that an unqualified variable ('t') is just an convenient shorthand for the fully qualified variable ('t.x', or more fully I suppose, '<db>.<schema>.t.x), and you just broke that.


what about

That’s no different than the first snippet, if you aren’t parsing it with regexps, of course. The resulting AST identifiers would simply refer to not only column names, but also to other defined expressions. This is the case for both snippets. It’s either cyclic or not, and when not, it is easy to substitute/cse/etc as usual. The complexity of these expressions is irrelevant.



@wruza, @wenc: These are both very good answers, and you are of course both right. Check the symbol table, anything you can't find should be defined in the same context (in the select list, as a new expr). In which case, match each symbol use (eg. x in x * x as y) to the definition (eg. 1 as x) to establish a set of dependencies, then do a partial order sort, then spit out the results.

I can do that I just don't fancy it, and more to the point nobody is giving me an example of where it would be particularly helpful. So if anyone can, I'm interested.

(also, consider human factors; although an acyclic definition could be extracted from an unordered expression set, a consistent left to right (in the western world anyway, matching textual layout) with dependencies being introduced on the right and depending only on what came before on the left might actually be better for us meatsacks)


My examples are from boring enterprise, not from what we love to create at home. I’ve read and patched literally meters long queries in analytics, which could be reduced dramatically by being self-referential and by other approaches discussed itt. Of course these could be refactored into something “create view/temp/cte”, but that requires a full control of ddl, special access rights and code ownership. Most space was used by similar case-when-then constructs and permutations of values these produced. The original code was on official support, so we couldn’t just rewrite it, because migrating to the next update would cost a week instead of an hour.

I could reach to and post a lenghty example, but it’s nothing but boring reshuffles really, spiced with 3-level joins of “modelling db in db to allow user columns”.

I agree on the LTR idea, because reading a symbol not yet defined may lead to confusion.


It’s not trivial but as someone who has implemented something similar (for an equation based modeling language) it’s not super complicated if you use the right abstractions. It’s basically traversing the AST and doing substitutions.



The thing that makes SQL simple for me is that I can think in set operations devoid of proceduralness. Once we make things more and more sequential the more it is like programming than a formula.


What does "do my $%^& nut" even mean? (looks like Perl ;))


:-) English idiom. Nut = head. Doing my head in, basically.


for example

    select id, count(...something complicated) as complicated_count
    from ....
    order by complicated_count
would help


'ORDER BY 2' would work here, but using the named column is a lot nicer.


Wow, TIL. Great tip for those random one-off queries you have to bash out when investigating a problem.


Please never let this vile shortcut work its way into your production code.


I've seen quite a few production queries that use indexes in GROUP BY and ORDER BY; it's quite common. Probably partially because linters/code review/etc are lightweight to nonexistent amongst the analysts/data science types that I tend to work with.


Indexes are used all over for grouping an ordering, I was objecting only to the syntax of ORDER BY <number>


SQL already supports "order by complicated_count". Did you mean group by?

This isn't really the large, convincing example I was looking for btw.


Many dialects already support using aliases in GROUP BY and HAVING too, btw.

IMO it's most useful (though somewhat more difficult to implement) to be able to use the aliases with window functions or large case/when statements, something like

   SELECT
     page,
     SUM(clicks) AS total_clicks,
     100. * total_clicks / (SUM(total_clicks) OVER ()) AS click_pct,
     100. * SUM(total_clicks) OVER (ORDER BY total_clicks DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) / (SUM(total_clicks) OVER ()) AS cumulative_click_pct
    FROM weblog
    GROUP BY page;


Interesting, ta. My code rarely looks like that so thanks for the insight. Was exactrly what I was looking for.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: