I must admit, I don't like the the use of floats/decimals/rationals in this way. They are unfit for sorting.
Say I wanted to present a list of users with their top three to-do items, that's impossible using the above method.
select
u.user, item1=i1.text, item2=i2.text, item3=i3.text
from
users u
join items i1 on i1.user = u.user and i1.pos = 1
left join items i2 on i2.user = u.user and i2.pos = 2
left join items i3 on i3.user = u.user and i3.pos = 3
Them being integers means I can rationally reason about them. I can also infer their position easily. With floats, I don't know where pos=3 is without counting how many have a lower number.
Floats only solve two issues: Making inserting easier while still being able to use `order by`. But reasoning about your data becomes a lot harder.
Two ways to do this with window functions (there's probably a better one but this is what I came up with quickly):
1. Subquery
SELECT * FROM (
SELECT u.user, i.text, row_number() OVER (PARTITION BY i.user ORDER BY i.pos) AS row
FROM users u INNER JOIN items i USING(user)
) numbered
WHERE row <= 3;
2. CTE
WITH numbered AS (
SELECT u.user, i.text, row_number() OVER (PARTITION BY i.user ORDER BY i.pos) AS row
FROM users u INNER JOIN items i USING(user)
)
SELECT * FROM numbered WHERE row <= 3;
Same thing, really. CTE looks a bit clearer, subquery seems to generate a faster plan but I don't really have a large enough dataset handy for it to make a difference.
They are fine for sorting! You just need to think about things differently.
SELECT user, text FROM (
SELECT u.user, text, row_number() OVER (PARTITION BY user ORDER BY rank) as rank
FROM users u
LEFT JOIN items using(user)
) ranked_items
WHERE rank < 4
This option works for unlimited top-n without creating a bunch of unnecessary columns in the response.
You are not wrong, although your example would not work in every SQL dialect (even when rewritten). Sybase's T-SQL does not have equivalent for LIMIT.[0]
My point being, there are some issues with regards to data purity and reasoning, particularly if you want a system where users can build custom reports.
And I just wish the original article highlighted that using floats would have these issues, and if writing SQL like this is important to you, then you might want to reconsider.
(As for your note in brackets, where I work, SQL is basically used as a scripting language. So we would do my example in SQL.)
[0] Yes, I know there is 'set rowcount', but you cannot do that within a sub select.
> "I don't like the the use of floats/decimals/rationals in this way. They are unfit for sorting."
Would you elaborate on why you consider these types unfit for sorting? Their orderings are deterministic, and efficiency depends on the implementation of the indexes for those specific types, not inherent in the data types themselves.
> Would you elaborate on why you consider these types unfit for sorting?
I should have changed that language, what I meant was that they were unfit for reasoning about your data from an isolated point of view, i.e. standing with just one row. To understand an item's position using floats, you must know the entire context.
> It would be more natural in SQL to return three rows than return three columns.
Look at my select again, yours have a specific user in mind, mine doesn't. I want a list of all users who have at least one to-do item. And I want to see their top three to-do items in the report.
Without doing sub-selects in the joins, there is no way to limit this to three when the position is stored as float. And even your sub-selects are going to be complicated.
It might be better to create a temporary table first with each user and their first three items, but that's hardly efficient.
> Why would you prefer to do it the way you describe?
Because I need an overview. And I don't want up to three rows per user, particularly because some users will only get one row, some only two and most three rows. That creates an inconsistent overview.
(And that's assuming you write some SQL to properly limit the joins to three rows.)
Gotcha. Thanks for clarifying. You're right that you're effectively encoding the position, and that's part of the trade off. Edit to add: As for knowing the position by looking at it, I'd argue that most use cases, one wants to know the positions relative to others, which means you want to know them in context, next to the other rows. If you want to know the absolute numbers, you can use rank or row_number to add it: perhaps even provide a view if this is something you do frequently.
As for the report query you describe, you can get at the data you want with windowing functions. There are strategies for performing the pivot as well, but I'd likely do that in application code rather than in the SQL query.
Ah, ok, I misread. Yeah, that requires a subquery then, but not because of floats being unsuitabe for sorting, but because you're doing a 1-to-many thingy in a 1-to-1 way by hardcoding 3 columns. It's like writing
items[0].text + items[1].text + items[2].text
In code to simplify it, instead of
items.slice(0,2).map((e) => e.text).concat('');
The former mostly works, but it's definitely not elegant. And you have to be sure the number of items being 3 is set in stone.
Say I wanted to present a list of users with their top three to-do items, that's impossible using the above method.
Them being integers means I can rationally reason about them. I can also infer their position easily. With floats, I don't know where pos=3 is without counting how many have a lower number.Floats only solve two issues: Making inserting easier while still being able to use `order by`. But reasoning about your data becomes a lot harder.