Would it make you feel better about it to think of it as (or write it as) "selec...

chaps · on Oct 21, 2023

Nope, for the same exact reason. Select * makes the most sense because * is in the context of my data, not something I'm working into it. Pretty sure it's the same speed.

Also, mind you, I use a lot of CTEs, so this would look weird in that context; hence why using a row number sometimes makes more sense and achieves the same thing.

hobs · on Oct 21, 2023

Filter out everything, Project only what you need, Transform it as lightly as possible.

In any context I understand, a row number would never "make sense" if a constant 1 gives the same output; it would be a lot more code that does... nothing?

Any code using select * just breaks in the future with any new columns being added, no thanks.

chaps · on Oct 21, 2023

  Any code using select * just breaks in the future with any new columns being added, no thanks.

For you, maybe. In my workflows this is really a non-issue for me.

Maybe consider that we use SQL differently and your goals and challenges are different from mine.

(Edit: What's with the downvotes from people just disagreeing about preferences? So weird.)

rrrrrrrrrrrryan · on Oct 21, 2023

I've spent most of my career working with teams that mostly write SQL code, at multiple companies.

Except for very rare fringe cases, using "SELECT *" in production code is universally considered bad practice.

OoooooooO · on Oct 21, 2023

Select * is pretty standard for wide-table queries, like the ones Data Analytics teams use regularly, because you really want ALL columns, even new ones, without going back to fix all 300+ dashboards.
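A minimal sqlite3 sketch of that behaviour, using a hypothetical `events` table: a `SELECT *` query picks up a column added later without any edit, while an explicit column list keeps returning the same shape. It only shows the mechanism; it says nothing about which choice is right for a given team.

```python
import sqlite3

# Hypothetical wide table standing in for an analytics source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event_id INTEGER, user_id INTEGER)")
con.execute("INSERT INTO events VALUES (1, 10)")


def column_names(sql):
    """Return the column names of the result set a query produces."""
    return [d[0] for d in con.execute(sql).description]


star_before = column_names("SELECT * FROM events")
explicit_before = column_names("SELECT event_id, user_id FROM events")

# A column is added upstream; neither query text is edited.
con.execute("ALTER TABLE events ADD COLUMN country TEXT")

star_after = column_names("SELECT * FROM events")
explicit_after = column_names("SELECT event_id, user_id FROM events")

print(star_before, "->", star_after)
print(explicit_before, "->", explicit_after)
```

Whether the new `country` column showing up is a feature (dashboards see new fields) or a bug (downstream code sees a different shape) depends entirely on the consumer.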

chaps · on Oct 21, 2023

Yep. I think the others commenting here aren't using SQL for analytics and don't recognize the importance of select * in that context.

__jem · on Oct 21, 2023

you're getting downvoted because it's not really a preference; it's pretty widely known to be bad practice and unhygienic in production queries. select * would get your pr rejected and you chewed out by the dba at every place i've ever worked. so you kinda just look like you don't know what you're talking about.

croes · on Oct 21, 2023

In an EXISTS query, select * is harmless: select 1 and select * result in the same execution plan, at least in MS SQL.
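A small sqlite3 sketch (hypothetical `customers` and `orders` tables) showing that the select list inside a correlated `EXISTS` does not change the result. It compares results only; it does not reproduce the MS SQL execution-plan comparison.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3);
""")

# The same correlated EXISTS, differing only in the subquery's select list.
template = """
    SELECT c.name FROM customers AS c
    WHERE EXISTS (SELECT {cols} FROM orders AS o WHERE o.customer_id = c.customer_id)
    ORDER BY c.name
"""
with_one = con.execute(template.format(cols="1")).fetchall()
with_star = con.execute(template.format(cols="*")).fetchall()

print(with_one)   # customers that have at least one order
print(with_star)  # same rows
```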

In a query that returns result rows, it could break the query as soon as you add columns with names that already exist in other tables you joined in the query.
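A minimal sqlite3 sketch of that failure mode, with hypothetical `orders` and `customers` tables. Note that in SQLite the error is raised by the unqualified `status` reference in the `WHERE` clause, which stops resolving once both joined tables have a `status` column; the `SELECT *` list by itself would just gain a duplicate-named column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (100, 1, 'open');
    INSERT INTO customers VALUES (1, 'Ann');
""")

query = """
    SELECT * FROM orders JOIN customers USING (customer_id)
    WHERE status = 'open'
"""
before = con.execute(query).fetchall()  # works: only orders has a status column

# A later migration adds a same-named column to the joined table.
con.execute("ALTER TABLE customers ADD COLUMN status TEXT")

try:
    con.execute(query).fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)  # the unchanged query text no longer compiles

print(before)
print(error)
```

Qualifying the reference (`orders.status`) and listing columns explicitly both avoid this.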

__jem · on Oct 21, 2023

yes, it's harmless in this position but it provides no additional benefits to the select 1 idiom and is suggestive of poor query discipline. it's far easier to say just don't ever use select * in queries.

RHSeeger · on Oct 21, 2023

I would add to this a bit in that

1. Given that "select *" is considered something to avoid except when necessary in edge cases

2. And "select 1" will accomplish the same goal

Anyone reading the "select *" version of the code will have to stop and consider whether it is using "select *" for a reason, because "select 1" would be the normal choice. Using "select *" is assumed to be conveying some intent (that isn't there) _because_ it's not the expected way to do it.

I kind of see it like

    if (thisField == thatField) ...

vs

    if ( (( true || false )) && ( 11 == 11 ) && thisField == thatField ) ...

Sure, they do the same thing... but you have to stop and look at the second one to make sure you understand what it does and whether there's some reason it's weird.

n4r9 · on Oct 21, 2023

I haven't downvoted anyone, but have followed this argument with interest as an intermediate SQL user.

If I was to guess why someone would downvote you, it wouldn't be for disagreeing with you, but more because you've subtly shifted from quite a strong objective stance ("this is not readable") to a subjective one ("this is not how I prefer to write it"), without really conceding anyone else's points.

chaps · on Oct 21, 2023

I think my point makes more sense when you consider that (1) I don't work with production code (it's more analysis, ad hoc code in an investigatory capacity) and (2) when I mention someone is "new", what I mean is someone actively learning and not from a technical background. IME, folks like that have a difficult time with that floating 1. So while, yes, it's a standard that programmers are familiar with, it's not something that someone new will be very comfortable with. Lots of people I work with come from a pandas-only background.

Not really conceding because, as far as I can see, everybody is coming from a position of familiarity.

n4r9 · on Oct 23, 2023

That's totally fair. Perhaps the confusion could have been avoided by qualifying in your initial comment that you're referring to a specific situation i.e. not-too-technical analysts writing ad hoc code.

hobs · on Oct 21, 2023

I didn't downvote you, but consider this - I work with SQL a lot, like a lot a lot. Something that's your code today is probably my code tomorrow.

So when you say "my flow is X" and your flow is inimical to maintaining it and extending it, people might get a bit irritated at the last dev that did the exact same thing.

croes · on Oct 21, 2023

How would Select * break with a new column?

hobs · on Oct 21, 2023

Any situation where a new or otherwise unknown attribute breaks your code. Binding is a big one:

* Say you are joining two tables and one now has a conflicting/duplicate name: surprise, you now have broken code. It literally will not execute with ambiguous duplicate references.

* By the same token, downstream views can break for the same/similar reason.

* In some engines (SQL Server, for example), views will not actually include your new columns until they are "refreshed", so one day, out of band with your deployment, your views will suddenly change.

* Say you have a report with specific headers: tada, it now has whatever people add to the table, and tracing a column back to its source can be a pain because it's unclear where it comes from in the query without checking the schema.

* Performance expectations can change if a new column's data type is much larger, up to the point of actually breaking the client or consuming tens of billions of times more resources.

cwbriscoe · on Oct 21, 2023

If you have a table with two columns and you do a 'select *', adding a column to the table can break code that is only expecting 2 columns.
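A minimal Python/sqlite3 sketch of that failure, with a hypothetical two-column `settings` table: client code that unpacks each `SELECT *` row into two names works until a third column is added, while the explicit-column version keeps working.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE settings (name TEXT, value TEXT)")
con.execute("INSERT INTO settings VALUES ('theme', 'dark')")


def load_settings():
    # Client code written when the table had exactly two columns.
    return {name: value for name, value in con.execute("SELECT * FROM settings")}


before = load_settings()

con.execute("ALTER TABLE settings ADD COLUMN updated_at TEXT")

try:
    load_settings()
    failure = None
except ValueError as exc:  # too many values to unpack (expected 2)
    failure = exc

# Naming the columns keeps the row shape stable.
explicit = {n: v for n, v in con.execute("SELECT name, value FROM settings")}

print(before, repr(failure), explicit)
```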

croes · on Oct 21, 2023

Usually code refers to columns by name, and additional columns are just ignored. The only case I know of is when you add a column to a joined table and the column name already exists in the other table, resulting in ambiguous column names.

In an EXISTS clause the * is harmless.

cwbriscoe · on Oct 23, 2023

I am talking about 'select *' at the top level; it can be harmless in EXISTS, sub-selects and CTEs. The number of columns sent to the client (the program) will change when you add a column to the database. If you don't remember to change all of the places in your code where you used a 'select *', your program is likely going to fail or have unexpected results.

moritzwarhier · on Oct 21, 2023

Just wanted to write: why not select count(*)? But I guess that's what you meant by row numbers.

Select 1 communicates that no columns need to be selected, so it forces inexperienced readers such as myself to understand why that is the case.

So IMHO, it carries more information than selecting some arbitrary columns or counting the number of rows (when all I care about is whether the count is > 0).
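The two formulations can be compared in a short sqlite3 sketch (hypothetical `orders` table and helper names). Both answer "is there at least one matching row?", but `count(*)` asks for the number of matches, whereas `EXISTS` only asks whether one exists, which engines can typically answer by stopping at the first match.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(100, 1), (101, 1), (102, 3)])


def has_orders_count(customer_id):
    # Counts every matching row, then compares with zero.
    (n,) = con.execute(
        "SELECT count(*) FROM orders WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return n > 0


def has_orders_exists(customer_id):
    # Only asks whether at least one matching row exists (SQLite returns 0 or 1).
    (flag,) = con.execute(
        "SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = ?)", (customer_id,)
    ).fetchone()
    return bool(flag)


for cid in (1, 2, 3):
    print(cid, has_orders_count(cid), has_orders_exists(cid))
```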

mjan22640 · on Oct 21, 2023

The relevant thing for EXISTS is whether the set of rows returned by the select is empty or non-empty. The values in the columns are irrelevant. Using * creates the incorrect impression that the values are relevant and that their relevance is tied to the data.
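A quick sqlite3 check of that claim, with a hypothetical `orders` table and helper: the `EXISTS` result is the same whether the subquery selects `1`, `*`, `NULL`, or an arbitrary literal, because only the emptiness of the row set is tested.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
con.execute("INSERT INTO orders VALUES (100, 1)")


def exists_with(select_list, customer_id):
    # Only the emptiness of the subquery matters; its values are never read.
    sql = f"SELECT EXISTS (SELECT {select_list} FROM orders WHERE customer_id = ?)"
    return con.execute(sql, (customer_id,)).fetchone()[0]


# (customer with an order, customer without one) for each select list
results = {
    select_list: (exists_with(select_list, 1), exists_with(select_list, 2))
    for select_list in ("1", "*", "NULL", "'irrelevant'")
}
print(results)
```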