My dislike of ORMs mainly stems from the tendency to treat modern SQL engines as glorified dumb bit buckets.
Where a CTE, LATERAL join, or RETURNING clause would simplify processing immensely or (better yet) remove the possibility of inconsistent data making its way into the data set, ORMs are largely limited to simplistic mappings from basic tables and views to predefined object definitions. Even worse when the ORM is creating the tables.
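As a tiny illustration (SQLite via Python's sqlite3, with a made-up orders table), here's the sort of thing a CTE makes trivial: compute per-customer totals once, then filter against the overall average, all in one statement. Most ORM query builders make you drop to raw SQL or haul the rows into the app for this:

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('alice', 100), ('alice', 50), ('bob', 20), ('carol', 200);
""")

# The CTE is computed once and referenced twice: in the outer SELECT
# and inside the scalar subquery that takes the average.
rows = conn.execute("""
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total
    FROM totals
    WHERE total > (SELECT AVG(total) FROM totals)
    ORDER BY total DESC
""").fetchall()

print(rows)  # [('carol', 200), ('alice', 150)]
```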
SQL is at its heart a transformation language as well as a data extraction tool. ORMs largely ignore these facets to the point where most developers don't even realize anything exists in SQL beyond the basic INSERT/SELECT/UPDATE/DELETE.
It's like owning a full working tool shed but hiring someone to hand you just the one hammer, screwdriver, and hacksaw and convincing you it's enough. Folks go their whole careers without knowing they had a full size table saw, router, sander, and array of wedges just a few meters away.
Most ORM frameworks have the ability to execute raw SQL, so using an ORM does not preclude you from using these features. ORMs tend to put you in the mindset of using simple CRUD statements most of the time - but I think that’s probably for the best, and these more advanced features should be used sparingly.
Not only does each ORM have a different API and access pattern to do this, once you hit this point, you're managing the object model ALONG WITH custom SQL for migrations.
There is also non-trivial danger in letting your ORM decide what data types your database schema should use. Got UUIDs? The ORM will store them as strings. Need start and stop timestamps? A range type with an exclusion constraint may be the right tool.
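A quick sketch of the UUID case (SQLite here, with a hypothetical table): the same value stored as canonical text versus raw bytes. The string form is more than twice the size, and every index on the column inherits that bloat:

```python
import sqlite3
import uuid

u = uuid.uuid4()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (as_text TEXT, as_blob BLOB)")
# Same UUID, two representations: 36-char canonical string vs. 16 raw bytes.
conn.execute("INSERT INTO t VALUES (?, ?)", (str(u), u.bytes))

as_text, as_blob = conn.execute("SELECT as_text, as_blob FROM t").fetchone()
print(len(as_text), len(as_blob))  # 36 16
```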
What you appear to consider "advanced features" are what some others of us consider "perfectly normal." Let's be honest: most devs consider knowing the difference between a LEFT JOIN and an INNER JOIN to be an advanced topic.
Devs will wax poetic about the marginal benefits of monads in limited scenarios, but throw up their hands in defeat when you mention a window function for a dashboard. They'd rather calculate it in the app layer with all the added latency that implies.
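The dashboard case is a one-liner if you let the engine do it. A sketch (SQLite via sqlite3, invented sales table) computing a running total with a window function instead of accumulating in the app layer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day INTEGER, amount INTEGER);
    INSERT INTO sales VALUES (1, 10), (2, 30), (3, 5), (4, 25);
""")

# SUM(...) OVER (ORDER BY day) is a cumulative sum: each row's window
# runs from the start of the partition up to the current row.
rows = conn.execute("""
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total
    FROM sales
    ORDER BY day
""").fetchall()

print(rows)  # [(1, 10, 10), (2, 30, 40), (3, 5, 45), (4, 25, 70)]
```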
> Not only does each ORM have a different API and access pattern to do this
I think it is to be expected that different ORM frameworks would have differing APIs and access patterns, similar to how different RDBMSs often differ in the syntax and features they support. It's not a big deal to look up the correct syntax, in my opinion.
> you're managing the object model
Which means in the worst case that for some portion of your app you're basically back to where you were if you weren't using an ORM. Although I've found that JPA, for example, plays pretty nicely with native SQL queries (caching gets more difficult, but I think that's to be expected when moving logic out of the app and into the database regardless of whether you are using an ORM or not).
> ALONG WITH custom SQL for migrations
Personally I've always used custom SQL for all migrations. While ORMs often come with a tool to automatically generate a schema, I've never worked on a project that actually used this tool in production.
> There is also non-trivial danger in letting your ORM decide what data types your database schema should use. Got UUIDs? The ORM will store them as strings.
Some data types can be a bit tricky, but any competent ORM should at the very least have hooks to override the type on a field or implement your own custom handlers. [1]
> What you appear to consider "advanced features" are what some others of us consider "perfectly normal." Let's be honest: most devs consider knowing the difference between a LEFT JOIN and an INNER JOIN to be an advanced topic.
I said "more advanced features" - as in more advanced than basic SELECT/INSERT/UPDATE statements. I don't think any feature that you mentioned is particularly challenging to understand. And literally every entry-level developer I've ever hired has easily been able to explain the difference between LEFT and INNER joins in an interview.
I'm lucky enough to work on projects usually lacking any kind of database anywhere. But when I do, I also tend to make sure to use a tiny subset of DB features. Simply because I consider "code" to be the part of the world I have any control over and "database" to be the dangerous and out-of-reach part where errors show up later (such as in slower/larger integration tests). Any logic that is happening in a DB happens outside of low-level testing reach too, and that scares me. I'll probably never even use a select-from-select query... I'll happily leave most of the tools in the shed.
> Simply because I consider "code" to be the part of the world I have any control over and "database" to be the dangerous and out-of-reach part where errors show up later
Sounds more like a personal phobia than a substantive critique of databases, especially ACID databases. You can test any query outside of a huge integration test by… running the query in a smaller, more targeted test suite. This can be done for performance checks, conformance, sanity, etc. The thing is, compared to most other programming languages, SQL is largely side effect free. As long as no one is dropping tables or indexes, the query you run today will return the same result structure tomorrow. Obviously changes in the volume of data will affect performance, but that's true no matter what data store you use.
"But what about testing writes," you ask? Same deal. Start a transaction, run the INSERT/UPDATE/DELETE and then ROLLBACK. You get your performance timing, your ability to detect errors, and no permanent changes to your dataset.
Data storage and persistence is hard. It doesn't become easier using NoSQL or avoiding data altogether. But if data makes you personally uncomfortable, perhaps it's best you keep your distance. It greatly limits your career choices, but that's ultimately your choice.
There are tons of antipatterns out there. Often the production database is completely different from anything I can or want to set up on my local machine (e.g. it's a cluster of database type A, while my local dev env is a single node of type B).
> But if data makes you personally uncomfortable
I wouldn't say uncomfortable, simply bored. Not because I think databases are bad tech; they're awesome. But we still haven't solved the impedance mismatch between programs and data, so any time you need to work with them, you spend 10% of your time on business logic and 90% on deployments, migrations, ORM and CRUD. Which is just something I'm lucky enough to be able to avoid. I don't like NoSQL either (just like I don't like dynamic typing). I don't think anyone can "avoid data" in programming, but my way of staying sane has been to avoid the web.
If you have fewer than ten thousand simultaneous users, this is not you. After ten million, you're not using an ORM in front of a bare relational database either.
Scale at high numbers adheres to no rules or off-the-shelf tools.
I wish. I'm constantly having to reinvent materialized views on engines that don't have good support for the concept (MS SQL, SQLite) because my users want to sort/filter on a calculated column, meaning that I have to revert to "dumb bucket of bits" for record counts in the seven figures and user counts in the dozens.
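The reinvention usually looks something like this sketch (SQLite, invented schema): a summary table kept in sync by a trigger, standing in for the materialized view the engine won't give me, so the calculated column can be indexed and filtered cheaply:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    -- The hand-rolled "materialized view": a summary table...
    CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total INTEGER);

    -- ...kept in sync by a trigger on every insert.
    CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
        INSERT OR REPLACE INTO customer_totals VALUES (
            NEW.customer,
            COALESCE((SELECT total FROM customer_totals
                      WHERE customer = NEW.customer), 0) + NEW.amount
        );
    END;
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 100), ("bob", 20), ("alice", 50)])

# Sorting on the precomputed column never touches the base table.
rows = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY total DESC"
).fetchall()
print(rows)  # [('alice', 150), ('bob', 20)]
```

(A real version also needs UPDATE and DELETE triggers, which is exactly the kind of busywork a native materialized view would make unnecessary.)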
Relational algebra is good. I like the idea of normalized data.
I hope to be able to use it one day instead of SQL.
So… use one of the engines with better support for materialized views?
This sentiment is like saying, "Java doesn't support traits, so I hate all programming languages."
If an engine doesn't support a feature you need, use a different engine. If your workplace does not allow this, that's an issue you have with your workplace, not the tools.
I've worked on a few projects in the past that "didn't need all that fancy database stuff". It's frustrating to deal with the consequences. You spend most of your time building elaborate workarounds to re-invent basic data integrity safeguards. Rolling your own database might be a fun learning experience, but not for production apps.
SQL may be a language for all those things, but unfortunately it's horribly bad at it. Wilfully obtuse syntax. Incredibly poor compositionality. No testability worth the name. The deployment model is a half-baked pile of perl scripts, and that's if you're lucky.
So yeah, I use SQL like https://xkcd.com/783/ . I'm sure you have a bunch of cool data analysis tools in there, but please just shut up, give me the contents of my table, and let me process it in a real programming language where I have map/reduce/filter as regular composable functions that I can test and reuse, with a syntax that doesn't make my eyes bleed.
Give you the contents of your table, so you can process it in the app tier with map/filter/reduce?!
Ah, the hubris of devs who honestly believe they can whip out a solution in a day that beats a dedicated army of developers singularly focused on the task of large data management, storage, and serialization. And almost always forgetting that serialization off of disk and over a network isn't free.
There's a reason why SQL is going on 50 years when most technologies are lucky to remain dominant past 10 in this industry. And if you think it's just because of inertia or lack of imagination, you're deluded. SQL isn't perfect (nothing is), but as a DSL for set theory, it does a damn good job, even 50 years later. Far better than any map/filter/reduce whipped up yet again by someone who doesn't fully understand the scope of the problems being solved.
It's doubly troubling when you can't grok that SELECT = map, WHERE = filter, and GROUP BY + aggregator = reduce. I sincerely hope you aren't avoiding JOIN by loading both tables ahead of time.
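To make the correspondence concrete, a sketch (SQLite via sqlite3, toy data) computing the same result both ways:

```python
import sqlite3
from functools import reduce

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nums (n INTEGER);
    INSERT INTO nums VALUES (1), (2), (3), (4), (5);
""")

# WHERE = filter, the SELECT expression = map, the aggregate = reduce.
sql_total = conn.execute(
    "SELECT SUM(n * n) FROM nums WHERE n % 2 = 1"
).fetchone()[0]

# The same pipeline in application code.
rows = [r[0] for r in conn.execute("SELECT n FROM nums")]
app_total = reduce(lambda acc, x: acc + x,
                   map(lambda n: n * n,
                       filter(lambda n: n % 2 == 1, rows)),
                   0)

print(sql_total, app_total)  # 35 35
```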
> Ah, the hubris of devs who honestly believe they can whip out a solution in a day that beats a dedicated army of developers singularly focused on the task of large data management, storage, and serialization.
A "dedicated army of developers" who've been "going on 50 years" but whose flagship solutions are still single-point-of-failure, still noncompositional, still untestable, still have bizarre and incomprehensible performance characteristics. Yeah, no, I'm going to do stuff in regular application code, thanks.
You pay a huge cost in moving those bits over the network though. How do you even deal with tables that don't fit in memory or the lack of indexes for tables where sequential scanning is too slow?
I mean at that point you're getting into real big data stuff. Move the code to the data rather than moving the data to the code; have some way to stream in the data in its native format rather than having to read it in; do pre-aggregation and indexing as the data is written (but not in a way that's blocking your OLTP). Which, yes, is stuff that SQL RDBMSes do for you up to a point, but they do it in invisible and unmanageable ways; IME you're better off doing it explicitly and visibly.
In seriousness SQL databases can be good for ad-hoc exploratory queries, so having your data processing pipeline spit out a read-only SQL database dump on a regular schedule is worthwhile, but using that for anything more than prototyping is a mistake.
I am not particularly opinionated on the matter, and I think I can appreciate both sides of the argument.
SQL is somewhat analogous to C. It is entirely legitimate to look at its archaic usability features and wonder if we could not do better and at the same time it has hit a sweet spot of adaptation to the domain that ensured its longevity.
Something that might shake things up a bit is graph databases / query languages. We can think of them as a generalization of the SQL universe and an opportunity to modernize it.
Yeah, if your devs are loading entire tables into code and using filter on them ... I haven't personally seen that, but oh god.
Not that it isn't valid in some cases. It is. If, for example, you're using all of the table in your logic and need to filter it to pull out certain items for a certain need, sure. Then it is good, absolutely!
SQL is a declarative language. If you ask it for an address, it'll give you an address. Not sure what you're trying to say with that xkcd.
SQL is a real language. It is testable and reusable. It's a DSL, so the syntax isn't something you have to look at most of the time unless you're a DBA. That said, not everyone hates the syntax.
You're probably losing efficiency in your quest for functional paradigm perfection.
We use integration tests to test our SQL queries. Basically fire up our persistence layer and a database instance together, call some create/update methods, and assert that the expected results are returned. Writing tests is pretty straightforward - the tests proved incredibly valuable recently when we migrated to a different RDBMS provider.
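Roughly, the pattern looks like this sketch (plain sqlite3 standing in for our real stack; UserRepo and its methods are hypothetical stand-ins for your own persistence layer):

```python
import sqlite3

# Each test gets a fresh in-memory database, exercises the real persistence
# code, and asserts on what actually comes back from the engine.
class UserRepo:
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def create(self, name):
        return self.conn.execute(
            "INSERT INTO users (name) VALUES (?)", (name,)).lastrowid

    def get(self, user_id):
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

def test_create_then_read():
    repo = UserRepo(sqlite3.connect(":memory:"))  # fresh DB per test
    uid = repo.create("alice")
    assert repo.get(uid) == "alice"
    assert repo.get(9999) is None

test_create_then_read()
print("ok")
```

Pointing the same tests at a different engine is what made our RDBMS migration tractable.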
But yeah I guess we don't really have a good way to write automated tests for migrations that run outside of our application code.
DDL is like code. DML is not. Tools like Sqitch are quite adept at testing DDL.
DML is declarative. Like HTML. SQL is a DSL for set theory. Do you test your HTML like the rest of your code too?
But to be clear, Sqitch and pg_tap (off the top of my head) are both effective testing tools for Postgres databases. MS SQL and Oracle both have extensive testing frameworks available.
Sure. HTML doesn't impact the app as much as SQL does. Missing records in a SQL query can have a huge impact on many systems. A missing table row (HTML) usually doesn't impact that much.
> SQL may be a language for all those things, but unfortunately it's horribly bad at it.
Any piece of software used incorrectly will seem "horribly bad".
> Wilfully obtuse syntax
This is an opinion.
> Incredibly poor compositionality.
Another opinion.
> No testability worth the name
I would argue that "try it, observe side effects and rollback" as a language feature is more testable than many/most programming languages.
> The deployment model is a half-baked pile of perl scripts, and that's if you're lucky.
>> Any piece of software used incorrectly will seem "horribly bad".
I appreciate that you hate it. I don't even think you should care or learn it if you don't want to. But I think positioning things you don't understand or _want to use_ as "horrible" or "bad" or whatever is acting in bad faith. You're allowed to say that you don't like something and don't want to learn how to use it, without suggesting it's the tool's fault.
> So yeah, I use SQL like https://xkcd.com/783/ . I'm sure you have a bunch of cool data analysis tools in there, but please just shut up, give me the contents of my table, and let me process it in a real programming language where I have map/reduce/filter as regular composable functions that I can test and reuse, with a syntax that doesn't make my eyes bleed.
This comment directly demonstrates "I don't understand, I don't want to, just let me do whatever I want however I want", which is fine, but again, not the tool's fault you don't want to use it.
Quoting again because it's also relevant to this last comment:
>> Any piece of software used incorrectly will seem "horribly bad".
At some point there is an objective underlying reality. I've worked at a lot of places and never seen SQL done well; even the people who were happy with SQL had no automated testing, no compositionality to speak of, and a crappy deployment model (and would usually acknowledge all this! They just somehow didn't mind). If one person uses a tool badly that's maybe a problem for that person, but if everyone uses the tool badly that's a problem with the tool.
“Everyone” in this context means “the places one individual person is aware of”, and when compared to _actual_ “everyone”, is mostly irrelevant though. The things you’re complaining about are solved problems, you just don’t care/refuse to acknowledge those solutions, it seems. Which is fine! Like I said, you can just say you don’t like something without trying to smear it.
I do care, but the "solutions" are always vaporware. It's like when I complain about C++ memory unsafety and someone says "it's fine, under the new standard you can just use a safe subset of C++" and I say "ok, where's the definition of this subset and how can I tell whether a library follows it or not?" and they say "uhhh....". At some point you stop asking.
This is the second time you’ve said “it’s bad!” And then used a strange hand wavy analogy (in the first post, an xkcd comic) as an out. Again, if you don’t understand something and don’t want to, just say it, don’t paint it as bad.
Enjoy your vastly inefficient and expensive map/reduce configurations if that’s what you like.
You want to be concrete? OK, literally every real-world SQL setup I've seen, when not piggybacking off the application-level infrastructure:
- Has no way to deploy a specific historical version of the SQL (e.g. deploy this git tag)
- Has no automated checking that the SQL behaves as expected (e.g. this query with this test data should produce this output; if it doesn't, the git tag will not be created). Even checking that the SQL is syntactically valid is rare.
- Has no reuse of expressions smaller than a view/table
- Has not even basic type checking, e.g. something that alerts if you are combining an expression that might produce null with an expression that does not handle null well
- Has not even basic library/dependency management
- Has no practical structured values (since none of the infrastructure that would make using ad-hoc temporary tables safe exists). CTEs are an improvement but still thoroughly noncompositional since you can only put them in one place.
Let me guess, "these are solved problems" but no details of what you do to solve them, because none of the solutions actually work.
> - Has no way to deploy a specific historical version of the SQL (e.g. deploy this git tag)
Can you clarify this? Do you mean the version of a specific schema/query?
> - Has no automated checking that the SQL behaves as expected (e.g. this query with this test data should produce this output; if it doesn't, the git tag will not be created). Even checking that the SQL is syntactically valid is rare.
This is part of the purpose of transactions [0]
> - Has no reuse of expressions smaller than a view/table
Incorrect [1][2][3]
> - Has not even basic type checking, e.g. something that alerts if you are combining an expression that might produce null with an expression that does not handle null well
Incorrect [4] [5]
> - Has not even basic library/dependency management
Incorrect [6]
> - Has no practical structured values (since none of the infrastructure that would make using ad-hoc temporary tables safe exists). CTEs are an improvement but still thoroughly noncompositional since you can only put them in one place.
Can you clarify? How is a transaction + CTE/subquery not sufficient?
> Let me guess, "these are solved problems" but no details of what you do to solve them, because none of the solutions actually work.
Replace "none of the solutions actually work" with "I don't know how to use them", and yes you're correct.
I'm happy to answer more of your questions in good faith, but I'm not really interested in debating a person with their head in the sand. Please let me know if I can help you, if you're actually interested. But also, as I've said, it's also fine to not want/not like this type of stuff. Different strokes and all that.
> Can you clarify this? Do you mean the version of a specific schema/query?
I mean being able to version a system implemented in "advanced SQL" the same way I'd version one implemented in an applications language. Make some changes, keep them in VCS, at some point decide to do a release and deploy. A few days later, discover some issue with the changed logic, roll back to the previous release of my "code" (without affecting the data, so not just restoring a database backup).
> This is part of the purpose of transactions [0]
Right, but the actual workflow tooling around how to use them for this is missing, and for whatever reason the culture that would build and standardise it seems to be missing too. Like, in most ecosystems there's a standardised test-edit cycle that everyone understands; you make your changes and then you run npm test or cargo test or whatever, and you get some assurance that your changes were correct, and that same tooling is also in control of your release workflow and will prevent or at least warn you if you try to do a release where your tests are failing.
> Incorrect [1][2][3]
PostgreSQL functions with composite values (and more generally the fact that composite values exist at all) sound like exactly what I was looking for, so that would be a big improvement if I could use them (although for the record they're not a standard SQL feature; MySQL functions can only return a scalar value, so there's a major missing middle between functions and views). But even then they have the same problem as temporary tables/views of existing "globally", in the place you'd expect data rather than code to be, and deployment/use tooling being inadequate to use them safely (or at least widely perceived as such). Like, everywhere I've worked has had a de facto rule of "no DDL on the production database except for deliberate long-term changes to the schema that have gone through a review process", and I don't think that's unreasonable (maybe you do?).
> Incorrect [4] [5]
Those functions exist, I'm talking about having tooling that can tell you where you need to use them. Even in something like Python you have linters that will catch basic mistakes, and they're integrated into the workflow tooling so that you won't accidentally release without running them.
> Incorrect [6]
A list of libraries != library/dependency management.
> Can you clarify? How is a transaction + CTE/subquery not sufficient?
If I want to pull out an expression that appears in two arbitrary points in my query, and use it as a CTE, that requires a lot more thought than it would in most languages, because it goes differently depending on whether it was in the SELECT or the WHERE or the GROUP BY or.... The grammar is just somehow less consistent than most programming languages, and the scoping is more confusing, I think because it's kind of lexically backwards (like, you have to declare things to be able to use them, but a lot of the time the declaration goes after the usage. But not always!).
> I mean being able to version a system implemented in "advanced SQL" the same way I'd version one implemented in an applications language. Make some changes, keep them in VCS, at some point decide to do a release and deploy.
It's tough to follow these posts, in part because SQL is a programming language used to interact with a database. It seems like you want to treat the SQL you write synonymously with the system you're writing it against, which is...I don't know...confusing? SQL is declarative: you say make it so, and the underlying engine makes it so. I've had trouble parsing whether you dislike _writing SQL to interact with a database_ or _the way RDBMSs manage data_. The only "logic" that exists in most databases is surrounding constraints, and if there's a bug in your constraint, you tell your database to update its schema, and it does so. You're free to store your SQL queries however you'd like, just like any other programming language.
> A few days later, discover some issue with the changed logic, roll back to the previous release of my "code" (without affecting the data, so not just restoring a database backup).
As is the nature of this conversation, this is just inherently not how these systems are intended to work. Asking for "new data but old shape" is legitimately ridiculous.
> Right, but the actual workflow tooling around how to use them for this is missing, and for whatever reason the culture that would build and standardise it seems to be missing too.
I guess I'm confused as to what "workflow tooling" you're looking for? Every RDBMS supports multiple databases and schemas, which enables you to create and run test configurations with test data in real-world environments. A transaction enables you to test real queries on real data without risk.
> Like, in most ecosystems there's a standardised test-edit cycle that everyone understands; you make your changes and then you run npm test or cargo test or whatever, and you get some assurance that your changes were correct, and that same tooling is also in control of your release workflow and will prevent or at least warn you if you try to do a release where your tests are failing.
Sure, and in database systems this is the same. Craft a query out of a transaction, when it does what you want, commit it (either via a transaction or to normal source control). Everything you're asking for literally already exists, exactly how you're asking for it.
> Those functions exist, I'm talking about having tooling that can tell you where you need to use them. Even in something like Python you have linters that will catch basic mistakes, and they're integrated into the workflow tooling so that you won't accidentally release without running them.
The more I read what you write, the more it seems like you think you're unable to write SQL in a file and execute it against a database? Which...you can do...it happens all the time. I didn't bring it up because it's so obvious and common I thought there was some other misunderstanding.
> A list of libraries != library/dependency management.
I'm not sure what else you want? Your RDBMS of choice maintains a list similar to a package.json containing dependencies and their versions, and you can interact either with that list directly, or with RDBMS-specific commands similar to "npm install react". Again, what you're asking for literally exists exactly how you're asking for it.
> If I want to pull out an expression that appears in two arbitrary points in my query, and use it as a CTE, that requires a lot more thought than it would in most languages, because it goes differently depending on whether it was in the SELECT or the WHERE or the GROUP BY or....
If you open a Java file, are you allowed to write arbitrary code wherever you want? No, that would be ridiculous. Enforcing some shape or structure on source is common and expected.
> The grammar is just somehow less consistent than most programming languages, and the scoping is more confusing, I think because it's kind of lexically backwards (like, you have to declare things to be able to use them, but a lot of the time the declaration goes after the usage. But not always!).
I'm happy to meet you in the middle (really, one millimeter from where I currently stand) and acknowledge that it can be at times frustrating to modify a SELECT portion of a query prior to getting to the FROM clause, which reduces the ability for editors to assist in typeahead (because as you've said, you haven't gotten to the FROM yet).
> The only "logic" that exists in most databases is surrounding constraints, which, if there's a bug in your constraint, you tell your database to update its schema, and it does so.
Well, complex operations on data - even if those operations are just selection and aggregation - require logic. So either that logic lives in SQL, or it lives in application code.
> I guess I'm confused as to what "workflow tooling" you're looking for? Every RDBMS supports multiple databases and schemas, which enables you to create and run test configurations with test data in real-world environments. A transaction enables you to test real queries on real data without risk.
So where is the equivalent of npm/cargo/maven? And where is the unit testing framework? Transactions are low-level functionality that you could build this tooling on top of - but as far as I can see no-one has, or at least not to the extent that it's standardized and accepted in the community. Where in RDBMS-land can I check out an existing project, make a small edit to one of the queries, and then run that project's tests to confirm I haven't broken it? A few projects have some support tools for doing this, but they're inevitably ad-hoc and unpolished.
> The more I read what you write, the more it seems like you think you're unable to write SQL in a file and execute it against a database? Which...you can do...it happens all the time.
I want a project with a bit more structure than a single file. And I want to share and reuse pieces between multiple projects rather than writing everything from scratch every time. Again, that's the low-level functionality, but where is the workflow tooling that actually builds on that to let you do day-to-day things in a standardised way?
> I'm not sure what else you want? Your RDBMS of choice maintains a list similar to a package.json containing dependencies and their versions, and you can interact either with that list directly, or with RDBMS-specific commands similar to "npm install react".
What? Where? You linked to a list for postgresql that literally has 12 packages available, total (I'm pretty sure I've published more packages than that in Maven central myself).
> If you open a java file, are you allowed to write arbitrary code wherever you want?
Not quite, but I can select any subexpression of an expression almost anywhere and pull it out into either a local variable or a function - usually by doing nothing more than hitting a key combo in my IDE. I haven't found anything like that for SQL.
To be fair, a lot of those tools have that one weird caveat you need to know, and are also strongly dependent on your dialect.
For example there's an aggregate function in DB/2, LISTAGG, that joins strings... the caveat being that if they get too long, the query blows up. It's in Oracle too, which has a syntax (ON OVERFLOW TRUNCATE) where you can tell it to truncate the string instead.
SELECT, INSERT, UPDATE, and DELETE work more or less the same whether you're on MariaDB, Postgres, DB/2, Oracle, etc.
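For comparison, a sketch of the same string-joining aggregate under SQLite's spelling, group_concat, which has its own caveat: element order within a group is formally unspecified (so don't rely on it without sorting yourself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (item TEXT, tag TEXT);
    INSERT INTO tags VALUES ('a', 'red'), ('a', 'blue'), ('b', 'green');
""")

# group_concat(X, Y) is SQLite's LISTAGG: join X with separator Y per group.
rows = conn.execute("""
    SELECT item, group_concat(tag, ',') AS tags
    FROM tags
    GROUP BY item
    ORDER BY item
""").fetchall()

print(rows)  # e.g. [('a', 'red,blue'), ('b', 'green')] -- element order varies
```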
As opposed to the dizzying array of ORMs that all have multiple caveats you need to know and are utterly incompatible with each other in fundamental ways?
Folks go on about how they're "stuck" with a single dialect of SQL but completely ignore how utterly impossible it is to switch ORMs without a full app rewrite. And ORMs are all language-specific, so if you have two different app clients in different languages, you can't always share even basic API access patterns.
Got a Java Spring app alongside a Django app? Good luck!
Pivot tables. Temporal queries. CUBE/ROLLUP. Window functions. Set-returning functions. Materialized views. Foreign tables. JSON processing. Date processing. Exclusion constraints. Types like ranges, intervals, domains. Row-level security. MERGE.