> In any sufficiently complex application, something like a query builder that allows different parts of the application to work together to lazily compose queries
I don't see this as a given and I don't accept it in my own applications. If you need data, go to the data access layer. If it doesn't provide you what you need, build a new repository / provider / whatever your pattern is.
Whether it's an ORM or a home-built query compositor or whatever, one thing I know from experience is that once your application is "sufficiently complex" that you start to (incorrectly) believe you need this, your application has become too complex to use it reliably.
You absolutely will be mistakenly evaluating these builders in the wrong layers, iterating results without realising you're generating N+1 queries, etc.
> I don't see this as a given and I don't accept it in my own applications. If you need data, go to the data access layer. If it doesn't provide you what you need, build a new repository / provider / whatever your pattern is.
Whichever layer you put it in, you need a way to compose two query fragments. Otherwise you have to write N*M queries manually instead of N+M query fragments.
Either you use an ORM to help you with this or you don't, with all the usual tradeoffs about using a library or not. But you still have to solve the problem. (Or you do a lot of tedious copy-paste work - with all the usual tradeoffs of that)
sure, sometimes, rarely -- these are exceptions, not rules
in general, it should not be possible for user input to produce arbitrarily complex queries against your database
each input element in an HTML form should map to a well-defined parameter of a SQL query builder, like, you shouldn't be dynamically composing sub-queries based on the value of a text field, the value should add a where or join or whatever other clause to the single well-defined query
sometimes this isn't possible but these should be super rare exceptions
I prefer using something like Rails or Django to build 10 fully working CRUD interfaces with well-defined yet dynamic filters in a day instead of spending two weeks needlessly writing the equivalent code by hand.
You’ve never actually implemented a real world implementation, have you?
You’re going to have parameters that are compound. You’re going to end up filtering on objects 3 relations removed, or deal with nasty syncing of normalization. You’ll have endpoints with generic relations, like file uploads, where the parent isnt a foreign key.
It’s going to be a mess. They will NOT always be simple to write.
> it's pretty rare for queries to be dynamically composed from arbitrary sub-queries
I'm talking static, not dynamic. You still need to compose two pieces together into a single query, and you can either use an ORM to help with that or not.
> the interface between the application and the DB is actually a string! it's not an abstract data type, it doesn't benefit from being modeled by types
No it isn't. You can't send an arbitrary string to the database and expect it to work. At the very least you benefit from having an interface that's structured enough to tell you whether your parentheses are balanced and your quotes are matched rather than having to figure that out at runtime.
> when your app queries the db, the query is not composed from several pieces, it is well-defined in the relevant method
> this is a single query, not multiple
And when you want to query for multiple related things together, the whole point of having a relational database? For different purposes you need different views on your data, and those views are generally constructed out of a bunch of shared fragments; you can either figure out a way to share them, or copy-paste them everywhere you use them.
> the db accepts a string and parses it to an AST, it does not accept a typed value
> this means the interface is the string
The DB accepts a structured query, not a string. It might be represented as a string on the wire, but if that was what mattered then we'd use byte arrays for all our variables since everything's a byte array at runtime.
> unbalanced parens and whatever other invalid syntax is obviously caught by tests
i'm not sure what you're thinking about when you say "multiple related things"
every "view" on your DB should be modeled as a separate function
every possible "thing" that's input to a function which queries the database should be transformed into a part of the query string by that function
> The DB accepts a structured query, not a string. It might be represented as a string on the wire, but if that was what mattered then we'd use byte arrays for all our variables since everything's a byte array at runtime.
...no
the API literally receives a string and passes it directly to the DB's query parser
if the DB accepted structured queries, then the API would rely on something like protobuf to parse raw bytes to native types -- it doesn't
like `echo SELECT * FROM whatever; | psql` does not parse the `SELECT * FROM whatever;` string to a structured type, it sends the string directly to the database
> every "view" on your DB should be modeled as a separate function
OK, and when a significant amount of what those functions do is shared, how do you share it? (E.g. imagine we're building, IDK, some kind of CMS, and in one view we have authors (based on some criteria) and posts by those authors, and in another we have tags and posts under those tags, and in another we have date ranges and posts in that date range. How do you share the "fetching posts" bit of SQL between those three different (parameterized) queries?)
> if the DB accepted structured queries, then the API would rely on something like protobuf to parse raw bytes to native types -- it doesn't
> like `echo SELECT * FROM whatever; | psql` does not parse the `SELECT * FROM whatever;` string to a structured type, it sends the string directly to the database
In both cases the parsing happens on the server, not the client. "echo abcde | psql" and "curl -D abcde http://my-protobuf-service/" are both doing the same kind of thing - passing an unstructured string to a server which will fail to parse it and give some kind of error - and both equally useless.
> How do you share the "fetching posts" bit of SQL between those three different (parameterized) queries?)
your application has a fetch posts method, that method takes input including (optional) author(s), tag(s), etc., it builds a query that includes WHERE clauses for every provided parameter
the code that converts an author to a WHERE clause can be a function, the point is it outputs a string, or something that is input to a builder and results in a string
i'm not sure what a "fetching posts bit of SQL" is, a query selects specific rows, qualified by where clauses that filter the result set, joins that modify it, etc.
> the code that converts an author to a WHERE clause can be a function, the point is it outputs a string, or something that is input to a builder and results in a string
So you do string->string processing, and you explicitly won't treat the queries you're generating as structured values? Enjoy your rampant SQL injection vulnerabilities.
creating a query string that's parameterized on input usually means you model those input parameters as `?` or `$1` or whatever, and provide them explicitly as part of the query
Ok so now your WHERE clauses are no longer strings, you have to have some structured representation of them that knows which parameters go with which clauses, and something that understands the structure of your queries enough to line up the parameters when you stitch together different WHERE clauses to make your full query - exactly the kind of thing you were saying was unnecessary.
Say you create a nice little utility function that makes a call to the db func1 and you write func2 that also makes a db query. If you have a func3
that needs func1 and func2 conceptually your options are:
* Accept that this will just be two trips to the DB.
* Write a new function func12 that writes a query to return all the needed data in one query and use that instead.
* Have your tool be able to automatically compose the queries.
If you do with the second option you have to do that with every combination of functions that end up being used together which is multiplicative in general.
CTEs and views are both a bit bigger than the parts that you would normally want to share and reuse, and they're also not very well standardised. Plus no-one really agrees on how dynamic tables should be, so you end up with the Excel maintainability problem of no real separation between code and data.
Following this advice too closely and your application logic starts leaking into your data layer.
One simple example is when you need to make atomic changes to two different types of entities. In the data layer, this is usually trivial — just run two queries within a transaction — but you need to expose a function that boils down to `updateBothAandB`. Rinse and repeat enough times and your data layer is a soup of business logic.
Exactly, in my opinion, separating your business logic from your data access layer completely is futile. You will need to either have business logic in your data access layer or fine grained query controls (when to add a filter clause, when to execute a query, what should go into a transaction, etc) in your business logic for performance reasons. This is OK, it's incidental complexity.
There are patterns to resolve this. Depending how you structure your DAL, your data access classes can provide transaction handles. It's a data access layer, you don't need to abstract away the fact that you're dealing with a database.
I mean, yeah, but the patterns all look like the thing you said you don’t accept in your own applications: “something like a query builder that allows different parts of the application to work together to lazily compose queries”.
An ORM is a specific kind of data layer. It is general-purpose, and as the name suggests, it is an Object-Relational Mapper whose primary purpose is to map relations to objects and vice-versa.
So no, bespoke data access layer code is not an ORM.
Why are a data access layer and an ORM mutually exclusive? An ORM is just an abstraction over your database. You might find use in it as a tool to access your database but contain the use of the ORM to within your data access layer.
Mind sharing an example of a large(ish) app that doesn't make use of an ORM? Last time this topic came up, I went looking for one (admittedly not too hard) and I came up empty handed.
I don't see this as a given and I don't accept it in my own applications. If you need data, go to the data access layer. If it doesn't provide you what you need, build a new repository / provider / whatever your pattern is.
Whether it's an ORM or a home-built query compositor or whatever, one thing I know from experience is that once your application is "sufficiently complex" that you start to (incorrectly) believe you need this, your application has become too complex to use it reliably.
You absolutely will be mistakenly evaluating these builders in the wrong layers, iterating results without realising you're generating N+1 queries, etc.
You don't need an ORM.