The syntax follows https://github.com/krisajenkins/yesql/ which works with Clojure. That way the DB queries can be defined in a language-agnostic way, so you don't need a gateway API to access the DB :)
There are many opportunities to grow this, e.g. transforming the SQL files into stored procedures, and being able to lint and check against a DB schema definition. However, I'm not sure that the indirection step of hiding the SQL away from the code actually makes sense in the long run. A tight coupling allows filters and other optimisations to be easily added to the end of a query, and saves one lookup that is not yet supported in your editor.
Hey, thanks for the opportunity ideas! You're totally right about the editor lookup problem. I wrote this project, I use this project in a production setting and not being able to jump to the SQL from code does bother me.
Seems similar in inspiration to https://pugsql.org/ which is a python reimplementation of hugSQL, more or less.
I'd say an advantage of pugSQL is that it's sqla-core under the hood, so lots of stuff will Just Work and you can drop down to core to commit war crimes if you have to.
I haven't used pug, but I read through it a bit and like their approach. I think they have a more intricate way of parsing SQL than we have in aiosql. We're doing it quick and dirty with some regular expressions.
[hp]ugSQL seems to have been inspired by[1] Yesql, a library with "a similar take on using SQL" that has also inspired this[2] rom-rb adapter; not a Ruby guy so there may be something a lot better/closer, but this is what I came up with after a couple minutes of searching.
My initial thought is I'd rather just have a module that would find project-related .sql files and parse into named blocks based on the comments, treating them as named strings. No db entanglement required, just a simple way to organize queries in sql files. You would use it more like this:
import sqlite3
import project_queries as queries # a module that loads queries from local .sql files as strings in module namespace
conn = sqlite3.connect('myapp.db')
cursor = conn.cursor()
cursor.execute(queries.get_all_users)
users = cursor.fetchall()
# >>> [(1, "nackjicholson", "William", "Vaughn"), (2, "johndoe", "John", "Doe"), ...]
aiosql may add some features to help with passing values to the query. I see `^` used with `:users` in an example but don't quite get it.
-- name: get-user-by-username^
I often need to generate DDL details like table and column names in the query, which you don't want escaped, along with data like values and ids where you would want escaping (:users or %(users)s).
It works nicely to use string formatting (`CREATE TABLE {table_name} ...`) for DDL along with interpolation (`WHERE user_id IN %(user_ids)s`) in the same query.
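A minimal sketch of that split, using sqlite3 (which takes `?` placeholders rather than `%(name)s`); the table name here is assumed to be trusted application code, not user input:

```python
import sqlite3

# String formatting for trusted DDL identifiers (never escaped by the driver),
# driver placeholders for data values (always escaped).
table_name = "users"        # trusted identifier, NOT user input
user_ids = [1, 2, 3]

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {table_name} (user_id INTEGER)")
conn.executemany(f"INSERT INTO {table_name} (user_id) VALUES (?)",
                 [(i,) for i in user_ids])

# Build one placeholder per element for the IN (...) clause.
placeholders = ", ".join("?" for _ in user_ids)
rows = conn.execute(
    f"SELECT user_id FROM {table_name} WHERE user_id IN ({placeholders})"
    " ORDER BY user_id",
    user_ids,
).fetchall()
```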
It's new. A good habit I've developed recently is to read all the PEPs and all the library changes since the last release I knew well. Python 3 is in full swing and some great things are in there already.
You should check out the typing stuff in 3.8. Combined with mypy, it gets a lot of the way there to statically checked Python without the hassles of statically typed code.
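For instance, a small sketch using the 3.8 additions `TypedDict` and `Literal` (names here are made up), which mypy can check statically:

```python
from typing import List, Literal, TypedDict

class User(TypedDict):
    # Structural type for a row dict; mypy checks key names and value types.
    id: int
    username: str

def sort_users(users: List[User],
               order: Literal["asc", "desc"] = "asc") -> List[User]:
    # mypy rejects call sites passing e.g. order="descending".
    return sorted(users, key=lambda u: u["id"], reverse=(order == "desc"))
```

At runtime this is plain Python; the annotations only bite when you run mypy.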
Nice idea, but this pattern hides your queries away from the developer. In practice, developers need the query transparent and not hidden in a file somewhere away from the code they're reading. Otherwise you're building a pattern for using queries without knowing the query's cost. If you need raw SQL, use SQLAlchemy Core and just don't use the ORM features.
I use roughly this approach in Java, and I would argue that it makes the queries more available to the developer.
Yes, they're in a different file. But it's another file in the same codebase, and it's directly referenced by the code that's using it. This means it's not really any harder to track down than the code to a function that lives in a different file. It's really NBD.
And, in return for putting your SQL in .sql files, you get all sorts of nice things. Syntax highlighting for your SQL, for starters. And it's somewhere that autoformatters and linters and SQL unit testing tools and the like can get at easily. And you get SQL code that isn't being forced through whatever laundry wringer and/or cheese grater is imposed by your language's string literal syntax. (This last one admittedly isn't such a problem in Python.)
This really isn't too far off from why your JavaScript code is generally happiest in .js files and your CSS is generally happiest in .css files, rather than making them all inline in your HTML.
> And, in return for putting your SQL in .sql files, you get all sorts of nice things. Syntax highlighting for your SQL, for starters.
Until recently I was looking for a good scheme to keep my sql in .sql files primarily for this reason. But some time in the past few releases, pycharm has started detecting SQL in strings, and if I add an instance of my DB as a data source to the project, it autocompletes table/column names, etc.
I still see some appeal to putting my SQL in .sql files, but good IDE support has given me most of the benefit while keeping it in local strings, plus some help with prepared statements that would have required some added effort if I'd moved it to SQL files.
I definitely see an advantage there. But I still tend to prefer non-IDE solutions for things like code formatting and linting, because it means you have the option to configure them once in the build settings and forget about it.
With IDE-based solutions, even if you can guarantee everyone is using the IDE (Which is already something I don't even want to guarantee. Friends don't make vim-loving friends use heavyweight IDEs. But I digress.), you still have to also remember to get everyone to set their IDE up properly, and verify that they did it. Otherwise you have a tendency for lint to collect unnoticed in the code for a while before it becomes obvious that someone missed a step in the onboarding checklist.
I stick the *.sql files in the resources, and use my own helper functions to grab them by filename.
I was very new to Java at the time, so I had to spin my wheels on wrapping my head around how Java manages resources for an afternoon. It would have been done in no time at all if I had known about the Reflections library at the time.
Sure, with SQLAlchemy Core you might start out by adding some `conn.execute` statements where necessary. But, inevitably, you'll want to use some queries in multiple places. Some of them you'll even want to use in multiple different files. Your schema will surely change over time, and your queries along with it. It would be a shame if you needed to hunt for queries across all of your files and make the necessary changes to each copy of the queries that need to be changed.
It seems much more convenient to have some part of your code base, a separate module, which is dedicated to communication with the database.
The rest of your code communicates with that module through function calls and doesn't need to worry about how it's implemented. Realistically, you'll probably always use a database, but this does make it much easier to switch between databases, or decide you want to use a flat-file instead, or even use dynamodb for some high-traffic part which doesn't need much consistency.
Once you have a module, it's incredibly natural to de-duplicate queries. Instead of writing out each query once for each time it's called, you can stick it into a helper function and call that function from many places. You can organize queries by their purpose or which tables they use, making it easier to change all the queries impacted by each schema change.
In a world where you have large queries SQLAlchemy lets you give meaningful names to subqueries and different expressions in the larger queries, making them easier to understand. If you share some subqueries between queries... that module dedicated to working with your db starts to make a lot of sense!
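The same naming idea can be sketched in plain SQL with a named CTE rather than SQLAlchemy's subquery objects; `post_counts` here is a made-up name for an assumed users/posts schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE posts (id INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1), (2, 1);
""")

# The subquery gets a meaningful name ('post_counts') and can be reused
# anywhere in the larger query instead of being repeated inline.
rows = conn.execute("""
    WITH post_counts AS (
        SELECT user_id, COUNT(*) AS n_posts FROM posts GROUP BY user_id
    )
    SELECT u.name, COALESCE(pc.n_posts, 0)
    FROM users u LEFT JOIN post_counts pc ON pc.user_id = u.id
    ORDER BY u.name
""").fetchall()
```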
In all, I can think of a lot of situations where "hides your queries away from the developer" is a cost out-weighed by many other benefits.
> It seems much more convenient to have some part of your code base, a separate module, which is dedicated to communication with the database.
That module doesn't have to be a .sql file. You can organize your code in whatever language you're using to achieve what you're describing, and have IDE help with Jump/Peek Definition.
How is this different from having a repository of queries somewhere - this is basically a query repository but not in the active language in fact.
Your queries should be hidden away from the developer - I mean obfuscation is never the goal but separation is... If your query is written inline in some business logic that section of code has poor tests and is unreliable. SQL is complex, I absolutely adore it but it's not simple - keep it isolated from the logic of your system and, if possible, use a layered architecture approach that allows the entire persistence layer to be detachable.
> How is this different from having a repository of queries somewhere - this is basically a query repository but not in the active language in fact.
Looks like they do fancy stuff like template substitution etc. instead of just opening a text file and feeding it to the SQL engine.
I was looking at it and thinking "why not just use Jinja?" but then you wouldn't get the 'query manager' object and namespacing (and probably other features I've missed).
It actually seems a fairly good way to mix languages, I haven't looked at the code to see what they're really up to but I like the concept.
Sorry - is it because you don't analyze the queries even though you write them (and thus are responsible for optimization) that is the issue? Or do you have the same complaints about .NET's Entity Framework?
Yes, this feels like it reinvents prepared statements in a less portable way. It kind of goes against the spirit of their opening justification for using it: "SQL is code, you should be able to write it, version control it, comment it, and run it using files." If that's the goal, encourage native SQL that could run on any platform or engine.
Looks kinda nice, but can it handle "IN (...)" statements correctly? Or conditionally adding some WHERE param? If not, this is rather simplistic and couldn't replace writing inline queries, and then I think I'd rather not mix together 2 approaches.
I've built pre-parsing layers for queries that can explode out params properly for IN() handling - though depending on your SQL variant you can also get away with string passing an array to get around that.
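A toy sketch of that kind of pre-parsing layer, expanding a hypothetical named placeholder into one `?` per element (function name and placeholder syntax are made up):

```python
def expand_in(sql, name, values):
    # Replace a named ':name' placeholder with one '?' placeholder per
    # element, so the driver can escape each value individually.
    placeholders = ", ".join("?" for _ in values)
    return sql.replace(f":{name}", f"({placeholders})"), list(values)

sql, params = expand_in(
    "SELECT * FROM users WHERE user_id IN :user_ids", "user_ids", [1, 2, 3]
)
# sql is now "SELECT * FROM users WHERE user_id IN (?, ?, ?)"
```

A real implementation would want proper tokenizing rather than `str.replace`, since the naive version would also rewrite matches inside string literals.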
Conditional WHERE seems unapproachable with this tool - which is one of my doubts about the need of a tool like this.
If you instead approach your SQL by having each query wrapped in a function in an isolated part of your code base you, as a company/team/whatever, can choose how much logic to let reside inside of those functions - building up conditional WHERE clauses is a very common thing to need to do.
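A rough sketch of such a wrapper function building up a conditional WHERE clause (all names and columns hypothetical):

```python
def build_user_query(min_age=None, country=None):
    # Accumulate clauses and params only for the filters actually supplied.
    sql = "SELECT username FROM users"
    clauses, params = [], []
    if min_age is not None:
        clauses.append("age >= ?")
        params.append(min_age)
    if country is not None:
        clauses.append("country = ?")
        params.append(country)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

How much of this logic belongs in the persistence layer versus the caller is exactly the team-level choice described above.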
Take a look at JinjaSQL (https://github.com/hashedin/jinjasql). Supports conditional where clauses as well as in statements. It uses Jinja templates, so you get a complete template language to create the queries.
SQL should have been in .sql files from the start. SQL in string literals is an industry-wide, decades-long aberration. I fixed this problem for C# with QueryFirst. I'm delighted to see all the support here for the approach, which is clearly the way forward for data access.
I really like this approach because as you said, it gets really messy to do string replacements inline in a python file.
I'm curious if any people here are comfortable using stored procedures as an alternative to this.
Stored procs give you the benefit of your SQL being easy to change in a SQL editor, and you also get query plan caching, which means they will likely execute faster than string-built inline SQL.
I think it really depends on the project. Stored procs are really nice for a lot of different use-cases, but one major downside is that they can be challenging to version control.
Storing this sort of stuff outside of the database with the Python project itself would make version control trivially easy.
Almost, but not exactly. If you import a Python file you get exactly what's in that file. A stored proc is not tied to a file: you can change the file and forget to update the proc, or change the proc without updating the file.
I'm relatively new to Python and I was looking for something like this. I do wonder if there is a well-known Python equivalent of Dapper: https://github.com/StackExchange/Dapper
For query building SQLAlchemy can do that and more.
I think that, with JSON support in most RDBMSes, ORM as a concept has become much easier to handle. I've come to think that this is the promise of the 90's object-oriented databases being fulfilled, somehow.
Hydration is not a big problem. However, Dapper does much more than that: it handles multi-mapping (e.g. mapping users and posts from a single query), query parameterization, etc. Surely, you can do it all ad hoc, but Dapper provides a very convenient set of primitives out of the box.
I'm a fan of using raw sql like this but curious if anyone has solved deduplicating common statements and dynamic queries in a good way.
For example:
1) Imagine lots of queries having to show results in the context of a user and therefore using the same JOIN and WHERE clause all over. Since this isn't DRY, it breaks down as soon as you have to change the clause at all.
2) Imagine a reporting page that allows for filtering and ordering by different columns and therefore need some way to compose the final sql.
Don't query builders already let you do that? By making the WHERE clause map to a data structure, you can just duplicate the data structure to duplicate the WHERE clause e.g. https://bokwoon95.github.io/sq/query-building/reusing-where.... (full disclosure: I'm the author).
Totally agree, and I use query builders all the time. (Yours definitely looks nice.) I was curious though how others might have handled these issues when using libraries like the one posted here, which is just raw SQL.
+1 for query builders being an excellent choice for a lot of applications and uses. Keeping things DRY being one. We haven't solved the repetition in SQL with aiosql. We also haven't tried. :)
I've always wanted to build a larger app based on this approach. I quite like Massive, and 5 years (!) ago I wrote https://gist.github.com/grncdr/4555208#file-sss-py as a one-off response to a tweet from Ted Dziuba (is he still around?)
Funny to see this on HN; I just started building a version of this that actually generates functions from SQL queries, so you get all the IDE goodness and slightly smaller runtime overhead.