One should be careful about these mega-interfaces that define an entire subsystem of your application (DB in this case). They are very painful to maintain, and add almost no value. Interfaces should always be defined by the consumer of the interface, not by the producer. And interfaces should never be used until there is a clear need. CodeReviewComments covers the basics: https://github.com/golang/go/wiki/CodeReviewComments#interfa...
I wrote an insanely long rant about this with thorough examples using the structure of the article, but it was too long for HN. So I put it on my blog instead: https://jrock.us/posts/go-interfaces/
Fair criticism. I'm the author, and I hate using interfaces like this, but didn't comment on it probably because I got used to it for this kind of thing.
I haven't read your article yet, but I will do so later, get back to this subject, and share my opinion by the end of the day (on vacation, and need to hurry to catch the train!).
Testing is good, but probably not a good enough reason to add unbounded ongoing maintenance costs. The problem is that putting everything in one interface forces you to implement the test version of DB every time you add real functionality to the real implementation. That gets pretty expensive in terms of time, and forces you to do work that's not necessary. (You could always test using the real implementation, and that's the approach I'd recommend starting with. When it gets too slow, look for an alternative.)
My blog post covers what unit testing is like with your interface, with a consumer-defined interface, and just putting some test data fields in the DBImpl struct. The latter is my preferred approach. It took me a long time to get there. I was working at Google writing some filesystem access code and wanted to test it. I designed it as interfaces so I could have a real implementation and a test implementation, and a member of the Go team reviewed the code and told me it was absolutely inappropriate to use interfaces there. I disagreed at the time, but I've slowly come to terms with this style. Separating out production and testing is "clean", but that cleanliness comes at the cost of a mental abstraction burden (every time you read the code you're confused about where the meat lives) and unbounded future maintenance costs (every time you want to add a new feature, you have to edit two files).
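Roughly, the "test fields on the real struct" idea looks something like this (just a sketch of the idea, not code from the post; it assumes pgx v4's pgxpool and a Thing struct with ID and Name fields):

type DBImpl struct {
    Pool *pgxpool.Pool

    // testThings, when non-nil, short-circuits GetThings so tests can run
    // without Postgres. Only tests set this field.
    testThings []Thing
}

func (d *DBImpl) GetThings(ctx context.Context) ([]Thing, error) {
    if d.testThings != nil {
        return d.testThings, nil
    }
    rows, err := d.Pool.Query(ctx, "select id, name from things")
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    var things []Thing
    for rows.Next() {
        var t Thing
        if err := rows.Scan(&t.ID, &t.Name); err != nil {
            return nil, err
        }
        things = append(things, t)
    }
    return things, rows.Err()
}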
I don't think having a mock framework do codegen to make generic mocks is that useful. Just keep the exact behavior close to the test. A working test harness for a particular unit is something small enough to write out in an HN comment, and keeping the test code close to the test (and not wrapped up in some complicated framework) makes it much easier to enhance and debug the tests. (I've had Mockery forced upon me before, and I'm miserable every time I have to interact with it. Have to get the right version. Have to update an interface after writing the real implementation. Have to regenerate the mocks. Then I have to refer to the docs to do even the simplest tests, and those tests end up not doing what I want (instead producing something useless like "assert that function was called with the right args"). It's a chore that isn't worth the cost.)
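To show what I mean, a hand-rolled fake for a one-method consumer interface fits in a few lines right next to the test (a sketch; Thing and RenderThings(ctx, thingGetter) stand in for whatever the package under test actually defines):

// thingGetter is the one method the code under test actually needs.
type thingGetter interface {
    GetThings(ctx context.Context) ([]Thing, error)
}

// getThingsFunc adapts a plain function to the interface (like http.HandlerFunc).
type getThingsFunc func(ctx context.Context) ([]Thing, error)

func (f getThingsFunc) GetThings(ctx context.Context) ([]Thing, error) { return f(ctx) }

func TestRenderThings(t *testing.T) {
    fake := getThingsFunc(func(ctx context.Context) ([]Thing, error) {
        return []Thing{{Name: "widget"}}, nil
    })
    out, err := RenderThings(context.Background(), fake)
    if err != nil {
        t.Fatal(err)
    }
    if want := "widget"; out != want {
        t.Errorf("RenderThings() = %q, want %q", out, want)
    }
}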
I think the underlying psychology behind the mega-interfaces is probably that we come up with an idea for what the interface should look like, and want to write it down somewhere. Then we can disengage the brain and start filling out the method bodies. That is not a good enough reason to add a layer of abstraction to the code; you can always type out 'func RealImplementation() { panic("write this") }' if that's the real reason.
My view shifted when I was trying to use the OpenTelemetry library (circa v0.6.0) to emit Prometheus metrics. I wanted to set the histogram buckets on a per-metric basis. I chased 100 layers of indirection (every function took and returned an interface, so you'd need to have like 5 code windows open to debug the flow of data between the interface level and the implementation level), and never figured out how to do it. I think I determined it wasn't possible. It was at that moment that I decided "wait, I think everyone is using interfaces wrong", and then found CodeReviewComments telling me "yup, it's wrong".
For testing I wrote https://github.com/jamesrr39/go-mockgen-tool to generate mock implementations of a given interface. It's a different approach from the usual reflect- and interface{}-heavy libraries. Might be interesting!
Quite reasonable! I think if you keep your interfaces small, you might not need it, but if you do have a couple of beefy interfaces, then this is a lot more pleasant than using a dedicated mock framework. A function is a great way to mock a function -- you can do anything! ;)
Thanks for giving it a readthrough! Just wanted to mention that even though I'm critical about this one thing, your article as a whole is superb. If people start their Go projects using pgx, tern, and have tests for their database functionality, they're going to be set up for a lot of success. The interface thing is just a minor annoyance I have because I see so many people doing it. (I'm sure my past self has done it more times than I'd like to admit.)
I agree with the basic idea in your blog post -- that interfaces should be defined by the consumer, and be small -- but I think it's a bit dogmatic: "it’s wrong". :-)
For one thing, it's easy to work around the issue of "pasting every single function signature into this test file, and making it panic("not implemented") or something". You can simply embed the DB interface in your test struct, and only override the method(s) you need. For example, let's say the DB interface defines 10 methods, but our function under test only needs one, GetThings(). We do this:
type testDB struct {
    DB // embedded field named DB, so testDB implements DB
    things []Thing
}

// but override GetThings method
func (d *testDB) GetThings() ([]Thing, error) {
    return d.things, nil
}
The embedded DB field, which is a field named DB of type DB and is nil, pulls in all the methods of that type so testDB implements the DB interface. When you construct a testDB you just leave the DB field nil -- if any of the not-explicitly-defined methods are called, they'll panic with a nil pointer, but that's okay, they would explicitly panic in your test implementation anyway.
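Using it in a test then looks something like this (CountThings and the Name field are made up for the example; any consumer that takes a DB works the same way):

// CountThings stands in for whatever production code consumes the DB interface.
func CountThings(db DB) (int, error) {
    things, err := db.GetThings()
    if err != nil {
        return 0, err
    }
    return len(things), nil
}

func TestCountThings(t *testing.T) {
    db := &testDB{things: []Thing{{Name: "a"}, {Name: "b"}}}
    n, err := CountThings(db)
    if err != nil {
        t.Fatal(err)
    }
    if n != 2 {
        t.Errorf("CountThings() = %d, want 2", n)
    }
}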
The other thing is simply pragmatism: at some point it gets tedious to define all these tiny interfaces. Do you do it at the handler level, where each db sub-interface only has 1 or 2 methods? This blows out the number of interfaces. Or do you do it at the server level, where the db interface has all the methods (what you're opposed to)? Or something in between: have server "sections" grouped by logical feature, such as authDB, projectDB, and so on. Each might have 5-10 methods, and to test each sub-server you'd either do what I've suggested above with an embedded interface field, or just implement a single in-memory test db for each db type.
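For the "something in between" option, the sketch is roughly this (names are illustrative; User and Session are placeholders defined elsewhere):

// authDB lives next to its consumer (the auth handlers), not next to the
// concrete database type, which satisfies it implicitly.
type authDB interface {
    GetUserByEmail(ctx context.Context, email string) (User, error)
    CreateSession(ctx context.Context, userID int64) (Session, error)
}

type AuthServer struct {
    db authDB
}

func NewAuthServer(db authDB) *AuthServer {
    return &AuthServer{db: db}
}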
I do like what you said about interfaces obfuscating the code (it becomes more abstract and hard to follow and navigate), and that testing on the real db if you can is a good idea. Creating a new PostgreSQL db for every test seems like it'd be very slow and unnecessary, though. Why not just one db for each test run?
Thanks for writing this up! This is something I've struggled with in my Go projects, and I didn't know a good way to solve it. Your writeup and the reference to the go wiki was very helpful.
oh, I also have defined database.PGX (https://github.com/henvic/pgxtutorial/blob/main/internal/dat...), which is a little scarier. I'll get back to you about that too, but this one is perhaps even more useless, and there might be a better way to solve the issue I'm trying to avoid with it.
Yeah, I think in that case, just make a decision as to whether or not you're going to use Pool or Conn in your application code. If you change your mind, just refactor; it's not worth maintaining the interface just to open up the opportunity to change in the future. Pool is probably the right choice forever.
If it's for testing, I'd just skip mocking that out and run the queries against the real database, like you do in the article. Production is going to connect to Postgres and run `select * from foobars`. Might as well run `select * from foobars` in the tests, so I'm not surprised when it doesn't work in production.
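Concretely, a test against the real thing can be as small as this (a sketch; it assumes pgx v4's pgxpool, a TEST_DATABASE_URL environment variable pointing at a throwaway database, and the usual context/os/testing imports):

func TestFoobars(t *testing.T) {
    dsn := os.Getenv("TEST_DATABASE_URL")
    if dsn == "" {
        t.Skip("TEST_DATABASE_URL not set")
    }
    ctx := context.Background()
    pool, err := pgxpool.Connect(ctx, dsn)
    if err != nil {
        t.Fatal(err)
    }
    defer pool.Close()

    rows, err := pool.Query(ctx, "select * from foobars")
    if err != nil {
        t.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
        // scan and assert on real rows here
    }
    if err := rows.Err(); err != nil {
        t.Fatal(err)
    }
}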
It wasn't for testing. I wanted to expose only a limited subset of *pgx.Pool that was "safe" to be used from anywhere: for example, without the Close() function that might be called during graceful shutdown. I wanted to avoid exposing such things as a kind of verification process, and that's a little bit silly.
That's a good reason. I think I would prefer adjusting my linter to detect the unsafe uses. That way, if someone really wants to, they can "// nolint:unsafe_db See discussion in ISSUE-1234".
I didn't look closely, but I think that just wrapping the native struct in an interface without the method someone wants to call will just lead them to `wrapper.(*pgxpool.Pool)` and calling the unsafe method anyway.
> I’ve had a bad experience with ORM in the recent past, and I generally recommend against introducing this sort of abstraction to code.
Yeah I don’t get ORMs.
Most ORMs advertise “type safety” and not having to learn SQL. But actually you’re just writing SQL code in a non-SQL language. Usually the type-safety is weak, and a good IDE will provide type-checking and even data-source checking to raw SQL anyways.
Converting complex object operations into relational operations is non-trivial. ORM objects auto-query and auto-update from the database, but these queries/updates are often suboptimal or happen at invalid times. It’s much easier to write and execute the SQL queries manually, and then auto-convert the results to/from regular objects.
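In Go terms, that's roughly what sqlx gives you (a sketch; assumes github.com/jmoiron/sqlx, and the authors table/columns are made up):

type Author struct {
    ID   int64  `db:"id"`
    Name string `db:"name"`
}

// Hand-written SQL; the library only does the row-to-struct mapping.
func ListAuthors(db *sqlx.DB) ([]Author, error) {
    var authors []Author
    err := db.Select(&authors, "SELECT id, name FROM authors ORDER BY name")
    return authors, err
}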
> write SQL queries manually, and then auto-convert the results to/from regular objects.
In any meaningfully sized codebase, converting to/from objects will be used all over the place. Then someone adds an abstraction on top, and congratulations: there is now an in-house ORM.
That's not the same thing as an ORM. An ORM constructs queries based on arbitrarily complex requests. Abstraction layers that we add on top of existing 'handcrafted' SQL queries are often expected and necessary. With the base foundation rock solid, explicit, and unchanging, you can build abstractions safely and with peace of mind.
This is why I'm pushing so hard to keep things in plain SQL at work. So far, another team has experimented with squirrel (a SQL query builder) and it looks sane, but I myself wouldn't introduce even that.
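For reference, squirrel just builds the SQL string and args for you, roughly like this (a sketch; assumes github.com/Masterminds/squirrel):

import sq "github.com/Masterminds/squirrel"

func authorByIDQuery(authorID int64) (string, []interface{}, error) {
    // Produces "SELECT id, name FROM authors WHERE id = $1" plus its args.
    return sq.
        Select("id", "name").
        From("authors").
        Where(sq.Eq{"id": authorID}).
        PlaceholderFormat(sq.Dollar).
        ToSql()
}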
In my previous company, I was forced to use an outdated forked version of gorm. I'd just do my best to do anything other than deal with databases there. Running the test suite took minutes, and that was to run way fewer test cases. It was a nightmare.
> Most ORMs advertise “type safety” and not having to learn SQL. But actually you’re just writing SQL code in a non-SQL language
I think you're doing it wrong if you end up writing really complex queries in your ORM. It's kind of a judgement call, but at a certain point you need to switch over to SQL. Good ORMs let you do that without too much hassle.
I wrote a Django app using Postgres full text search. I ended up going with direct SQL calls for anything more than a single-record get because of all the specialized syntax. It's nice to keep stuff like new record creation clean, but once you get more than one join deep, it's time to break out the best domain-specific language ever created.
Well, sure. It makes it easy to do simple stuff and gets out of the way when you want to bypass it rather than being from the "sorry Dave, I can't let you do that" school of interface design.
The only good ORM I've used is Rails' ActiveRecord. Sure, there are gotchas you need to know about, and maybe it's just experience, but I find these to be a non-issue.
Every other ORM in other languages I've tried seems to have WTF moments. It could be that Ruby is a dynamic enough language that it lets you work around these things easily, in ways that stricter languages do not.
One benefit of using an ORM is that you don't write SQL strings, you write statements and method calls. So you get better IDE support, and detect syntax errors before they hit the database. If you are just writing a CRUD application, maybe with one or two `joins` or `order by`, then an ORM is perfect. If you are doing anything more complex, then by all means drop to plain SQL.
The real question to me is, why can't we drop the relational database? In many small projects (where you would use an ORM) it is overkill. Why can't I create a list of plain-old-data objects in my language, then add indices on that, and do fast queries directly in the language? Most languages already have a data structure with an index on one key (a dict, hashmap or similar). With C#'s LINQ or Python's list comprehensions we already have half of the solution.
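In Go, the do-it-yourself version is hand-rolling the index over a slice of structs, something like this (a quick sketch):

type Person struct {
    ID   int
    City string
}

// An "index" is just a map from key to the matching records.
func indexByCity(people []Person) map[string][]Person {
    idx := make(map[string][]Person)
    for _, p := range people {
        idx[p.City] = append(idx[p.City], p)
    }
    return idx
}

// A "query" is then a constant-time map lookup instead of a full scan:
//   parisians := indexByCity(people)["Paris"]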
From what I understand, data duplication can lead to consistency errors. And if you want to avoid data duplication, you're going to rewrite a relational database yourself, since you would want to put your data into normal forms to avoid duplication. I think I've read a few times on Hacker News that we already went through the alternatives to the relational database around the time of its birth, and that a few papers were published at the time on why this was a better solution than others.
> a good IDE will provide type-checking and even data-source checking to raw SQL anyways
I've never in my life heard of this, which IDE are you talking about? I'd love to use something like that! I'm in C# land where I can use Management Studio to write queries which works well enough, but going between that and Visual Studio does have some friction.
Even though we use Entity Framework here, I mostly use it to manage migrations rather than to work with data. Most of the time, unless a query is really trivial, I'll just write some SQL and use Dapper for object mapping; I find it a lot easier once the query gets to be over three or four lines or so.
IntelliJ IDEA. For example, in Java, if you put // language=SQL before a string literal, or @Language("SQL") before a function parameter at its declaration and call it with a string literal, the literal text will be highlighted. You "configure SQL dialect" and "configure data source" to get it to point to your database.
> auto-convert the results to/from regular objects.
Sadly this functionality, which I think is not much more complex than a JSON parser, is absent from many toolkits / libraries, and an immediate option people see is an ORM. Libraries that just do serialization / deserialization from SQL are much less popular than something like Hibernate, which could be a reason.
The difference is you’re still writing SQL keywords, just as functions
e.g. userPosts = Posts.InnerJoin(Users).SelectAll().Where(user => user.id == userID)
It’s not always exactly the same, but the point remains that you have to understand SQL anyways and the code usually isn’t any clearer.
Maybe I'm confusing this with query builders. ORMs also let you form queries by interacting with objects directly, e.g. 'user = Users.Find(userID); userPosts = user.Posts'. But that leads to the other issue of inefficient and unpredictable queries.
As a person who writes my own SQL and hasn't had much (or any) use for ORMs in the past, my impression is that if you were using an ORM you could switch the underlying database without a lot of trouble.
Yes and no. I've used Django and Django's ORM for many years (along with others, but Django is probably the easiest for me to discuss).
Django's ORM doesn't stop you from using database-specific features (JSON fields, full text search, PostgreSQL's geo extensions, etc.) It also makes it very straightforward to bypass parts of the ORM and write your own SQL either just as the WHERE part of a query or as the whole SQL statement. If you build a large enough app that deals with enough data, you'll eventually find some hot spots where you need to do that for performance reasons. When you do that, you obviously can lose portability.
If you avoid those things though, you pretty much can switch the database. In practice, I've never really found a need to do that for a production system, but it still has some major advantages. First, I regularly run PostgreSQL in production, but I can use SQLite as an in-memory database for unit tests, which makes those much faster (and simpler, since I can run unit tests without also having to spin up a full DBMS).
The other huge advantage is that it enables Django's whole ecosystem of reusable applications. Eg, the built-in admin interface, popular third party applications for authentication, tagging, CMS-type stuff, etc. are generally written using the ORM and as a result can be added to any Django project with a couple lines of config and will work no matter what database you use. That large ecosystem of fairly polished components has been a major reason that Django has remained so popular for such a long time. If it had to be split up by supported database ("MySQL admin" vs "PostgreSQL admin", etc), I don't think it would've been as successful.
I think the database change thing is way overrated. I've been 12 years at the same company, and I juggle Postgres, MySQL, and SQL Server (and Oracle previously).
Most of the databases I access are from external providers. We changed our main provider 2 years ago, so their DB went from Oracle to SQL Server.
So in 12 years that would have been one database change. Except the whole schema changed, and they don't allow SQL queries on it; they provide good old SOAP web services instead...
So even if I had used a full ORM (I use Dapper: you write the SQL, it "just" converts the result of the query to objects), I still would have had to change my code to adapt to the new provider.
However, going from SQL to web services really convinced me that SQL is a superpower. With it you get exactly the data you need, with no need to re-process it in code.
Before this change I never really had to look at my app's performance; it was good enough. But since this provider's web services are so inefficient (for example, I can't "query" a single customer, I have to get all of them), I had to optimize my app in every possible way to mitigate these inefficiencies.
I assume it would be a bit easier because the query and ORM syntax is uniform across databases. And it also probably won't support many database-specific operations.
But it would still be work. Some databases don’t support basic operations (e.g. UPDATE with LIMIT in postgres) which you can still do with the query syntax. And different databases usually have different preferred (usually faster) ways of doing indexes, joins etc. Plus you still have to migrate your data.
ORMs are ideally supposed to be a complete abstraction where you can pretend the database doesn’t exist, so if you switch it shouldn’t matter. The problem is, in practice you can’t pretend the database doesn’t exist. So you either end up writing code for your specific dialect anyway, or you write inefficient code.
It’s also like, you can just write plain SQL and ignore your dialect’s extra features. And that would make migration much easier, but it would also be less efficient and much harder to write.
Great article! These days, I find myself reaching for sqlc whenever I need to access Postgres from Go. My experience with the project has been excellent: I get type-safe APIs and control over exactly which queries are executed without much fuss.
Dumb question: how do you write dynamic queries using sqlc? From the doc I see:
-- name: ListAuthors :many
SELECT * FROM authors
ORDER BY name;
But what if I want to get a list of authors by date? Shall I do:
-- name: ListAuthors :many
SELECT * FROM authors
ORDER BY name;
-- name: ListAuthorsByDate :many
...
And what if later I need to get a list of authors by date and name? Should I write yet another query? I'm used to writing one generic (composable) query, which is very handy in situations where one needs to retrieve rows by many different filtering criteria. Is that possible in sqlc?
Dynamic queries are difficult to model when starting from a static query since SQL doesn't compose well. The workarounds I've used:
- Use multiple queries that share the same output type.
- Push the predicate into the query directly.
However, you can't pass an expression for an order by column since the order by clause takes a name, not an expression. Postgres doesn't allow using names as arguments to a prepared query, so that leaves either adding an annotation like sqlc.order_by that's dynamically added to the query string, or getting more creative with the structure of the query:
SELECT * FROM AUTHORS WHERE sqlc.arg('by_date') ORDER BY DATE
UNION ALL
SELECT * FROM AUTHORS WHERE NOT sqlc.arg('by_date')
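If you give that query a `-- name: ListAuthorsSorted :many` annotation, the generated call ends up looking something like this (a sketch from memory; the exact names depend on your sqlc config):

queries := db.New(pool)                              // db is the sqlc-generated package
authors, err := queries.ListAuthorsSorted(ctx, true) // true picks the date-ordered branch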
Thanks! I love the concept of sqlc, but their support for pgx with its native PostgreSQL interface was quite limited when I checked. I imagine it's better nowadays, and will try again soon!
pggen occupies the same design space as sqlc but the implementations are quite different. Sqlc figures out the query types using type inference in Go which is nice because you don’t need Postgres at build time. Pggen asks Postgres what the query types are which is nice because it works with any extensions and arbitrarily complex queries.
The support for native pgx in sqlc is good now. I migrated my project from database/sql to pgx recently and posted the instructions at the end of this post: https://linklonk.com/item/1948319982853095424
Yes! I’ve been using sqlc both at work and on side projects and it works awesome. For me it strikes the right balance between “I want to work with sql/db abstractions” and “I don’t need an ORM, just automated mapping”. It has a few rough edges when advanced types need to get mapped, but it’s getting there.
struct Country { country: String, count: i64 }
let countries = sqlx::query_as!(Country, "SELECT country, count FROM countries WHERE organization = ?", organization)
.fetch_all(&db_pool)
.await?;
Postgres is awesome technology but for a lot of applications SQLite will do the job just fine. This gets you a single binary with zero external dependencies.
You should check out https://pkg.go.dev/modernc.org/sqlite -- it's an (automated) conversion of SQLite's C source to Go so you can use SQLite without CGo. Apparently it passes all the SQLite tests. I've used it in a (toy) project, and it seems to work well.
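Usage is plain database/sql, just with a different driver name (a quick sketch):

package main

import (
    "database/sql"
    "log"

    _ "modernc.org/sqlite" // registers the "sqlite" driver, no CGo required
)

func main() {
    db, err := sql.Open("sqlite", "file:app.db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if _, err := db.Exec("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"); err != nil {
        log.Fatal(err)
    }
}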
Digression: what's your use case for having multiple processes share the same database?
SQLite is not an equivalent to a standard networked database; as a developer, you're supposed to see it as an enhanced data structure, like an array++. You wouldn't share an array between multiple processes (unless you know exactly what you're doing), so you wouldn't do that with SQLite either. If you really need multiple processes writing the same kind of data, maybe it should be partitioned into multiple independent files?
If by persist to disk you mean "when you're done", it can, by replicating the db to a file. Another option is to back it up over the network, e.g., https://github.com/benbjohnson/litestream
SQL is a very nice querying language; if you can stuff your data into it and run structured queries instead of some spaghetti code, it can help a lot. This is the same domain Redis started in, i.e. a "database" with helpful structures and data types for simple data mangling.
I keep seeing this comment on HN, but I never really get it. In the world of platform as a service, launching a database instance is far easier than managing disk-based storage yourself. It also has a very obvious route to horizontally scaling the web process. Deployment is easier too with an external database; there's no data on the disk of the machine running the web process that needs to be carried over.
I had the same objection to SQLite, and then I heard about Litestream, and it won me over.[0]
Litestream watches your SQLite database and then streams changes to a cloud storage provider (e.g., S3, Backblaze). You get the performance and simplicity of writing SQLite to the local filesystem, but it's syncing to the cloud. And the cool part is that you don't have to change any of your application code to do it - as far as your app is concerned, it's writing to a local SQLite file.
I wrote a little log uploading utility for my business that uses Litestream, and it's been fantastic.[1] It essentially carries around its data with it, so I can deploy my app to Heroku, blow away the instance and then launch it on fly.io, and it pops up with the exact same data.[2]
I'm currently in the process of rewriting an open-source AppEngine app to use SQLite + Litestream instead of Google Firestore.[2] It's such a relief to get away from all the complexity of GCP and Firestore and get back to simple SQLite.
> launching a database instance is far easier than managing disk based storage yourself
Look, it comes down to what you, the developer are comfortable with. If you know and are comfortable with running something like postgres in the cloud, then go for it.
Some of us prefer to not. Similarly, not all of us nuke the entire filesystem on every deploy :)
The state of Go PostgreSQL drivers is pretty bad IMO. Everyone recommends pgx, but it has a few rough edges (like how it handles dates/time) and a lot of surface area. The other driver, lib/pq, isn't really being maintained (for no good reason, it's quite good, simple and works well with sqlx).
I don't know the state of drivers for other ecosystems, but even though pgx has a few rough edges, the quality of the driver is quite good and the maintainer (jackc) is also super responsive and helpful.
Our issue with lib/pq is that it doesn’t respect context deadlines/cancellation: e.g. a long-running query will keep running till it’s done (in Postgres), even when the context and all the callers have been long gone. We had quite a few issues with this, with RDS eventually exhausting (burst) balances and driving the performance of the rest of the system into the ground.
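For the curious, the pattern that bit us is just the standard context-with-deadline one (a sketch; big_table is illustrative):

func slowQuery(db *sql.DB) error {
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()

    // We expected the deadline to cancel the statement on the Postgres side
    // too, not just fail the client-side call.
    rows, err := db.QueryContext(ctx, "select * from big_table")
    if err != nil {
        return err
    }
    defer rows.Close()
    return rows.Err()
}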
The article mentions that lib/pq is "effectively in maintenance mode". This surprised me; I've never had problems with it, and looking at the commits on GitHub [1] it seems far from inactive.
Thanks, I missed that. It is a shame, because although pgx is more capable, it doesn't seem to work with Go's database/sql system, which makes it harder to support multiple database backends (e.g. SQLite for testing).
I dislike this because you're assuming I have Docker on my system, and you also have to deal with the startup time of the container.
I'd rather manage the database separately and just have my tests configured to use it. This is why I kind of pushed towards a solution that didn't involve Docker.
Someone might think: "oh, but what [web] developer doesn't have Docker installed?"
Well, I do, but I always have problems with it, so I barely use it.
I also prefer to run my database on a homelab server (https://henvic.dev/posts/homelab/) instead of on my laptop, especially because I also need OpenSearch or Elasticsearch running, and it's better if I just offload the performance/battery hit.
The funny thing is that when I joined the company back in February I was asked what computer I'd like to receive, and I really wanted a 13" MacBook with the new M1 processor, so I asked for it. In the end, though, I decided to go with Intel because I imagined using Docker would be very important.
It turns out that the M1 would have worked just as well for the type of work I've been doing since then, with some minor exceptions.