Oh lord this looks terrible. Let's reinvent a bunch of wheels and maim the datab...

mike_hearn · on Sept 20, 2023

Let me try to explain why it's at least interesting, better than the website does, then.

Firstly, why separate things into a pluggable KV layer and then object persistence on top? There are some advantages to doing things this way.

One is it lets you use a single technology at every level of scalability. Want to store a few objects for your desktop app settings file? Use a KV store that's dirt simple (can even be text files). Need to store billions of objects? Plug it into FoundationDB or Spanner or some other highly scalable backend. Generate a tiny in-memory transaction containing a small object sub-graph, serialize it to a protobuf-sized message and then send it over the network? Use a TreeMap. Want a single-process but ultra-fast store? Use RocksDB. All of those are easy, and the API remains the same. Because (sorted) KV stores have pretty uniform interfaces, you can even do tricks like incremental online migration by composing multiple backends together, get notified when specific object fields change, or add a caching layer, and again the API can stay uniform.
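To make the layering a little more concrete, here's a minimal Kotlin sketch of what a pluggable sorted KV layer could look like, including a composed backend for incremental migration. The `SortedKV`, `TreeMapKV` and `MigratingKV` names are hypothetical and are not Permazen's actual API (its real KV abstraction is transactional and works on byte arrays); strings are used here purely for brevity.

```kotlin
import java.util.TreeMap

// Hypothetical minimal sorted key/value interface (NOT Permazen's real API).
interface SortedKV {
    fun get(key: String): String?
    fun put(key: String, value: String)
    // All entries with fromKey <= key < toKey, in key order.
    fun range(fromKey: String, toKey: String): List<Pair<String, String>>
}

// In-memory backend: handy for tests, or for building a small transaction to ship elsewhere.
class TreeMapKV : SortedKV {
    private val map = TreeMap<String, String>()
    override fun get(key: String) = map[key]
    override fun put(key: String, value: String) { map[key] = value }
    override fun range(fromKey: String, toKey: String) =
        map.subMap(fromKey, true, toKey, false).map { it.key to it.value }
}

// Composition: all writes go to the target backend; reads fall back to the
// source backend and copy the value across on first access (a lazy migration).
class MigratingKV(private val source: SortedKV, private val target: SortedKV) : SortedKV {
    override fun get(key: String): String? =
        target.get(key) ?: source.get(key)?.also { target.put(key, it) }

    override fun put(key: String, value: String) = target.put(key, value)

    override fun range(fromKey: String, toKey: String): List<Pair<String, String>> {
        // Merge both ranges; entries already present in the target win.
        val merged = TreeMap<String, String>()
        source.range(fromKey, toKey).forEach { (k, v) -> merged[k] = v }
        target.range(fromKey, toKey).forEach { (k, v) -> merged[k] = v }
        return merged.map { it.key to it.value }
    }
}
```

Code written against `SortedKV` doesn't care which backend (or stack of backends) sits underneath, which is the property the paragraph above is getting at.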

The closest to this in the RDBMS world would be trying to migrate between SQLite and, say, Oracle or some cloud pseudo-RDBMS, probably relying on another large abstraction layer on top to try to cover up the major differences between the backends. And then you'd import yet another giant library, with its own set of data definitions, protocols and limits, to handle object serialization outside the database.

During development and testing you can run against a local or in-memory database, then switch to more powerful backends for the final rounds of testing. Normally in Java this requires something like H2 but then you're using a different DB entirely and it causes a whole round of new problems.

As a concrete example of where this is useful, Permazen was written for a 911 call dispatching system in the USA that required multi-master synchronous replication between geographically distributed sites that could tolerate connectivity failure between any of the masters and still allow each site to function. Adding this sort of feature is fairly easy because Permazen comes with a Raft backend, and has ways to add hooks for reconciliation of transactions. The whole stack is or can be Java when done this way, which makes for a very transparent stack.

Secondly, why use "language-integrated persistence" as the paper calls it?

SQL and RDBMSs are, as I think you'll agree, powerful but very complicated technologies. SQL is a full-blown programming language. If you're comfortable with jOOQ then you've mastered Java, SQL and jOOQ, and maybe the SQL dialect of your specific database as well, which is a lot. Plus subjects like schema setup and migration. Permazen asks a question about simplification: what if you only needed knowledge of your primary programming language plus a few database basics (like what indexes are)? What if that was enough to solve many problems? You've already accepted that this is valuable at some level, because you're using jOOQ, which adds something like this on top of Java+SQL, but jOOQ is a huge library. In Permazen you use standard library APIs like NavigableSet and Stream, along with a few helper utilities for things like set intersection. And because you're relying on the host language more, there are far fewer APIs to learn. This may be more intuitive for some developers, especially those that don't use SQL all the time.
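As a rough illustration of querying with plain collection types, here's a sketch in which each index is a sorted map from field value to a `NavigableSet` of object IDs, so a two-condition query becomes a range view plus a set intersection. The `Person` type, the `Index` class and its methods are hypothetical; Permazen maintains indexes for you and exposes them through its own types, so treat this as the general shape rather than its API.

```kotlin
import java.util.NavigableSet
import java.util.TreeMap
import java.util.TreeSet

// Hypothetical object type; a real store would assign the IDs.
data class Person(val id: Long, val lastName: String, val age: Int)

// A simple secondary index: field value -> sorted set of object IDs.
class Index<V : Comparable<V>>(people: List<Person>, field: (Person) -> V) {
    private val byValue = TreeMap<V, NavigableSet<Long>>()

    init {
        for (p in people) byValue.getOrPut(field(p)) { TreeSet() }.add(p.id)
    }

    // IDs whose field equals `value`.
    fun exactly(value: V): NavigableSet<Long> = byValue[value] ?: TreeSet()

    // IDs whose field lies in [from, to), gathered from a range view of the index.
    fun between(from: V, to: V): NavigableSet<Long> =
        byValue.subMap(from, true, to, false).values.flatMapTo(TreeSet<Long>()) { it }
}

// Roughly "WHERE last_name = 'Smith' AND age >= 30 AND age < 40", spelled out
// as two index lookups followed by a set intersection.
fun smithsInTheirThirties(byName: Index<String>, byAge: Index<Int>): NavigableSet<Long> {
    val result = TreeSet(byName.exactly("Smith")) // copy, so the index isn't modified
    result.retainAll(byAge.between(30, 40))       // set intersection
    return result
}
```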

Writing queries out as functional maps, filters and folds may seem awkward compared to having an RDBMS try to work it out for you, but there are reliability and predictability advantages. Some RDBMS engines have a problematic failure mode in which query performance can change drastically in production without anyone actually changing anything, as the statistics of the underlying table shift and suddenly cause a change in the query plan. It's also easy to write queries in SQL that don't have the expected performance. There are many stories on HN and elsewhere of heroes swooping in to save some company or other through the application of a judicious CREATE INDEX command. In Permazen, because you're writing out the sequence of steps to get the data, it's always clear from reading the code what the query performance will be, and that performance will be stable over time.
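A small sketch of what "the query plan is the code" means in practice: the two functions below answer the same question, but which access path runs, and therefore roughly what it costs, is decided by which function you call rather than by a planner's statistics. The `Order` type, the `Stats` counter and the map used as an index are all hypothetical illustrations, not part of Permazen.

```kotlin
// Hypothetical record type and counter, purely for illustration.
data class Order(val id: Long, val customer: String, val total: Long)

class Stats { var examined = 0 }

// Plan A, full scan: touches every order, so cost grows with the whole table.
fun customerTotalByScan(orders: List<Order>, customer: String, stats: Stats): Long =
    orders.sumOf { stats.examined++; if (it.customer == customer) it.total else 0L }

// Plan B, index lookup: touches only the matching customer's orders.
fun customerTotalByIndex(
    ordersByCustomer: Map<String, List<Order>>, customer: String, stats: Stats
): Long = ordersByCustomer[customer].orEmpty().sumOf { stats.examined++; it.total }
```

Swapping plan A for plan B is an explicit code change that shows up in review, rather than something that can happen silently when table statistics drift.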

Finally, as you note, due to the object/relational mismatch JPA is a very complex technology yet lacks some features regardless. If you want to work with object graphs then Permazen does provide a much cleaner API. It also supports useful features like online schema migrations - becoming unable to change tables without downtime is a major problem with some RDBMS.
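One general technique behind online schema migration is to upgrade records lazily: each stored record remembers the schema version it was written with and is converted the first time it's read, so nothing has to lock and rewrite a whole table up front. The sketch below shows that idea on plain maps; `StoredRecord`, `upgradeV1ToV2` and `LazyMigratingStore` are hypothetical names, and this is not a description of exactly how Permazen implements its migrations.

```kotlin
// Hypothetical on-disk record: the schema version it was written with, plus field values.
data class StoredRecord(val schemaVersion: Int, val fields: Map<String, Any?>)

// Hypothetical schema change: v1 had one "name" field, v2 splits it in two.
fun upgradeV1ToV2(record: StoredRecord): StoredRecord {
    val parts = (record.fields["name"] as String).split(" ", limit = 2)
    val newFields = record.fields - "name" +
        mapOf("firstName" to parts[0], "lastName" to parts.getOrElse(1) { "" })
    return StoredRecord(2, newFields)
}

// Records are upgraded one at a time when read, and the result is written back,
// so the schema change never needs a bulk rewrite of every record up front.
class LazyMigratingStore(private val data: MutableMap<Long, StoredRecord>) {
    fun read(id: Long): StoredRecord? {
        val stored = data[id] ?: return null
        if (stored.schemaVersion >= 2) return stored
        val upgraded = upgradeV1ToV2(stored)
        data[id] = upgraded // persist the upgrade so it only happens once
        return upgraded
    }
}
```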

All that said, it must be noted that Permazen is the personal project of an ex-FreeBSD hacker. It has run successfully in production for years as part of a larger contracting project, but it's not a large commercial project. It's best thought of as the starting point for a conversation about persistence rather than something that's going to obliterate the competition tomorrow.

lukaseder · on Sept 20, 2023

> This may be more intuitive for some developers, especially those that don't use SQL all the time.

I tend to recommend my famous talk to such developers: https://www.youtube.com/watch?v=wTPGW1PNy_Y

mike_hearn · on Sept 20, 2023

Nice talk, you're a great speaker! And BTW I also love jOOQ :)

Your example translated to Permazen Kotlin (for concision) would be something like this:

    import java.time.LocalDate

    data class IncomeByDate(val film: Film, val date: LocalDate, val income: Long)

    films
        // One row per rental: (film, rental date, amount)
        .flatMap { film -> film.rentals.map { IncomeByDate(film, it.date, it.amount) } }
        // Sum income for rows sharing the same film and date
        .groupingBy { it.film to it.date }
        .reduce { _, l, r -> l.copy(income = l.income + r.income) }
        .values
        // Order by film title, then date
        .sortedWith(compareBy<IncomeByDate> { it.film.title }.thenBy { it.date })

That's not using a convenience library like your jOOL.
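For anyone who wants to run this kind of query outside Permazen, here's a self-contained version using the Kotlin standard library only. Plain in-memory `Film` and `Rental` data classes stand in for the persistent model; those classes and their fields are assumptions. It groups on the (film, date) pair so each output row really is the income for one film on one date.

```kotlin
import java.time.LocalDate

// Stand-ins for the persistent model classes (hypothetical fields).
data class Rental(val date: LocalDate, val amount: Long)
data class Film(val title: String, val rentals: List<Rental>)

data class IncomeByDate(val film: Film, val date: LocalDate, val income: Long)

fun incomeByFilmAndDate(films: List<Film>): List<IncomeByDate> =
    films
        // One row per rental
        .flatMap { film -> film.rentals.map { IncomeByDate(film, it.date, it.amount) } }
        // Sum the income of rows sharing the same (film, date)
        .groupingBy { it.film to it.date }
        .reduce { _, l, r -> l.copy(income = l.income + r.income) }
        .values
        // Equivalent of ORDER BY title, date
        .sortedWith(compareBy<IncomeByDate> { it.film.title }.thenBy { it.date })
```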

You can argue against this in a bunch of ways. We could debate readability, the fact that it's not Java (doesn't matter technically) or that maybe the RDBMS parallelizes operations. We could add parallelism with a .parallelStream() in the right spot easily enough. But the query is around the same length as the SQL and reads in a similar fashion.
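To show roughly where a `.parallelStream()` would slot in, here's a sketch of the same group-and-sum shape run through a parallel Java stream from Kotlin, using `Collectors.groupingByConcurrent` with `Collectors.summingLong`. The `Sale` type is a made-up stand-in, and the sketch works on plain in-memory data; whether a given persistence layer's transactions tolerate being read from several threads at once is a separate question it doesn't address.

```kotlin
import java.util.concurrent.ConcurrentMap
import java.util.stream.Collectors

// Hypothetical record type for illustration.
data class Sale(val product: String, val amount: Long)

// Group by product and sum amounts, letting the stream split the work across threads.
fun totalsByProduct(sales: List<Sale>): ConcurrentMap<String, Long> =
    sales.parallelStream().collect(
        Collectors.groupingByConcurrent(
            { s: Sale -> s.product },
            Collectors.summingLong { s: Sale -> s.amount }))
```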

You discuss this a bit later, but then say: look how much time we've spent! Sure, it comes later in your talk, but that doesn't equal time spent. You're comparing a SQL query you presented as a fait accompli with half a talk of iterating on the Java.

I think you probably overestimate how easy SQL is because you're an expert in it. For occasional users like me, it can be a quirky pain. Even the way strings work is unintuitive. But we know the standard libraries of our languages pretty well; we have to, because it's required for the job. Your whole product is built on the fact that SQL isn't good enough: there are a lot of problems that remain unsolved when you just use SQL. Otherwise jOOQ wouldn't exist.

That's before we get into the different properties of the backends, e.g. horizontal read/write scalability for free (FDB) vs RDBMS, incremental online schema evolution and so on.

lukaseder · on Sept 20, 2023

I don't know if your various flatMap / etc methods are purely implemented in the client (it would be quite bad from a performance perspective? But since you're implementing reducers in Kotlin, I guess that's what this is), or if you somehow translate the AST to SQL (similar to jinq.org in Java, Slick in Scala or LINQ in .NET).

But in either case, I think that mimicking "idiomatic" client APIs is more of a distraction than something useful. I've explored this here, where I was asked about my opinion on Kotlin's Exposed: https://www.youtube.com/watch?v=6Ji9yKnZ3D8

Obviously, this is ultimately a matter of taste, but just like all these "better SQL languages" (e.g. PRQL) come and go, these translations to "better APIs" also come and go. SQL is the only thing that stays (and it has for more than 50 years now!)

> We could add parallelism with a .parallelStream() in the right spot easily enough

You typically don't even need to hint it, the optimiser might choose to parallelise on its own, or not, depending on production load... Anyway, that's an overrated topic, IMO.

> I think you probably overestimate how easy SQL is because you're an expert in it.

I'm happy when coding in any language / paradigm. When working with XML, I will happily use XSLT, XPath, etc. When working with JSON, I don't mind going into JavaScript. I'm just trying to stay curious.

I really don't think that SQL is "harder" than any other language. It may just be something certain people don't really like, for various reasons.

> Your whole product is built on the fact that SQL isn't good enough

I think you're projecting your own distaste into my work here. I love SQL. SQL is wonderful. It's quirky, yes, but what isn't. jOOQ users just don't like working with an external DSL within Java (though many are very happy to write views and stored procedures with SQL and procedural extensions). There's no need for a false dichotomy here. I've worked on large systems that were mostly implemented with SQL written in views, and it was perfect!

Also, jOOQ is much more than just the DSL. SQL transformation, parsing, etc., it's a vast product. The string-y SQL folks could use it as a template engine, without touching the DSL, and still profit from parsing / transformations / mapping: https://blog.jooq.org/using-java-13-text-blocks-for-plain-sq...

Some customers use jOOQ merely to migrate off Oracle to PostgreSQL (via the translating JDBC proxy).

And I'm looking forward to the OpenJDK's Project Babylon. Perhaps we'll get actual macros in Java soon, which could work well with jOOQ.

Anyway, I didn't mean to hijack too much. It's great when people try out new / different approaches. I'm just triggered whenever someone claims that SQL is harder than anything else, when they should have said that they prefer other things and don't want to learn more SQL (which is fine, but quite a different statement).

mike_hearn · on Sept 20, 2023

Permazen is a library that runs in-process; there is no network protocol, so it expects to be close to your data. When you use flatMap then yes, that's compiled to a for loop under the hood, and reads on the objects trigger KV reads. There are ways to use hinting and pre-fetching and such if latency starts getting higher, say if you use FoundationDB (which does have a protocol). But most KV stores, even those accessed over the network, have a local cache, and you can pre-read keys that are close to each other in keyspace.
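Here's a rough sketch of the read-ahead idea: a client-side cache that, on a miss, fetches a run of nearby keys in a single round trip, betting that related keys (say, the fields of one object) sit next to each other in keyspace. `RemoteKV`, `ReadAheadCache`, the `readAhead` parameter and the key naming are hypothetical stand-ins, not Permazen's or FoundationDB's actual API; the round-trip counter just makes the saving visible.

```kotlin
import java.util.TreeMap

// Hypothetical remote store: each scan() call stands for one network round trip.
class RemoteKV(private val data: TreeMap<String, String>) {
    var roundTrips = 0
        private set

    // Returns up to `limit` entries with key >= fromKey, in key order.
    fun scan(fromKey: String, limit: Int): List<Pair<String, String>> {
        roundTrips++
        return data.tailMap(fromKey, true).entries.take(limit).map { it.key to it.value }
    }
}

// On a miss, reads ahead up to `readAhead` keys starting at the requested one,
// so later reads of neighbouring keys are served locally.
class ReadAheadCache(private val remote: RemoteKV, private val readAhead: Int = 16) {
    private val cache = TreeMap<String, String?>()

    fun get(key: String): String? {
        if (cache.containsKey(key)) return cache[key]
        val batch = remote.scan(key, readAhead)
        for ((k, v) in batch) cache[k] = v
        if (batch.none { it.first == key }) cache[key] = null // remember the miss too
        return cache[key]
    }
}
```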

I guess by easy/hard, I am considering experience to be 99% of that. If you transform XML by using XSLT then that's great, but most devs won't know what to make of that, because they lack the experience. I haven't used XSLT since, I think, 2001, and if I had to use it today I'd need to relearn it from scratch. That's not hard hard, it's just time consuming, and I'd rather write perhaps slightly more verbose or less pretty code in a language that me+team already use than (re)learn a DSL in that case.

jabradoodle · on Sept 20, 2023

> I don't know if your various flatMap / etc methods are purely implemented in the client (it would be quite bad from a performance perspective?

Haven't quite grokked how it works myself; the README does call out that if your data is separated from your application by a high-latency network, performance likely won't be good enough.

I'm interested in taking a look at this as I maintain apps that use local kv stores so this could complement the approach I'm already using.

gregopet · on Sept 20, 2023

Thank you for your explanation!

gregopet · on Sept 20, 2023

Personally, I found https://microstream.one/ interesting for simple persistence needs (haven't tried it, though).

drzaiusx11 · on Sept 20, 2023

I switched from jOOQ to JDBI. I find JDBI's interfaces more ergonomic than jOOQ's, which was essentially just a nice SQL query builder pattern (when I last used it, at least).

ianlevesque · on Sept 20, 2023

jOOQ is pretty great, and recently got good Kotlin support. My only complaint is if you use their DSL you lose the SQL syntax highlighting JetBrains IDEs have.

selimco · on Sept 20, 2023

In Kotlin (maybe also Java?) you can use the JetBrains @Language annotation to get syntax highlighting on string variables. A little more verbose, but maybe it works for you.

gregopet · on Sept 20, 2023

jOOQ doesn't work with string variables (I mean it can, but that's not the point of that library). It compiles your database schema into objects and then you invoke a very SQL-like API over them. The code then reads very close to the actual SQL, and since the schema is encoded in the Java objects you get the usual code completion for things like columns, tables, stored procedures and indices.

What you don't get is the more advanced, entire-query-level hints, because IDEs can't tell that Java / Kotlin / ... code is actually the SQL that will be emitted if you squint just a little bit :)