Permazen: a different persistence layer for Java (github.com/permazen)
60 points by mike_hearn on Sept 20, 2023 | 41 comments



Oh lord this looks terrible. Let's reinvent a bunch of wheels and maim the database and end up with... I don't know exactly what, even after reading the page for over 5 minutes.

I've tried various Java persistence technologies but this one is proving a hard sell to me. I'm currently using jOOQ, which has a different philosophy: it embraces SQL and brings it closer to Java, and frankly, I'm very happy with it. And sure, JPA does a few things wrong (as the Permazen comparison points out) but from what I'm reading I'd still rather use it over Permazen.

The project used to be called JSimpleDB and it should have kept that name IMHO since very simple applications for developers unfamiliar with SQL may be one of the few valid niches for it.


Let me try and explain why it's at least interesting, better than the website is doing, then.

Firstly, why separate things into a pluggable KV layer and then object persistence on top? There are some advantages to doing things this way.

One is it lets you use a single technology at every level of scalability. Want to store a few objects for your desktop app settings file? Use a KV store that's dirt simple (can even be text files). Need to store billions of objects? Plug it into FoundationDB or Spanner or some other highly scalable backend. Generate a tiny in-memory transaction containing a small object sub-graph, serialize it to a protobuf-sized message and then send it over the network? Use a TreeMap. Want a single process but ultra-fast store? Use RocksDB. All of those are easy, and the API remains the same. Because (sorted) KV stores have pretty uniform interfaces you can even do tricks like incremental online migration by composing multiple backends together, or get notified when specific object fields change, or add a caching layer, and again the API can be uniform.
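
To make the "uniform interface" point concrete, here is a minimal sketch of what a pluggable sorted KV layer can look like. The `SortedKv` interface and `TreeMapKv` class are hypothetical names made up for illustration and are not Permazen's actual API; a real backend would likely use byte-array keys rather than strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KvSketch {

    /** Hypothetical minimal sorted key-value interface (not Permazen's actual API). */
    interface SortedKv {
        void put(String key, String value);
        String get(String key);
        /** Returns entries with minKey <= key < maxKey, in key order. */
        List<Map.Entry<String, String>> range(String minKey, String maxKey);
    }

    /** In-memory backend: a TreeMap already keeps its keys sorted. */
    static final class TreeMapKv implements SortedKv {
        private final TreeMap<String, String> map = new TreeMap<>();

        @Override public void put(String key, String value) { map.put(key, value); }
        @Override public String get(String key) { return map.get(key); }
        @Override public List<Map.Entry<String, String>> range(String minKey, String maxKey) {
            return new ArrayList<>(map.subMap(minKey, true, maxKey, false).entrySet());
        }
    }

    /** Application code depends only on the interface, so the backend can be swapped. */
    static List<String> settingsKeys(SortedKv kv) {
        List<String> keys = new ArrayList<>();
        // '0' is the character right after '/', so this range covers the "settings/" prefix.
        for (Map.Entry<String, String> e : kv.range("settings/", "settings0")) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        SortedKv kv = new TreeMapKv();
        kv.put("settings/theme", "dark");
        kv.put("settings/font", "mono");
        kv.put("users/1", "alice");
        System.out.println(settingsKeys(kv)); // prints [settings/font, settings/theme]
    }
}
```

Swapping in a disk-backed or networked store would then mean writing another implementation of the same small interface, which is the property the paragraph above relies on.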

The closest to this in the RDBMS world would be trying to migrate between SQLite and, say, Oracle or some cloud pseudo-RDBMS, probably relying on another large abstraction layer on top to try and cover up the major differences between the backends. And then importing yet another giant library with a different set of data definitions and protocols with their own limits to handle object serialization outside of the database.

During development and testing you can run against a local or in-memory database, then switch to more powerful backends for the final rounds of testing. Normally in Java this requires something like H2 but then you're using a different DB entirely and it causes a whole round of new problems.

As a concrete example of where this is useful, Permazen was written for a 911 call dispatching system in the USA that required multi-master synchronous replication between geographically distributed sites that could tolerate connectivity failure between any of the masters and still allow each site to function. Adding this sort of feature is fairly easy because Permazen comes with a Raft backend, and has ways to add hooks for reconciliation of transactions. The whole stack is or can be Java when done this way, which makes for a very transparent stack.

Secondly, why use "language-integrated persistence" as the paper calls it?

SQL and RDBMSs are, as I think you'll agree, powerful but very complicated technologies. SQL is a full blown programming language. If you're comfortable with jOOQ then that means you've mastered both Java and SQL and jOOQ and maybe the SQL dialect of your specific database as well, which is a lot. Plus subjects like schema setup and migration. Permazen asks a question about simplification: what if you just needed your knowledge of your primary programming language as well as a few database basics (like what indexes are)? What if that was enough to solve many problems? You've already accepted that this is valuable at some level because you're using jOOQ, which adds something like this on top of Java+SQL, but jOOQ is a huge library. In Permazen you're using APIs from the standard library like NavigableSet and Stream, along with a few helper utilities for things like set intersection. Additionally, because you're relying on the host language more there are way fewer APIs to learn. This may be more intuitive for some developers, especially those that don't use SQL all the time.

Writing queries out as functional maps, filters and folds may seem awkward compared to having an RDBMS try and work it out for you, but there are reliability and predictability advantages. Some RDBMS engines have a problematic failure mode in which query performance can change drastically in production without anyone actually changing anything, as the statistics of the underlying table shift and suddenly cause a change in the query plan. It's also easy to write queries in SQL that don't have the expected performance. There are many stories on HN and elsewhere of heroes swooping in to save some company or other through the application of a judicious CREATE INDEX command. In Permazen, because you're writing out the sequence of steps to get the data, it's always clear from reading the code what the query performance will be, and that performance will be stable over time.
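
As a rough illustration of "writing out the sequence of steps", here is a sketch that uses plain `java.util` collections as stand-ins for database-backed indexes. The class, the `byCity` / `byAge` index names and the data are all invented for this example; it shows the style of query, not Permazen's own index API.

```java
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class ExplicitQuerySketch {

    // Hypothetical secondary indexes: field value -> sorted set of object ids.
    private final NavigableMap<String, NavigableSet<Long>> byCity = new TreeMap<>();
    private final NavigableMap<Integer, NavigableSet<Long>> byAge = new TreeMap<>();

    void addPerson(long id, String city, int age) {
        byCity.computeIfAbsent(city, k -> new TreeSet<>()).add(id);
        byAge.computeIfAbsent(age, k -> new TreeSet<>()).add(id);
    }

    /** Ids of people in the given city with minAge <= age <= maxAge. */
    NavigableSet<Long> inCityWithAgeBetween(String city, int minAge, int maxAge) {
        // Step 1: a single index lookup by city.
        NavigableSet<Long> inCity = byCity.getOrDefault(city, new TreeSet<>());
        // Step 2: an index range scan that only touches the matching ages.
        NavigableSet<Long> inAgeRange = new TreeSet<>();
        for (NavigableSet<Long> ids : byAge.subMap(minAge, true, maxAge, true).values()) {
            inAgeRange.addAll(ids);
        }
        // Step 3: an explicit set intersection of the two results.
        NavigableSet<Long> result = new TreeSet<>(inCity);
        result.retainAll(inAgeRange);
        return result;
    }

    public static void main(String[] args) {
        ExplicitQuerySketch db = new ExplicitQuerySketch();
        db.addPerson(1, "Zurich", 30);
        db.addPerson(2, "Zurich", 45);
        db.addPerson(3, "Berlin", 32);
        db.addPerson(4, "Zurich", 35);
        System.out.println(db.inCityWithAgeBetween("Zurich", 30, 40)); // prints [1, 4]
    }
}
```

Every step names the index it uses, so there is no planner that could later pick a different strategy; the trade-off is that choosing a good order of steps is the programmer's job.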

Finally, as you note, due to the object/relational mismatch JPA is a very complex technology yet lacks some features regardless. If you want to work with object graphs then Permazen does provide a much cleaner API. It also supports useful features like online schema migrations - becoming unable to change tables without downtime is a major problem with some RDBMS.

All that said, it must be noted that Permazen is the personal project of an ex-FreeBSD hacker. It has run in production successfully for years as part of a larger contracting project, but it's not a large commercial project. It's best thought of as the starting point for a conversation about persistence rather than something that's going to obliterate the competition tomorrow.


> This may be more intuitive for some developers, especially those that don't use SQL all the time.

I tend to recommend my famous talk to such developers: https://www.youtube.com/watch?v=wTPGW1PNy_Y


Nice talk, you're a great speaker! And BTW I also love jOOQ :)

Your example translated to Permazen Kotlin (for concision) would be something like this:

    data class IncomeByDate(val film: Film, val date: LocalDate, val income: Long)

    films
        .flatMap { film -> film.rentals.map { IncomeByDate(film, it.date, it.amount) } } 
        .groupingBy { it.film } 
        .reduce { _, l, r -> l.copy(income = l.income + r.income) } 
        .values 
        .sortedWith(comparing<IncomeByDate, String> { it.film.title }.thenBy { it.date })

That's not using a convenience library like your jOOL.

You can argue against this in a bunch of ways. We could debate readability, the fact that it's not Java (doesn't matter technically) or that maybe the RDBMS parallelizes operations. We could add parallelism with a .parallelStream() in the right spot easily enough. But the query is around the same length as the SQL and reads in a similar fashion.
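
For anyone who would rather see it in Java, here is a rough Streams analogue. It is a simplified sketch rather than a line-for-line translation: it runs over plain in-memory records (Java 16+), computes only the total income per film ordered by title, and the `Film` and `Rental` types are made up for the example.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class IncomeQuerySketch {

    // Hypothetical model types, standing in for persistent objects.
    record Rental(LocalDate date, long amount) {}
    record Film(String title, List<Rental> rentals) {}

    /** Total rental income per film title, ordered by title. */
    static Map<String, Long> incomeByTitle(List<Film> films) {
        return films.stream()
                .collect(Collectors.groupingBy(
                        Film::title,
                        TreeMap::new, // sorted map, so results come out ordered by title
                        Collectors.summingLong(
                                (Film f) -> f.rentals().stream().mapToLong(Rental::amount).sum())));
    }

    public static void main(String[] args) {
        List<Film> films = List.of(
                new Film("Brazil", List.of(new Rental(LocalDate.of(2023, 1, 3), 3))),
                new Film("Alien", List.of(
                        new Rental(LocalDate.of(2023, 1, 1), 5),
                        new Rental(LocalDate.of(2023, 1, 2), 7))));
        System.out.println(incomeByTitle(films)); // prints {Alien=12, Brazil=3}
    }
}
```

Replacing `films.stream()` with `films.parallelStream()` is the kind of one-word change meant by adding parallelism "in the right spot"; whether it helps would depend on data size and the backend.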

You discuss this a bit later but then say, look how much time we've spent! Sure, it comes later in your talk, but that doesn't equal time spent. You're comparing a SQL query you presented fait accompli, vs half a talk of iterating on the Java.

I think you probably overestimate how easy SQL is because you're an expert in it. For occasional users like me, it can be a quirky pain. Even the way strings work is unintuitive. But we know the standard libraries of our languages pretty well, we have to, it's required for the job. Your whole product is built on the fact that SQL isn't good enough, there's a lot of problems that remain unsolved when you just use SQL. Otherwise jOOQ wouldn't exist.

That's before we get into the different properties of the backends, e.g. horizontal read/write scalability for free (FDB) vs RDBMS, incremental online schema evolution and so on.


I don't know if your various flatMap / etc methods are purely implemented in the client (it would be quite bad from a performance perspective? But since you're implementing reducers in Kotlin, I guess that's what this is), or if you somehow translate the AST to SQL (similar to jinq.org in Java or Slick in Scala or LINQ in .NET).

But in either case, I think that mimicking "idiomatic" client APIs is more of a distraction than something useful. I've explored this here, where I was asked about my opinion on Kotlin's Exposed: https://www.youtube.com/watch?v=6Ji9yKnZ3D8

Obviously, this is ultimately a matter of taste, but just like all these "better SQL languages" (e.g. PRQL) come and go, these translations to "better APIs" also come and go. SQL is the only thing to stay (has been for more than 50 years now!)

> We could add parallelism with a .parallelStream() in the right spot easily enough

You typically don't even need to hint it, the optimiser might choose to parallelise on its own, or not, depending on production load... Anyway, that's an overrated topic, IMO.

> I think you probably overestimate how easy SQL is because you're an expert in it.

I'm happy when coding in any language / paradigm. When working with XML, I will happily use XSLT, XPath, etc. When working with JSON, I don't mind going into JavaScript. I'm just trying to stay curious.

I really don't think that SQL is "harder" than any other language. It may just be something certain people don't really like, for various reasons.

> Your whole product is built on the fact that SQL isn't good enough

I think you're projecting your own distaste into my work here. I love SQL. SQL is wonderful. It's quirky, yes, but what isn't. jOOQ users just don't like working with an external DSL within Java (though many are very happy to write views and stored procedures with SQL and procedural extensions). There's no need for a false dichotomy here. I've worked on large systems that were mostly implemented with SQL written in views, and it was perfect!

Also, jOOQ is much more than just the DSL. SQL transformation, parsing, etc., it's a vast product. The string-y SQL folks could use it as a template engine, without touching the DSL, and still profit from parsing / transformations / mapping: https://blog.jooq.org/using-java-13-text-blocks-for-plain-sq...

Some customers use jOOQ merely to migrate off Oracle to PostgreSQL (via the translating JDBC proxy).

And I'm looking forward to the OpenJDK's Project Babylon. Perhaps we'll get actual macros in Java, soon, which could work well with jOOQ.

Anyway, I didn't mean to hi-jack too much. It's great when people try out new / different approaches. I'm just triggered whenever someone claims that SQL is harder than anything else, when they should have said, they prefer other things and don't want to learn more SQL (which is fine, but quite a different statement).


Permazen is a library that runs in-process, there is no network protocol, so it expects to be close to your data. When you use flatMap then yes that's compiled to a for loop under the hood, and reads on the objects trigger KV reads. There are ways to use hinting and pre-fetching and such if latency starts getting higher, say if you use FoundationDB (which does have a protocol). But in most KV stores, even those over the network, there is a local cache and you can pre-read keys close to each other in keyspace.

I guess by easy/hard, I am considering experience to be 99% of that. If you transform XML by using XSLT then that's great, but most devs won't know what to make of that, because they lack the experience. I haven't used XSLT since, I think, 2001, and if I had to use it today I'd need to relearn it from scratch. That's not hard hard, it's just time consuming, and I'd rather write perhaps slightly more verbose or less pretty code in a language that me+team already use than (re)learn a DSL in that case.


> I don't know if your various flatMap / etc methods are purely implemented in the client (it would be quite bad from a performance perspective?

Haven't quite grokked how it works myself; the readme does call out that if your data is separated from your application by a high latency network then performance likely won't be good enough.

I'm interested in taking a look at this as I maintain apps that use local kv stores so this could complement the approach I'm already using.


Thank you for your explanation!


Personally, I found https://microstream.one/ interesting for simple persistence needs (haven't tried it, though).


I switched from jOOQ to JDBI. I find JDBI's interfaces more ergonomic than jOOQ's, which was essentially just a nice SQL query builder pattern (when I last used it at least).


jOOQ is pretty great, and recently got good Kotlin support. My only complaint is if you use their DSL you lose the SQL syntax highlighting JetBrains IDEs have.


In Kotlin (maybe also Java?) you can use the JetBrains @Language annotation to get syntax highlighting on string variables. A little more verbose but maybe it works for you.


jOOQ doesn't work with string variables (I mean it can, but that's not the point of that library). It compiles your database schema into objects and then you invoke a very SQL-like API over them. The code then reads very close to the actual SQL and since the schema is encoded in the Java objects you get the usual code completion for things like columns, tables, stored procedures, indices...

What you don't get is the more advanced, entire-query-level hints, because IDEs can't tell that Java / Kotlin / ... code is actually the SQL that will be emitted if you squint just a little bit :)


Someone else mentioned jOOQ, but personally I also rather enjoyed JDBI3: https://jdbi.org/#_introduction_to_jdbi_3

It addresses the issues with using JDBC directly (not nice ergonomics), while still letting you work with SQL directly without too many abstractions in the middle. In combination with Dropwizard, it was pretty pleasant: https://www.dropwizard.io/en/stable/manual/jdbi3.html

Other than that, I actually liked using myBatis with XML mappers: https://mybatis.org/mybatis-3/sqlmap-xml.html and their dynamic functionality: https://mybatis.org/mybatis-3/dynamic-sql.html

It might sound a bit crazy on the surface, but their DSL actually made sense and was intertwined with the SQL you wrote, a bit like templating that you might use for front end stuff, except directly for your database queries. It was great for adding complex WHERE parts for specific filters or re-using parts of queries.

Either way, it's nice to have various different options and to also have some newcomers.


JDBI3 looks nice. I previously had good experiences with Dalesbred (https://dalesbred.org), which sounds similar to JDBI3's fluent API -- a fairly minimal layer over JDBC with better ergonomics.


I've used JDBI in the past. It seems pretty solid, light weight, and eliminated much of the repetition/ceremony involved with plain old JDBC. I generally can't stand Java ORMs (Hibernate, anyone?)


While this may work for greenfield applications, I don't see this working well for preexisting schemas. From their getting started page: "Database fields are automatically created for any abstract getter methods", which definitely scares me away since they seem to be relying on automatic field type conversions.

I prefer to manage my schemas when I can and do type and DAO conversions via mapper classes in the very simple and elegant JDBI framework where you write SQL annotations above your DAO methods https://jdbi.org/#_declarative_api

JDBI does wonders for wonky old schemas you've inherited, since joins etc work out of the box (just throw them in your annotations!) The annotations can also link to .SQL files for the big hairy queries.

All these "do magic" frameworks (hibernate being one of the first) work in the simple cases but then fall apart whenever you need to do anything complex/not-prescribed. I end up having to dig into the internals of the framework to see what's going wrong which negates their whole value add.


It's worth calling out that Permazen isn't for accessing SQL databases. It's a completely different thing. So all software that uses it is by definition greenfield.


From the first bullet on permazen's "what is it?" section: "A Java persistence layer for SQL, key-value, or in-memory databases"

So it clearly is designed for accessing SQL databases


It can use a SQL DB as a key value store, but you don't get any of the advantages of using an RDBMS when you do that. The tables are literally two columns: key+value. It's more a mode for if that's all you have available for whatever reason.


Note: JDBI is an ORM where you provide the M directly (or use its default bean mapper class for simple schemas)

NOT to be confused with JDBC or ODBC which are entirely different beasts


How does this compare to a traditional ORM? I looked through the slides and the paper and this seems to largely provide the same kind of functionality as a regular ORM but with the caveat that you can only do key/value transactions.


Transactions apply across full object graphs. The fact that it uses a KV store under the hood is exposed to you, but it doesn't impose limits.

For example if you read an object, read some fields, follow a reference to another object, loop over a list you find there, then make some changes and commit, the underlying transaction will only apply if those operations are conflict free.
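
A toy sketch of that idea, using a generic optimistic-concurrency scheme: each transaction records the version of every key it reads, buffers its writes, and commits only if none of those versions have changed. All names here are hypothetical and this is not how Permazen or any particular backend implements it; real stores track conflicts in their own ways.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticTxSketch {

    /** Shared store: every key has a value and a version bumped on each committed write. */
    static final class Store {
        private final Map<String, String> values = new HashMap<>();
        private final Map<String, Long> versions = new HashMap<>();

        Tx begin() { return new Tx(this); }

        private long versionOf(String key) { return versions.getOrDefault(key, 0L); }
    }

    /** A transaction that remembers what it read and buffers what it writes. */
    static final class Tx {
        private final Store store;
        private final Map<String, Long> readVersions = new HashMap<>();
        private final Map<String, String> writes = new HashMap<>();

        Tx(Store store) { this.store = store; }

        String read(String key) {
            if (writes.containsKey(key)) return writes.get(key); // see our own writes
            synchronized (store) {
                readVersions.putIfAbsent(key, store.versionOf(key));
                return store.values.get(key);
            }
        }

        void write(String key, String value) { writes.put(key, value); }

        /** Applies the buffered writes only if nothing this transaction read has changed. */
        boolean commit() {
            synchronized (store) {
                for (Map.Entry<String, Long> e : readVersions.entrySet()) {
                    if (store.versionOf(e.getKey()) != e.getValue()) return false; // conflict
                }
                for (Map.Entry<String, String> w : writes.entrySet()) {
                    store.values.put(w.getKey(), w.getValue());
                    store.versions.merge(w.getKey(), 1L, Long::sum);
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Tx t1 = store.begin();
        Tx t2 = store.begin();
        t1.read("a");
        t2.read("a");
        t2.write("a", "2");
        System.out.println(t2.commit()); // prints true
        t1.write("b", "x");
        System.out.println(t1.commit()); // prints false: "a" changed after t1 read it
    }
}
```

The point is that the read set can span any number of objects and references, so the conflict check covers the whole graph the transaction touched, not just a single key.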


Think of it like the localStore in Deno. I guess at this point all the languages are going to jump on the bandwagon. Why not? Every language should have a native KV store.

However, if we jump into the JDK world, persistence is not hard. Java has superb DBs like H2, with full postgres compliance, that can be embedded in memory.


> However, if we jump into the JDK world, persistence is not hard. Java has superb DBs like H2, with full postgres compliance, that can be embedded in memory.

H2 is a cute database, but it doesn't even come close to having "full postgres compliance". Besides, it solves a completely different problem from Permazen or JPA - it's a storage engine, not a storage abstraction layer.


Recent and related: https://news.ycombinator.com/item?id=37553957

Also Permazen: Language-Natural Persistence Layer for Java - https://news.ycombinator.com/item?id=21646037 - Nov 2019 (5 comments)

The paper: https://cdn.jsdelivr.net/gh/permazen/permazen@master/permaze...


So, it's an object database, like Zope's ZODB on Python?

I like the idea, but I'd like to learn about use cases for it.

Otherwise, in Java, MapDB is about as far as I'd be willing to go: https://github.com/jankotek/mapdb/


I have come to the place where I feel like the word "persistence" when applied to data / knowledge / information in this way is almost offensively reductionist and actually ignorant of the background in the field...

... Just going to put this here, since it's clear the world still hasn't read and absorbed this knowledge, 50 years later.

I wish people would stop retreading the same broken paths over and over again.

https://twobithistory.org/2017/12/29/codd-relational-model.h...

https://cs.uwaterloo.ca/~david/cs848s14/codd-relational.pdf

"Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation) ... A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced."


Based on my reading, Permazen does insulate the user from the data organisation.

https://github.com/permazen/permazen/wiki/FAQ#architecture

It provides a Java view of the data, and handles translating operations in that layer into operations on the underlying stores.


"The Java model layer is a Java-centric, type safe, object-oriented persistence layer for Java applications"

That's exactly the kind of thing Codd was arguing against 50 years ago.

Information does not fit in Java objects. It doesn't want to be stuck (and does not work well) in hierarchical property & inheritance relationships. That's not how knowledge works.

Information (and the human mind) does tend towards working better when formulated in terms of first order / predicate logic. Which is what the relational data model is based upon. SQL is a rather crude approximation of that.


True, but we're talking about persisting Java objects, not relational data or an approximation of human knowledge or whatever. This is a narrower problem space, and one solution is treating the storage layer as a KV store, ignoring the problem of translating queries to SQL or mapping classes to DDL. This library is deliberately throwing away the relational model and its power, and the README acknowledges such.


The answer is to stop persisting Java objects, and start writing applications with relations (facts / propositions) in mind.

The OO stuff I think has gotten in the way of our jobs. When I worked in Java it was a mountain of pointless data/transfer object transformations. Microservices have made this worse. It's a really bad paradigm.


"We want your code to look like Java, not learn a new query language" combined with "we will support any persistence layer under the sun" is going to be a recipe for disaster.


A bunch of the arguments they list on their GitHub page are straw men.

For example they say nobody wants to learn a new query language. I worked with Hibernate's QL as well as Ebean's QL and have to say none require any special training. Sure, the syntax differs somewhat but you can achieve what you want fairly quickly by looking at examples. And they all look like simplified SQL anyway.

Also the page says one has to invent a DAO layer but again, neither Hibernate nor the alternatives require this. On the contrary, for example if you look at the canonical Playframework/Ebean examples they suggest static finders inside POJO classes, which work just fine.


Beyond my reply to gregopet, here's a few other useful features Permazen has (and although the library is for Java, you could do something similar in any language).

- Schema migrations can be done online, "just in time" on a per row (object) basis. This avoids the common problem you can hit with some RDBMS where you need downtime to change table schemas, or you end up very constrained in what changes you can make. Permazen has some powerful schema evolution features that allow objects and graphs to be migrated transactionally and in arbitrary ways, including full blown type changes, but with sensible defaults for the most common kinds of changes (e.g. adding an enum value).

- You get access to very highly optimized and scalable backends that you might not get if you're limited to an RDBMS, for example, you can use RocksDB or FoundationDB. It can be an advantage sometimes, for example, RocksDB doesn't need the same attention paid to "vacuuming" or "xid wraparound" as Postgres does: all maintenance is done automatically and online. It can also do things like store data on different tiers of hardware (cold data on hard disks, hot data on SSDs).

- Because you can trivially instantiate a TreeMap as an in-memory KV store and then serialize it to an efficient encoding, you can use it as a file format or network protocol.

- Beyond obvious things like indexes and secondary indexes, it abstracts the concept of indexing to arbitrary derived data. You can easily define "indexes" using any code you like, containing any data you like, and the library ensures the derived data will be kept up to date even if it needs to be updated in reaction to changes "far away" in the object graph. This is a bit like triggers, except it all runs client side and you're not coding it all up using SQL.
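
To illustrate the "just in time" migration from the first bullet above, here is a toy sketch in which each stored object carries a schema version and is upgraded the first time it is read. The classes, the field names and the v1-to-v2 change are all invented for the example; Permazen's real schema evolution is transactional and considerably more capable than this.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyMigrationSketch {

    static final int CURRENT_VERSION = 2;

    /** A stored object: its schema version plus raw field values. */
    static final class StoredObject {
        int schemaVersion;
        final Map<String, Object> fields = new HashMap<>();

        StoredObject(int schemaVersion) { this.schemaVersion = schemaVersion; }
    }

    private final Map<Long, StoredObject> store = new HashMap<>();

    void put(long id, StoredObject obj) { store.put(id, obj); }

    /** Upgrades the object on first access; objects never read stay in the old format. */
    StoredObject get(long id) {
        StoredObject obj = store.get(id);
        if (obj != null && obj.schemaVersion < CURRENT_VERSION) {
            upgradeV1toV2(obj);
        }
        return obj;
    }

    /** Hypothetical change: v1 had a single "name" field, v2 splits it into two. */
    private static void upgradeV1toV2(StoredObject obj) {
        Object old = obj.fields.remove("name");
        String name = old == null ? "" : (String) old;
        String[] parts = name.split(" ", 2);
        obj.fields.put("firstName", parts[0]);
        obj.fields.put("lastName", parts.length > 1 ? parts[1] : "");
        obj.schemaVersion = 2;
    }

    public static void main(String[] args) {
        LazyMigrationSketch db = new LazyMigrationSketch();
        StoredObject v1 = new StoredObject(1);
        v1.fields.put("name", "Ada Lovelace");
        db.put(1L, v1);
        System.out.println(db.get(1L).fields.get("lastName")); // prints Lovelace
    }
}
```

Because the upgrade happens per object at access time, there is no single "ALTER TABLE" moment that requires taking the whole data set offline.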

Then it has all the usual ORM features you'd expect but done better, like transactional validation of constraints.

So you can think of this as a really, really advanced object persistence/serialization library, or if you like a software transactional memory, that has enough features to be usable as a database replacement when paired with a good backend.

Last but not least, Archie (the guy who wrote it) is a really thorough guy. He's spent a lot of time considering and eliminating many of the edge cases that surface with ORMs. The API is fully documented and clean. I've used this library for small projects and it was always delightfully unsurprising.

The big weakness is there's no real equivalent of the RDBMS network protocols, so your code needs to run near the data. We've talked occasionally about using sandboxing and code motion to move queries over slow links like the global internet, but it's never been implemented.


The HN link title says "Language-natural persistence to KV stores" - but the page itself only mentions Java, and the linked "API docs" is just a Javadoc: http://permazen.github.io/permazen/site/apidocs/index.html?i...


The concept can be implemented in any language, this implementation just happens to use Java.

More subtly, the design is split into three layers. At the bottom is an abstraction over pluggable KV stores. The middle layer is the "database" layer, which is language neutral. None of the concepts in this API are Java specific.

http://permazen.github.io/permazen/site/apidocs/io/permazen/...

Then the final layer sits on top of the middle database layer, and is the language binding to Java. It uses specific Java features like annotations and annotation processors to generate object bindings, adds things like subtyping, support for enums, etc.

In theory you could take a language like Python or JavaScript and connect it directly to the middle layer using e.g. GraalVM (or reimplement it). You'd want to work out what the equivalent language bindings would be for those languages.


It also appears to be about music production, or concerts, or something.

Maybe this is an AI-generated website.


No, it's a real GitHub project: https://github.com/permazen - they just used a random website template.


Ok, let's change to https://github.com/permazen/permazen instead of https://permazen.io/ above.

Usually we go the other way and prefer the project page, but there's clearly not enough info there.


I remember a long time ago there was a fashion for "prevalence layers" instead of persistence layers. The idea being you persist your event log, but your app is the materialisation of all those events replayed. That way you get stop-the-world persistence but you're only ever dealing with your domain model (or the event sourced version of it, at least). Obviously you then run into issues around the size of the working set, and querying use cases. The approach in the article seems like an attempt to have your cake and eat it, and so my gut reaction is that it won't work, but then I never really had an issue with ORMs in the first place - Hibernate was great and I don't think I've seen an approach in 15+ years that convinces me we've moved forwards.
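
For anyone who hasn't seen the prevalence idea, a minimal sketch: the event log is the only thing treated as persistent, and the in-memory state is rebuilt by replaying it. The `BalanceChanged` event and the account example are hypothetical, and the "log" here is just an in-memory list where a real system would append to durable storage (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrevalenceSketch {

    /** Hypothetical event: a deposit (positive) or withdrawal (negative) on an account. */
    record BalanceChanged(String account, long delta) {}

    private final List<BalanceChanged> log = new ArrayList<>();  // stands in for the persisted log
    private final Map<String, Long> balances = new HashMap<>();  // in-memory domain state

    /** Appends to the log first, then updates the in-memory state. */
    void apply(BalanceChanged event) {
        log.add(event);
        balances.merge(event.account(), event.delta(), Long::sum);
    }

    long balance(String account) { return balances.getOrDefault(account, 0L); }

    List<BalanceChanged> log() { return List.copyOf(log); }

    /** Rebuilds the state after a restart by replaying every logged event in order. */
    static PrevalenceSketch replay(List<BalanceChanged> persistedLog) {
        PrevalenceSketch rebuilt = new PrevalenceSketch();
        for (BalanceChanged e : persistedLog) {
            rebuilt.apply(e);
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        PrevalenceSketch app = new PrevalenceSketch();
        app.apply(new BalanceChanged("alice", 100));
        app.apply(new BalanceChanged("alice", -30));
        PrevalenceSketch restarted = replay(app.log());
        System.out.println(restarted.balance("alice")); // prints 70
    }
}
```

The working-set problem mentioned above shows up directly: `balances` has to hold everything in memory, and any query has to be answered from that in-memory model.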



