Hacker News
EventStore: Open-Source, Functional Database with Complex Event Processing in JS (github.com/eventstore)
122 points by juancampa on Nov 4, 2018 | 63 comments



I've played around recently with a Redux-inspired frontend architecture where the complex, nested reducers are replaced with a single one that simply adds every action to an immutable linked list (i.e., `(state, action) => ({state, action})`). This means all the business logic that would normally live in reducers can be moved over to memoized selectors, which fixes the awkward difference in how reducers and selectors are composed that tends to arise in a typical Redux app, and keeps all the declarative benefits of selectors. It makes it easy to load code for new selectors asynchronously (for example when navigating to a new route), and because the state is a long list of every state your app has ever been in, they can "rewind" as far back as they need to sync up correctly with the rest of the app.
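The idea above can be sketched in a few lines. All the names here are illustrative, not from any released library:

```javascript
// The entire reducer: cons each action onto an immutable linked list,
// keeping a back-reference to the previous state.
const logReducer = (state, action) => ({ prev: state, action });

// A memoized selector that folds the log into a view, reusing the result
// cached for each earlier node instead of re-walking the whole list.
function makeFoldSelector(fold, initial) {
  const cache = new WeakMap();
  const select = (node) => {
    if (node === null) return initial;
    if (!cache.has(node)) {
      cache.set(node, fold(select(node.prev), node.action));
    }
    return cache.get(node);
  };
  return select;
}

// Example: a counter "projection" over the action log.
const selectCount = makeFoldSelector(
  (count, action) => (action.type === 'INCREMENT' ? count + 1 : count),
  0
);

let state = null;
state = logReducer(state, { type: 'INCREMENT' });
state = logReducer(state, { type: 'OTHER' });
state = logReducer(state, { type: 'INCREMENT' });
console.log(selectCount(state)); // 2
```

Because the cache is keyed on list nodes, appending one action only costs one new fold step, which is the memoization/snapshot effect described above.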

What I've realised recently is, this is basically event sourcing. Replace actions with events, selectors with projections, and memoization with snapshots. So this submission is certainly timely and of interest to me. Thanks!


I wrote a similar comment before, viewing redux as a realization of the event sourcing pattern: https://news.ycombinator.com/item?id=17061827

You can go even further and use the exact same "events" or "actions" for event sourcing on the server side too. You just need to save the list of actions to the server. This lets you do realtime sync and multiplayer really easily!


This.

It makes implementing CQRS in a single service really easy. GETs load the event log, replay it, and display the results, while POSTs send a command that gets translated into an event; if it is valid, it gets appended to the log. This approach is storage agnostic: it doesn't matter how the log ultimately gets stored.
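A minimal sketch of that flow, with an in-memory array standing in for whatever storage the log ultimately lives in (all names illustrative):

```javascript
const log = []; // the event log; swap for a file, SQL table, EventStore, etc.

// Command side (POST): validate the command, translate it to an event, append.
function handleCommand(cmd) {
  if (cmd.type === 'Deposit') {
    if (cmd.amount <= 0) throw new Error('invalid amount');
    log.push({ type: 'Deposited', amount: cmd.amount });
  } else if (cmd.type === 'Withdraw') {
    if (cmd.amount > balance()) throw new Error('insufficient funds');
    log.push({ type: 'Withdrew', amount: cmd.amount });
  }
}

// Query side (GET): replay the log and fold it into the current view.
function balance() {
  return log.reduce(
    (b, e) => (e.type === 'Deposited' ? b + e.amount : b - e.amount),
    0
  );
}

handleCommand({ type: 'Deposit', amount: 100 });
handleCommand({ type: 'Withdraw', amount: 30 });
console.log(balance()); // 70
```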


Yeah good connection there. The Redux chrome plugin does indeed show a log of actions and you can rewind through time.

That's another nice benefit of doing things that way: better diagnostics, eh?


If actions are fully deterministic, a list of actions is isomorphic to a list of events: you can always determine what happened by looking at what was requested.

A list of all past states avoids the determinism problem, but the consequence of each action is only implicit in the difference between two states.

For event sourcing, "the state" is the list of events, and what you're calling "the state" is just another selector/projection.


No, I’m suggesting that your state should be a list of event/action objects. By using a linked list, every new state can have a reference back to the previous state. This is not a projection, but a way to immutably append new actions to a list. This list is the state that you’d pass to selectors.


This mirrors exactly the way I've been approaching Redux recently as well, following the same realization. I feel like this is much cleaner and easier to reason about.


Hi, I'm a Redux maintainer.

Got any examples of this?

Also, how do you feel it's better than a typical Redux app setup?


For me it’s been experimental so far. I’ve not actually used Redux in these experiments, but I’m inspired by where Redux and re-reselect have been leading me for the past two years or so. The pattern is a gold mine of interesting implications that I’m still discovering, such as real-time collaboration and speculative precomputation/prefetching.

As it moves all the focus from reducers to selectors, and has quite different requirements for how to do caching there, it’s led me to work on a custom selector library with a quite different API from reselect/re-reselect. I’d like to release it when my experiments start to feel... less experimental.


Wow, I'm super curious about this now. Please keep us posted


That’s nice to hear, good motivation to get it out the door. It might take some time, but I’ll be sure to do a show HN when it happens!


Please ping me as well - I'd certainly be interested.


It seems like the main difference from a reducer is that the reducer just returns `state`, not `(state, action)`. I'm reading this for the first time, but that's how I'm reading it.

But y'all must be storing this `(point-in-time-state, action-that-produced-state)` somewhere, because the Redux Chrome plugin displays this data. Was I wrong in thinking the Chrome plugin was just visualizing data already stored by Redux?


No, the core Redux store does _not_ store any action history by itself. It's simply:

    function createStore(reducer) {
        var state
        var listeners = []

        function getState() { return state }

        function subscribe(listener) { listeners.push(listener) }

        function dispatch(action) {
            state = reducer(state, action)
            listeners.forEach(listener => listener())
        }

        return { getState, subscribe, dispatch }
    }

It's the Redux DevTools that do the real work of saving an action log, like what was described above. Canceling actions works by re-running the actions in the same sequence, minus the ones being skipped, to generate the new state.
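The replay-minus-skipped idea can be sketched like this (a simplification, not the actual DevTools code):

```javascript
// Recompute state by replaying the full action log, skipping the
// indices the user has "canceled".
function replay(reducer, actions, skipped = new Set()) {
  return actions.reduce(
    (state, action, i) => (skipped.has(i) ? state : reducer(state, action)),
    undefined
  );
}

const counter = (state = 0, action) =>
  action.type === 'INCREMENT' ? state + 1 : state;

const actions = [
  { type: 'INCREMENT' },
  { type: 'INCREMENT' },
  { type: 'INCREMENT' },
];

console.log(replay(counter, actions));               // 3
console.log(replay(counter, actions, new Set([1]))); // 2, with action 1 skipped
```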

That logic has been split out into its own package, which you can see in the https://github.com/zalmoxisus/redux-devtools-instrument repo.

Very vaguely related side note: earlier this year, I spent a couple of days adding an "action stack trace" tab to the Redux DevTools Extension. When you click on an action, in addition to seeing the action contents, state tree, and diff, you can now also see an actual formatted stack trace that shows exactly where the action was dispatched from (including the actual code if you've got sourcemaps enabled). Sadly, the extension maintainer has been MIA recently, so we're considering forking it into the Redux org. Until then, you can download my custom build of the extension here: https://github.com/zalmoxisus/redux-devtools-extension/issue...


That's really interesting. I honestly haven't given Redux DevTools much thought despite the fact I use it daily as a core part of my workflow. I thought, like I said before, it was mostly just visualizing artifacts produced normally by Redux. Makes the browser extension much more interesting!

edit: Here's the link where it's implemented in redux-devtools if anyone else is curious https://github.com/reduxjs/redux-devtools/blob/master/src/cr...



This was also acknowledged by Greg Young a long time ago :) [1]

[1] maybe this talk? https://www.youtube.com/watch?v=JHGkaShoyNs


We use this heavily in production and to good effect. It has its downsides (doesn't every product?) but we have found it works well once you've grasped them. If you have a primarily .NET-focused team, have plenty of event-driven applications and want to do CQRS / Event Sourcing for some of your applications, I would recommend it as a starter step towards that. It gives you a lot 'out of the box', which is great when starting out, but over-reliance on some of its features can make it hard to replace. Make sure you have a good handle on retention and an archival strategy. Storage may be financially cheap, but it's certainly not maintenance-friendly once your database grows beyond a certain size (we're currently running our own fork to solve some of these issues while the PR process is going on).


We just wrote our own on top of SQL Server. It runs fine for thousands of accounts with many events per day. We currently host it on the smallest Azure SQL scale level. We have virtually unlimited scaling with Azure, and should have no problem sharding it in future because we include extra metadata (event type, accountid, etc.), so we could choose to shard on any of those dimensions.


I do event sourcing in my company and we had a look at this when we started out 4 years ago. What I don't understand is why build a database? Why not build something in the application layer that uses your fav database as a storage mechanism instead? Aren't the existing databases more mature and better to use?


There are a lot of non-trivial aspects to scaling event sourcing systems. Particularly if you want atomicity or asynchronous two-phase committing. Having implemented them ~5 different ways for different services at my startup, I’d definitely welcome a reliable DB-level abstraction.


Also putting that abstraction in the app layer allows developers to break it. Either by accident, or as a temporary hack that never gets fixed, or a temporary hack that has unforeseen side effects.


Sorry for my inexperience with this kind of implementation, but wouldn't an SQL transaction achieve something similar? And with table partitioning (at least on Postgres) you should be able to go quite far, instead of needing a new db (and a lot of related stuff to study).


No, because in a distributed system you can't rely on SQL transactions. E.g. if you need to make one API request to start the transaction and a second API request to commit or roll it back, you can't hold the SQL transaction open between those.


I've used a MySQL table `event_store` with fields `uuid`, `playhead`, `payload` and `recorded_on`; works fine.
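A sketch of how that table shape gives you optimistic concurrency per aggregate, using a plain array in place of MySQL; in the real table a UNIQUE(uuid, playhead) index would reject the conflicting row (names are illustrative):

```javascript
const eventStore = []; // rows: { uuid, playhead, payload, recorded_on }

// Append an event at an expected playhead (version). If another writer got
// there first, the (uuid, playhead) pair already exists and we reject.
function append(uuid, expectedPlayhead, payload) {
  const clash = eventStore.some(
    (r) => r.uuid === uuid && r.playhead === expectedPlayhead
  );
  if (clash) throw new Error('concurrency conflict');
  eventStore.push({
    uuid,
    playhead: expectedPlayhead,
    payload,
    recorded_on: new Date().toISOString(),
  });
}

// Load an aggregate's events in playhead order.
function load(uuid) {
  return eventStore
    .filter((r) => r.uuid === uuid)
    .sort((a, b) => a.playhead - b.playhead)
    .map((r) => r.payload);
}

append('cart-1', 0, { type: 'ItemAdded', sku: 'abc' });
append('cart-1', 1, { type: 'ItemAdded', sku: 'def' });
console.log(load('cart-1').length); // 2
```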


Good to know. Was thinking of this for myself in a simple usecase.


How did you manage horizontal scaling with this approach?


If you need total system order then that will always be your bottleneck, although you can make it very fast by scoping it to just a sequence number generator and doing the actual work in separate processes.

Otherwise, most event sourcing uses different "streams" of events for different application functions, so you can shard by stream in whatever way works for you.


You could shard based on uuid, so that each shard has its set of objects that it manages.

The easiest way would be to cast uuid as a 64bit unsigned int, then mod by the number of shards. If the number of shards is dynamic, then use consistent hashing.
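That mapping is only a few lines (a sketch; with a dynamic shard count you'd swap the mod for consistent hashing):

```javascript
// Map a uuid to a shard by taking its low 64 bits as an unsigned
// integer and reducing modulo the shard count.
function shardFor(uuid, shardCount) {
  const hex = uuid.replace(/-/g, '').slice(-16); // low 64 bits of the uuid
  return Number(BigInt('0x' + hex) % BigInt(shardCount));
}

console.log(shardFor('123e4567-e89b-12d3-a456-426614174000', 4)); // 0
```

Because the mapping is a pure function of the uuid, every writer and reader independently agrees on which shard owns a given aggregate.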


Event Sourcing seems like the ultimate YAGNI architecture.

You make an architecture that optimizes for future features at the expense of a coherent flow for the current features that you know you have.

It's like making pub-sub the core architecture for your system: very actor-system-like, but also probably harder to understand in terms of runtime behavior and control flow.


Try building a complex architecture that can handle race conditions without some kind of event sourcing; good luck with that.


It's not like you are paying that price for long. Usually you'll start getting benefits the next month, as soon as someone wants new numbers in the dashboard. Or as soon as a new feature needs to be added


With good tooling, it is arguably a lot easier to understand than a synchronous microservices architecture, because of the persistence of events.

I’ll admit it’s probably harder to debug than a monolith, but that’s true of any distributed system.


I've had to break it to my client that had built a CRUD application using CQRS and Event Sourcing that they have wasted a huge amount of resources on a misguided architecture. A rewrite is pending.


> A rewrite is pending.

So another huge amount of resources wasted.


Rewrites are really the one thing all programmers enjoy doing. If it works, don't touch it; you should have caught it sooner. Make the next project better and move on.


Why doesn’t it work for the use case?


I would gladly hear more about this scenario. Why did the initial approach not work? How is the follow-up going to improve on that? And why was a rewrite the best way to fix things? Thanks for any insight you can share.


Is it being rebuilt as pure CRUD? I've been working on an app lately where the major entities in the relational db won't be updated. They'll just be superseded by the next record.


Some ex-colleagues were considering using this a few years ago. They were getting interested in CQRS, got Greg Young in to give a talk, and came away filled with enthusiasm. I think they ended up doing something incremental on top of the MySQL they already had, rather than switching over to this.

Is there a particular reason this is suddenly exciting again now?


"Functional Database" is contradictory. Honestly this sounds like a bunch of buzzwords strung together. Is there an example or a purpose for this?


Personally, I've heard of it in the context of https://github.com/commanded/commanded, a CQRS framework for Elixir that supports EventStore as a storage backend.

I think the idea is that it's something simpler than a database. It's more like an append-only file that has on-insert triggers/durable running queries. (Sort of like https://www.pipelinedb.com/ does, or like blockchain nodes do.)

Or, you could think of it as message queue like Kafka, with permanent durability of all "messages", and a single fixed subscriber bolted onto the queue server, where that subscriber is exposed through the queue server's API, allowing users to reprogram it arbitrarily.


It isn’t that contradictory. You can have data in a functional language. One of the common features of a functional language that separates it from imperative ones is data immutability. In this case it’s a database of immutable data. I personally understand exactly what they’re talking about when they call it a functional database.


I don't understand how this is useful. The purpose of functional programming isn't to make everything immutable, but to use immutability to prevent unnecessary changes of state, which can result in flaky code. So where's the usefulness in a functional database?

Also this still isn't functional. Is adding records to a set not mutation?



They should make this more prominent; now I get what it does.

Edit: Is a backup mechanism implemented? It's for persistent data, right?


Oh I see I misunderstood then. But still, how is this a database? Without looking into it too much it seems that would either be an ORM or simply state.


Think about it as a database of function calls, ordered by creation time. I'd call it a log rather than a database: there will be a state your data reaches, but the ES becomes the source of truth. A DB can still be used to hold the current state (though it isn't needed, as the ES tracks its own state), and data can be dropped from the DB without repercussions, if you don't mind replaying events and have a separation between events that set data and events that perform external actions (like sending emails, for example).


A database is simply an organized collection of data with a restricted interface for accessing and/or querying said data. This seems to qualify.


Should we call pandas a database?


Add "durable" qualifier to my definition, since otherwise you could say that it's just a data structure. Which it is, but with durability properties.


I don't know anything about EventStore, but immutability and databases go surprisingly nicely together. Mash together versioned data and persistent data structures, and you get to store history efficiently (and can have separate operations for excising history). See eg Datomic.


git is a functional database, well, almost. Datomic is what you get when you finish that idea. It does mostly all the same sorts of things Postgres does, but like git, its strength is idealized caching at all layers and horizontal / serverless scaling. It is also just better (do you know anyone who thinks git sucks and wants to go back to SVN or CVS?)


Good article on how EventStore can be used in production: https://medium.com/@eulerfx/scaling-event-sourcing-at-jet-9c...


What does functional mean here?


What's it good at? What is this the right tool for?


Persisting a log of events. Used with CQRS and event sourcing to decouple the read model(s) from the command layer, for benefits like multiple read models (projections), time travel and auditability (rebuilding the read model to a certain point in time), and decoupled contexts.

Event-sourcing can be a really useful tool in many domains, but especially where having a state-of-the-art audit log is helpful.


In the parlance of Domain-Driven Design, every aggregate (think object instance as a loose equivalence) is its own stream of events. Loading aggregates involves replaying events until you are caught up.
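Replaying a stream into an aggregate is just a fold; a sketch with illustrative event types:

```javascript
// Fold a stream of events into the aggregate's current state.
function loadAggregate(events) {
  return events.reduce(applyEvent, { items: [], placed: false });
}

function applyEvent(order, event) {
  switch (event.type) {
    case 'ItemAdded':
      return { ...order, items: [...order.items, event.sku] };
    case 'OrderPlaced':
      return { ...order, placed: true };
    default:
      return order; // unknown events are ignored
  }
}

const stream = [
  { type: 'ItemAdded', sku: 'abc' },
  { type: 'ItemAdded', sku: 'def' },
  { type: 'OrderPlaced' },
];

const order = loadAggregate(stream);
console.log(order.items.length, order.placed); // 2 true
```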

The key design requirement here is to deeply understand the reads your application will need to perform.

The downsides are you have to push the data somewhere else to digest and report on it. Relations are tough to model, too, as events that happen to more than one aggregate essentially have to repeat the data in each stream in the form appropriate to that domain entity. (You can often model this as one event causing another. Projections work less well for this circumstance.)

This can make it hard to draw data from the underlying store. Instead you must "hydrate" it into objects in memory to ask questions of it, though projections can help. EventStore is definitely not the right choice if you need ad hoc query capability. (I used EventStore in production for 2 years with F#.)

I'd classify this as an exotic data store. Use it if you have a really strong need for event-sourcing. Me... I'd likely choose Datomic instead.


It's kind of weird that it's linked to the repo, and not the documentation. https://eventstore.org/docs/event-sourcing-basics/index.html


It's not a new concept.


Was some claim to a new concept made somewhere? I haven't seen this.

A new realization of an existing concept can be notable.


To be fair I have been talking about event sourcing for over a decade. The concepts have been around far longer than that. In fact most databases "event source" their own internal structures.

There is, however, benefit to modelling a system in that way and not just as an internal transaction log. I have done many talks on the subject and you should be able to pull one up on YouTube etc. fairly quickly (it's a bit much to go through in a comment).


Few things are new concepts. Not being a new concept doesn't mean this isn't interesting or useful.



