ActorDB: A distributed SQL database with the scalability of a KV store

angrybits · on Feb 3, 2014

So yet another database (for certain definitions of database) built for the tiny fraction of applications that will:

a) warrant this kind of complexity b) be able to give up ad-hoc queries

So I guess if I ever need to build a highly scalable, fault tolerant, distributed todo list.. I'll look into it. Not sure how I will aggregate data to run the dashboards that my MBAs will be up my butt about though. Guess we can cover that detail later?

biokoda · on Feb 3, 2014

a) ActorDB is not actually all that complex.

b) ad-hoc queries tend to be the first thing out the window once you hit a certain size (query load or dataset size). ActorDB arguably gives you the largest amount of query flexibility out of the new breed of databases, while still being strongly consistent and persistent (not memory only).

angrybits · on Feb 4, 2014

The ways in which it can (and will) fail are monumentally complex. Everything looks simple when it's working.

bra-ket · on Feb 3, 2014

How do you run a JOIN across a bunch of actors

biokoda · on Feb 3, 2014

You can run queries across actors (even acid transactions), but not actual joins. This is the tradeoff to make it distributed.

AlexEatsKittens · on Feb 3, 2014

It's a relational database without joins? Can you join within an actor?

ddorian43 · on Feb 3, 2014

1 actor is 1 sqlite db so yes

adrianpike · on Feb 3, 2014

You don't.

lmm · on Feb 3, 2014

So can I create one actor for each of my clients, and add/remove machines to my cluster and have them automatically balance across it? And I can do normal SQL queries including joins as long as I never need to join across several clients?

If so, this might be the first one of these new databases that actually helps with my use case. (Though I'll wait to see what jepsen makes of it first).

jedberg · on Feb 4, 2014

I don't intend this to put down the creators of this product, but Postgres is a SQL database with the scalability of KV store if used in a way that makes it scalable like a KV store. :)

It's a great idea, but I think the "mission statement" needs to be restated.

sbt · on Feb 4, 2014

I also don't mean to put down the creators of this product, it probably has its use cases. But every time I see something like "SQL with the scalability of KV" (which happens a lot these days), it's like someone promising a perpetual energy source. In other words, there is usually some kind of catch involved.

In this case it's that JOINs are still expensive. Well, this is actually a pretty big deal, because JOINs are typically the reason I settle for either a NoSQL or a SQL store.

Again, I'm sure this has other merits that may distinguish itself from the rest of everyone and their grandma building a database these days.

biokoda · on Feb 4, 2014

Only JOINs across actors are sacrificed. Most problem sets fit quite well into the actor model and you can keep JOINs where it matters most (inside an actor).

biokoda · on Feb 4, 2014

Yes but then you're stuck with a KV data and query model. It still being an SQL database is kind of besides the point since you're not using it as a relational database.

nemothekid · on Feb 3, 2014

I've been meaning to try ActorDB out for a while now to see how well it performs on large data sets. IIRC, the basic premise is that every actor is a sqlite instance which causes some interesting limitations that might even have more interesting workarounds.

_hyn3 · on Feb 3, 2014

Is it in fact sqlite? Is it erlang? What kind of network is required between nodes? Can I encrypt it via e.g. autossh on a particular port on cloud instances outside of an Amazon-style VPC?

biokoda · on Feb 3, 2014

encryption of communication is high on the feature priority list. It is erlang and sqlite. Why sqlite and more on how it works is explained in the documentation. Network should be within the same datacenter.

rpedela · on Feb 3, 2014

Another question. How is this different or better than having a set of relational databases with a load balancer between them?

biokoda · on Feb 3, 2014

Because it's a truly distributed SQL database. I presume you mean having a master-master setup? ActorDB does not limit you to the capabilities of a single machine. Master-master still means your dataset and query load needs to fit into 1 or 2 servers.

rpedela · on Feb 3, 2014

"Decoupled data. You can't run arbitrary queries over your data. But you can still have a full relational database within actors. So you need to organize and split your data model into actors. Read the query model and examples sections to understand how."

When I saw that in the README, it said to me that I can have a distributed SQL database as long as I did not need JOIN or REFERENCES which are commonly used features of a relational database. Am I incorrect?

biokoda · on Feb 3, 2014

You can have JOINs, but within individual actors. So for instance if this entire conversation tree was a single actor, you can have a full relational model to make queries with. But you can't do joins on comments over multiple threads.

This is the sacrifice ActorDB makes to be distributed.

rpedela · on Feb 3, 2014

Yeah so I see two use cases for this:

1. A NoSQL database but with SQL as the API and some support for relations. This is quite nice because I have often found NoSQL APIs to be lacking in many aspects.

2. A set of fully relational databases that need to be managed and/or load balanced. This is a use case I have. If the system administration is significantly easier than the tools natively provided by Postgres, MySQL, etc, then I can see this being a valid use case. The use of SQLite does not concern me since it has legitimate downsides [1][2].

[1] http://www.sqlite.org/omitted.html

[2] http://www.sqlite.org/whentouse.html

andypants · on Feb 4, 2014

I'm a bit confused about the concept of actors. Should I create an actor type for each table, and an actor for each row? They seem to be equivalent in some of the examples.

biokoda · on Feb 4, 2014

It very much depends on your use case. But no you do not split by tables or rows generally. You split your problem into actor types.

For instance if you were to create your own news.ycombinator.com you would require 3 types of actors: frontpage, thread, user

frontpage:

- CREATE TABLE threads (storyid INTEGER PRIMARY KEY, author INTEGER, title TEXT, url TEXT)

thread:

- CREATE TABLE comments (id INTEGER AUTO INCREMENT, parent INTEGER, user INTEGER, txt TEXT)

user:

- CREATE TABLE info (id INTEGER PRIMARY KEY, email TEXT, password TEXT)

- CREATE TABLE posts (id INTEGER PRIMARY KEY, threadid INTEGER)

There would be many thread and user actors, but only 1 frontpage actor.

Xdes · on Feb 3, 2014

    >Use case: reliable distributed counters

I am not understanding the example provided for this use case. In my implementation of this use case I use Redis back by an RDBMS. Redis provides a cache and a distributed counter while the RDBMS engine handles all the OLTP and OLAP.

I currently have a job that takes snapshots of the real time Redis counter at given intervals and inserts them into my analytics table.

How would the ActorDB way simplify or improve this model?

biokoda · on Feb 3, 2014

Author here. With actordb you get:

- no single point of failure (redis and likely the rdbms as well)

- no need to maintain two very different systems

- scalable. As in plug in a new cluster and it will proportionally increase capacity.

We are working on more detailed use case examples. As well as a pretty big bugfix release due out tomorrow or the day after.

Xdes · on Feb 3, 2014

    >no single point of failure (redis and likely the rdbms as well)

Your README mentions replication in the Operational characteristics, but it does not cover partition tolerance?

- Is there one master per cluster?

- What happens when the network becomes segmented?

It is also unclear of how this affects multiple clusters.

- How is a global consensus of the data reached?

- What happens when the network becomes segmented between clusters?

    >no need to maintain two very different systems

SQLite isn't a full featured RDBMS. It mainly lacks support for stored procedures and concurrency. It is a very nice solution for small, self contained, horizontally scalable usage scenarios. However, once the use case involves generating custom globally unique identifiers (like a 7 character alphanumeric string) it generally falls apart.

    >scalable. As in plug in a new cluster and it will proportionally increase capacity.

Redis and RDBMS have well known use cases for replication and partition management. Redis itself is single threaded, so it is horizontally scalable by your budget (and likewise supports sharding and replication albeit the network segmentation can lead to issues). More full featured RDBMS like Postgres, MariaDB, Oracle, and SQL Server support concurrency and granular locking with similar scaling strategies.

I do not see the value proposition in consolidating the infrastructure in this case.

biokoda · on Feb 3, 2014

There is one master per actor. An actor lives within a cluster. As long as a majority of configured servers are accessible, writes will succeed.

All queries (reads and writes) go through the master and are not committed if master does not reach a majority of configured servers in the cluster.

Individual actors are not meant to be scalable. They are meant to be fast enough. For instance if you were to create your own news.ycombinator.com an actor would be this entire conversation tree. Sqlite would be sufficient. You don't need concurrency on a per thread level.

ActorDB provides a global uniqueid generation feature (independent of sqlite).

m_mueller · on Feb 3, 2014

Very interesting, I like the idea of a distributed sqlite. Does it support long term offline replicas the way couchdb does? If yes, do you have mobile support on the roadmap? I think this should be the natural succession as I find the choices for offline replication on mobile devices still very limited, especially cross platform ones that fully integrate with a RDBMS landscape.

cordite · on Feb 3, 2014

Although wire-compatible, it does not look like it can be a drop-in replacement, since the query model needs you to explicitly define actors.

http://www.actordb.com/docs-querymodel.html

biokoda · on Feb 3, 2014

Yes that is correct. Not a drop-in replacement. You can't go from a traditional RDBMS and magically turn it into a distributed database. Some constraints need to be loosened. ActorDB tries to loose as little as possible to turn it into a distributed database and still have a rich query model (without being memory only or just key/value).

stephenhuey · on Feb 3, 2014

Powerful red on your website! Just wanted to mention the white text looks great in Safari, but really faint in Chrome.

michaelmior · on Feb 3, 2014

Same here. I can barely read the text in Chrome.

_hyn3 · on Feb 3, 2014

Only verdana is specified as a font-family for most of the white text on red. This is probably sub-optimal, consider Verdana, Helvetica, sans-serif;. (Also, font-weight: bold would help on the smaller text.)

rpedela · on Feb 3, 2014

What was the specific reason for using sqlite? Why not Postgres or MariaDB or whatever?

biokoda · on Feb 3, 2014

Because sqlite is an embeddable isolated database. Each individual actor is completely independent. Sqlite naturally fits into this use case.

Using postgres or mariadb would be very problematic or even impossible, because they are not designed for this kind of use. ActorDB needs to be able to move actors to new clusters when they are added, while still execute queries on them at the same time.

asdasf · on Feb 3, 2014

Really a shame they used the mysql protocol. The mysql command line client is terrible compared to psql, and the mysql lib is much less pleasant to use than libpq.

_hyn3 · on Feb 3, 2014

MySQL is ubiquitous, pgsql less so. Most cross-db tools start with MySQL compatibility first. Regardless of whether a given tool is better for some use case, targeting compatibility with the market leader makes all kinds of sense.

AaronBBrown · on Feb 3, 2014

Can you give an example of where mysql (the client) is terrible compared to psql (the client)? I've had the opposite experience.

cdavid · on Feb 3, 2014

for once, you get proper reverse search, which is not available in mysql (last time I checked, which has now been a while TBH).

AaronBBrown · on Feb 3, 2014

Works for me. I use this feature nearly all day.

http://showterm.io/03de0c8a4992dc7008ef2

meritt · on Feb 3, 2014

mysql cli has always had reverse search (ctrl+r) because it was built on readline. They actually recently disabled it, indirectly, because they switched to editline. So now you gotta configure it via ~/.editrc

http://bugs.mysql.com/bug.php?id=60465

asdasf · on Feb 3, 2014

Useful tab completion, more useful commands, catching Ctrl+C like a shell is supposed to do instead of exiting like an asshole, etc. Can you give an example of your opposite experience? I've literally never met someone who has used both and not preferred psql before.