This works until you have an A->B->C hierarchy of entities and want to monitor the subtree rooted at a specific A. Then the WAL for table C, even with REPLICA IDENTITY FULL, will not tell you whether a change is relevant. At that point you need to do a JOIN anyway, so you might as well just use a trigger and NOTIFY instead.
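Something like the following is what I mean by the trigger/NOTIFY route (a minimal sketch; the table, column, and channel names are made up, and the listener uses node-postgres): the trigger resolves the root A at write time with a JOIN and publishes it, so the app can filter on a single LISTEN connection.

```typescript
import { Client } from "pg";

// Hypothetical schema: c has b_id, b has a_id. The trigger does the JOIN at
// write time and publishes the root A id, so the listener can filter by subtree.
// (EXECUTE FUNCTION needs Postgres 11+; older versions use EXECUTE PROCEDURE.)
const setupSql = `
CREATE OR REPLACE FUNCTION notify_c_change() RETURNS trigger AS $$
DECLARE
  root_a bigint;
BEGIN
  SELECT b.a_id INTO root_a FROM b WHERE b.id = NEW.b_id;
  PERFORM pg_notify('c_changes',
    json_build_object('a_id', root_a, 'c_id', NEW.id)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS c_change_notify ON c;
CREATE TRIGGER c_change_notify
  AFTER INSERT OR UPDATE ON c
  FOR EACH ROW EXECUTE FUNCTION notify_c_change();
`;

async function main() {
  const client = new Client(); // connection settings come from the PG* env vars
  await client.connect();
  await client.query(setupSql);
  await client.query("LISTEN c_changes");
  client.on("notification", (msg) => {
    const payload = JSON.parse(msg.payload ?? "{}");
    if (payload.a_id === 42) { // only the subtree rooted at A = 42
      console.log("relevant change:", payload);
    }
  });
}

main().catch(console.error);
```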
Two of the other reasons for this over triggers are also misleading:
- Setting up triggers can be automated easily.
- True, you only use 1 connection to the database, but you now operate this app. You could also run pgbouncer.
I wish I could do this, but I already have more than four billion As, so my A IDs need 64 bits. I use composite keys for my Bs (A ID + sequence number per A), which allows me to search on the pkey index using just the A column; but by the time I get to C, that gets unwieldy. So my Cs have UUIDs, my Bs also have UUIDs (a secondary unique ID, not the pkey), and my Cs have a foreign key pointing to their B, while also having an indexed A ID field.
I'd love to just have "A-B-C" as my Cs' IDs... but it'd only work for my use case (i.e. be performant) if it were running on a computer with 256-bit registers.
Curious why your primary keys are GUIDs?
Also, if the primary key of table C is made of 3 columns (A GUID, B GUID, C GUID), the index in PostgreSQL will not be that big with compression enabled, because the common prefixes will not be stored redundantly.
Also, having a single compound index on table C covering (A foreign key, B foreign key, C GUID) is much better than having multiple indexes on table C.
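To make the single-compound-index point concrete, here's roughly what I mean (a sketch with made-up table and column names); the leading columns of the one composite primary key also serve lookups scoped to an A or to an (A, B) pair, so no extra indexes are needed for those:

```typescript
import { Client } from "pg";

// One composite primary key on C instead of several separate indexes. Queries
// filtering on a_id, or on (a_id, b_id), can use the leading columns of the
// same b-tree. (Foreign keys to a and b are omitted to keep the sketch short.)
const ddl = `
CREATE TABLE c (
  a_id uuid NOT NULL,
  b_id uuid NOT NULL,
  c_id uuid NOT NULL,
  payload jsonb,
  PRIMARY KEY (a_id, b_id, c_id)
);
`;

async function migrate() {
  const client = new Client();
  await client.connect();
  await client.query(ddl);
  await client.end();
}

migrate().catch(console.error);
```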
Having the primary key generatable at the application level has performance benefits, especially on inserts. You can write all related entities in one go instead of waiting for the db to generate the pk and passing it to the dependent entities.
Also, the problem with clustered indexes on a GUID field is that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the record into the middle of the table.
You can have IDs generatable at the application level without using GUIDs.
This is what Twitter does with "Snowflake".
It generates tens of thousands of IDs per second.
Those IDs fit in 64 bits.
And those IDs are sortable, meaning that if tweets A and B are posted around the same time, they should have IDs in close proximity to one another.
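A rough sketch of a Snowflake-style generator (the 41/10/12 bit layout matches Twitter's published scheme; everything else here, like the custom epoch and worker id, is just illustrative):

```typescript
// Snowflake-style 64-bit IDs: time-ordered, generatable per worker with no
// database round trip. Layout: 41 bits ms-timestamp | 10 bits worker | 12 bits sequence.
const EPOCH = 1577836800000n; // arbitrary custom epoch (2020-01-01)

class SnowflakeGenerator {
  private lastMs = -1n;
  private sequence = 0n;

  constructor(private readonly workerId: bigint) {
    if (workerId < 0n || workerId > 1023n) throw new Error("workerId must fit in 10 bits");
  }

  next(): bigint {
    let now = BigInt(Date.now());
    if (now === this.lastMs) {
      this.sequence = (this.sequence + 1n) & 0xfffn; // 12-bit sequence per millisecond
      if (this.sequence === 0n) {
        // Sequence exhausted for this millisecond: spin until the clock advances.
        while (BigInt(Date.now()) <= this.lastMs) {}
        now = BigInt(Date.now());
      }
    } else {
      this.sequence = 0n;
    }
    this.lastMs = now;
    return ((now - EPOCH) << 22n) | (this.workerId << 12n) | this.sequence;
  }
}

// IDs from one generator are strictly increasing, and IDs from different
// workers created around the same time sort close to one another.
const gen = new SnowflakeGenerator(7n);
console.log(gen.next().toString());
```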
I would consider this model "usually incorrect", in that it is quite likely you will eventually want to be able to move an entity to a different parent. I would therefore always advise that one "should" have primary keys for every entity type that are at least global for that entity type (if not truly global across all entity types, as there is a lot of power in that), and then implement the child relationship as a unique foreign key constraint, unless you have some very specific functionality or performance requirement that forces primary key conflation (and FWIW, "I want to use this WAL watcher" might count, but being unable to do sane reparenting seems like a steep price to pay).
I agree the child entity ID must already be unique without having to use the parent entity ID.
It's just that, in my experience, having the child table's primary b-tree sorted on ParentID then ChildID makes joins much more efficient, unless you can use table interleaving like in Google Spanner https://cloud.google.com/spanner/docs/schema-and-data-model#...
“Should” is pretty strong. The choice of PK has its own tradeoffs. But you are correct, if you go the prefix route, it makes the WAL approach trivial, at least while everything really is a tree and there are no many-to-many relationships.
This reminds me of a project I did a few years ago:
"Stockify is a live music service that follows the mood of the stock markets. It plays happy music only when the stock is going up and sad music when it’s going down."
https://vimeo.com/310372406
(it's of course a parody, but I made a functional prototype)
I know someone who uses tones as an easy way of knowing if there's a market move on some cryptocurrencies. Basically, there are different tones for a trade that went higher than the previous one and for a trade that went lower. Possibly pitch or volume or something was used to indicate the size of the trade, I don't quite remember. Either way, he could quickly hear if there was a major market move or something just by the amount and tone of the beeps. He used this as a way to know when he should look at the chart or whatever.
We spend a lot of time training digital computers to deal with analog information that has been converted into digital forms, and I wonder how much we could also gain by finding better ways to convert digital information into analog forms that our brains (as analog computers) can better parse.
I had a related idea, to make a running program produce an audible hum like a car engine would. And the hum would vary depending on what paths of the program are executing, so you get an idea what/how your code is "doing" by listening in.
(But: I think live db-activity rendered to music as a kind of monitoring mechanism is potentially way cooler than just getting db updates over a web socket.)
This is how you guarantee your child will become an ace coder. The fractal audio will transform key centers in the developing brain, and give them subliminal knowledge of all major repos of note
We have this in New York City to monitor the flow of traffic on quiet residential streets. If the flow rate is OK, it's nice and quiet. If the flow rate gets too low (or a traffic light turns red) then a chorus of very angry car horns erupts, and you are jolted out of your train of thought, wishing you had the guts / tolerance for prison time to run outside and smash every single one of the cars with a crowbar.
I think a recording of this is actually built into Pagerduty to use as a sound when you're getting paged. I went with the "golf ball hit into a flock of geese" one, though. Every time that goes off my first thought is "OH GOD I'M DYING HELP" but then I look and it's just GCP down again. Ironically therapeutic.
There is an implementation and a paper called peep, the network auralizer. I've been thinking of building something like this in rust or go as a learning project, but audio streaming and mixing is harder than I thought. If someone has good libraries or tutorials there, I'd be grateful.
You’ve got to throw in some uncompressed pcm audio streams just to keep it interesting.
When we were learning Linux a friend and I used to pipe /dev/hda into /dev/dsp for fun and when we hit some fragments of uncompressed piano recordings we joked that we must have hit the kernel source code. Good times.
If Fauna went open-source, I'd make the case to pay for their hosting. There's business value in just having the option to host it internally or make modifications.
I agree. I love Fauna, and I'd definitely use their hosting even if they offered a self-hosting option, but the closed source and vendor lock-in are real sticking points.
For 2 you could look at Meteor, which does a similar thing with Mongo. It's not relational, but with Supabase you also don't really get the advantages of a relational db, since the update events are about single tables.
I'm really curious to see how this ends up doing vs. Hasura.
PostgREST (which is basically what Supabase is, obviously with other value-adds) or Hasura, which basically exposes a GraphQL server that interfaces with Postgres.
Personally I prefer GraphQL, as there's more tooling around it compared to PostgREST, but it's interesting to see. In this case, if Supabase were GraphQL, you could just use a subscription.
I'd be curious to know why supabase didn't go with GraphQL.
We're not opposed to GraphQL at all. We use PostgREST because it's very low-level. It's a thin wrapper around Postgres. It uses Row Level Security and works nicely with stored procedures and Postgres extensions (like PostGIS).
Philosophically we want to focus more on database-centric solutions rather than middleware-centric solutions. This means a slower time-to-market, but it's beneficial because you have a "source" right at the bottom of your stack, and you can connect whatever tools you want to it (including Hasura!)
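To make the RLS point concrete, a policy like the one below keeps the access rule in the database itself, so PostgREST, Hasura, psql, or anything else that connects sees the same restriction. This is only a sketch: the table and column names are made up, and request.jwt.claim.sub is the per-request setting PostgREST uses to expose the JWT subject (the exact setting names vary across PostgREST versions).

```typescript
import { Client } from "pg";

// Row Level Security keeps the authorization rule in the database: each user
// only sees rows they own, no matter which API or client connects.
const policySql = `
ALTER TABLE todos ENABLE ROW LEVEL SECURITY;

CREATE POLICY todos_owner_only ON todos
  USING (owner_id = current_setting('request.jwt.claim.sub', true)::uuid);
`;

async function applyPolicy() {
  const client = new Client();
  await client.connect();
  await client.query(policySql);
  await client.end();
}

applyPolicy().catch(console.error);
```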
> I'd be curious to know why supabase didn't go with GraphQL.
I'm guessing because GraphQL is very limited compared to SQL.
From reading their docs [1] it seems they have a JS API for SQL with support for nested data a la GraphQL. Not sure if it's their own or comes from some other library.
With postgraphile you get the full power of postgresql exposed through graphql.
Using a few commonly-used add-ons, you can write a whole lot of sql in gql and it Just Works, translating your deeply-nested and richly-filtered gql query into a single, performant sql query.
I recently spent a week doodling around with postgraphile and postgrest. I really liked both, writing everything in SQL is super great. It does feel a bit weird at first but the amount of boilerplate it removes is fantastic and it really forced me to think about the correct schema for my business logic.
My only gripe with postgraphile is that it doesn't support column grants alongside RLS policies.[0] Instead they recommend splitting tables in two, having one-to-one foreign keys, and using grants on the full tables instead. It's a shame, because PostgREST dealt with column grants just fine, and I don't want to use postgraphile's specific "smart comments" in my database; they just seem like a really un-elegant solution to what is otherwise a super nice pattern: SQL everything.
I also considered Hasura but they have their own auth system and don't use RLS, which is a shame. Having access control in the database is super great when you have multiple APIs and different people with psql and different roles, all with different levels of access to the same DB.
[0]:
> Don't use column-based SELECT grants: column-based grants work well for INSERT and UPDATE (especially when combined with --no-ignore-rbac!), but they don't make sense for DELETE and they cause issues when used with SELECT. https://www.graphile.org/postgraphile/requirements/
So one big difference is this tails the bin log (as a replication slave), while Hasura polls. Hasura makes that polling efficient by multiplexing the polling queries transparently so they send one query to poll N queries on the same table with the same shape.
But once you're polling a large percentage of your whole data set, the bin log approach has a clear advantage.
I love it!
The only issue is that you can't consume the history of changes.
But this is good for caching a materialized view in the client:
1. Subscribe to events and buffer them.
2. Run the SQL query for your materialized view.
3. Apply all buffered events, then new events, to incrementally update the materialized view.
This way the (slow/expensive) query behind the materialized view doesn't need to be run periodically, and your cache is always fresh without needing to set a TTL.
If your websocket connection gets disconnected, drop the materialized view and repeat from step 1.
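A sketch of that pattern in client code (the event shape and names are made up, and the realtime subscription is abstracted down to a callback):

```typescript
// Client-side "materialized view" kept fresh by buffering realtime events
// while the initial snapshot query runs, then applying events incrementally.
type RowEvent = {
  type: "INSERT" | "UPDATE" | "DELETE";
  record: { id: string; [k: string]: unknown };
};

class LiveView {
  private rows = new Map<string, Record<string, unknown>>();
  private buffer: RowEvent[] = [];
  private ready = false;

  // Step 1: subscribe first so nothing is missed; buffer until the snapshot lands.
  onEvent(e: RowEvent) {
    if (!this.ready) this.buffer.push(e);
    else this.apply(e);
  }

  // Step 2: load the (slow/expensive) snapshot query once.
  loadSnapshot(rows: Array<{ id: string; [k: string]: unknown }>) {
    for (const r of rows) this.rows.set(r.id, r);
    // Step 3: replay everything buffered while the snapshot was loading.
    for (const e of this.buffer) this.apply(e);
    this.buffer = [];
    this.ready = true;
  }

  // On disconnect, throw the view away and start over from step 1.
  reset() {
    this.rows.clear();
    this.buffer = [];
    this.ready = false;
  }

  private apply(e: RowEvent) {
    if (e.type === "DELETE") this.rows.delete(e.record.id);
    else this.rows.set(e.record.id, e.record);
  }

  values() {
    return [...this.rows.values()];
  }
}
```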
They use websockets and a set of plug-ins to support Redis, RabbitMQ, and many other things. However, Apollo is planning to remove subscriptions in the next major release.
> As far as subscriptions itself goes, Apollo Server 2 has a superficial integration with subscriptions-transport-ws that doesn't tie in to the plugin system, federation, Studio reporting, etc. And subscriptions-transport-ws isn't even the best maintained package that solves this problem. So 3.0 will remove that integration. I do hope that we do the work to add subscription support to the Apollo server-side platform as a whole sometime soon, but I can't promise a timeline for that
That's a completely different answer: they're removing the old subscriptions-transport-ws package, which is not maintained anymore, and splitting everything into separate packages to make them work faster.
> As far as subscriptions itself goes, Apollo Server 2 has a superficial integration with subscriptions-transport-ws that doesn't tie in to the plugin system, federation, Studio reporting, etc. And subscriptions-transport-ws isn't even the best maintained package that solves this problem. So 3.0 will remove that integration. I do hope that we do the work to add subscription support to the Apollo server-side platform as a whole sometime soon, but I can't promise a timeline for that
Neat! Is the idea to subscribe to the websocket directly from frontend code or from an application server?
It sounds like it's the latter, which seems like an unusual use case for websockets. (or at least I can't think of anything else that does ws for a server-to-server API)
As far as getting cheap framing and bidirectional communication over an HTTP port goes, there's nothing to stop you using websockets server-to-server; for example, many crypto exchanges publish their feeds to users this way. They're a good fit when you:
- can make no assumptions about the client's networking other than port 443 being available, even if only via an HTTPS proxy
- can make no assumptions about the client's software except they can speak an extremely common protocol
- want security to work the same way it does for the rest of your services
"web-scale" doesn't come into it. Websockets are inherently difficult to scale because they're stateful, but in return you get the lowest latency in both directions the underlying network can provide with extremely reasonable (2-4 bytes) overhead compared to raw TCP or SSL
For me, this looks like the perfect implementation of a very scalable config service, where the read RPS can get very high but the config values are rarely changed. The consumers of the config service can be either frontend or backend applications, so websockets are a great choice.
Yeah! Front-end apps can skip the server and get data from the db directly. There's OAuth2 setup, and you can authorize users to have CRUD access to tables, or even as granular as rows.
Sure can! I contributed on this project a few months ago. Hoping to use it soon on a project.
I implemented a portion of CRUD with a Vue.js app. It's a pretty neat setup. Just authorize with OAuth2 and then, if you have permissions, you're good to go!
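Roughly what that looks like with the supabase-js client (this is the v1-era API as I remember it, so treat it as a sketch; the project URL, key, and table are placeholders):

```typescript
import { createClient } from "@supabase/supabase-js";

// Placeholder project URL and anon key; permissions are enforced by the
// database (grants / row level security), not by this client code.
const supabase = createClient("https://your-project.supabase.co", "public-anon-key");

async function demo() {
  // Read the rows the signed-in user is allowed to see.
  const { data, error } = await supabase.from("todos").select("*");
  if (error) throw error;
  console.log(data);

  // Insert a row; the database rejects it if the user lacks permission.
  await supabase.from("todos").insert([{ task: "buy milk" }]);

  // Subscribe to changes on the table over the realtime websocket.
  supabase
    .from("todos")
    .on("INSERT", (payload) => console.log("new row:", payload.new))
    .subscribe();
}

demo().catch(console.error);
```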
Do you know if there is something similar for SQLite? I've been looking for a while and wonder if maybe listening to the WAL is the best option. There are hooks to tie into but the database I'm looking at isn't mine (it's local on my machine for another application) so idk if I should be futzing around with it.
Litestream author here. Litestream monitors and replicates based off the WAL and there are some tricks it has to do to make that work. However, it doesn't currently decode those WAL pages and determine the changed data.
Looks to me like it uses the same trick as Debezium (and similar projects) - listening to the replication stream and turning it into usable information - but it's a totally separate implementation built on top of Elixir: https://github.com/supabase/realtime/blob/master/server/lib/...
For a project that small you can use NoSQL without issue because you can always scan the whole thing when needed. For larger data sets you want the relational model unless you're very sure you don't need it (and most people who think they're in this category are just wrong and will find out later the hard way.)
This is something I have an immediate need for (and was about to build myself) to use with a PostgreSQL/TimescaleDB instance. Is it possible to have new subscriptions get up to an hour of historical data before streaming starts, or even better, to supply a parameter to receive historical data from a timestamp?
To be clear, it's not really stream history that I'm after. Just the result of a database query upon connection ( `select all items from table X with time > (now() - 1hr)` )