This works until you have an A->B->C hierarchy of entities and want to monitor the subtree rooted at a specific A. Then the WAL for table C, even with REPLICA IDENTITY FULL, will not tell you whether a change is relevant. At that point you need to do a JOIN anyway, so you might as well just use a trigger and NOTIFY instead.
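Something like the following is what I mean by the trigger/NOTIFY route (a minimal sketch; the table, column, and channel names are made up, and the listener uses node-postgres): the trigger resolves the root A at write time with a JOIN and publishes it, so the app can filter on a single LISTEN connection.

```typescript
import { Client } from "pg";

// Hypothetical schema: c has b_id, b has a_id. The trigger does the JOIN at
// write time and publishes the root A id, so the listener can filter by subtree.
// (EXECUTE FUNCTION needs Postgres 11+; older versions use EXECUTE PROCEDURE.)
const setupSql = `
CREATE OR REPLACE FUNCTION notify_c_change() RETURNS trigger AS $$
DECLARE
  root_a bigint;
BEGIN
  SELECT b.a_id INTO root_a FROM b WHERE b.id = NEW.b_id;
  PERFORM pg_notify('c_changes',
    json_build_object('a_id', root_a, 'c_id', NEW.id)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS c_change_notify ON c;
CREATE TRIGGER c_change_notify
  AFTER INSERT OR UPDATE ON c
  FOR EACH ROW EXECUTE FUNCTION notify_c_change();
`;

async function main() {
  const client = new Client(); // connection settings come from the PG* env vars
  await client.connect();
  await client.query(setupSql);
  await client.query("LISTEN c_changes");
  client.on("notification", (msg) => {
    const payload = JSON.parse(msg.payload ?? "{}");
    if (payload.a_id === 42) { // only the subtree rooted at A = 42
      console.log("relevant change:", payload);
    }
  });
}

main().catch(console.error);
```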
Two of the other reasons for this over triggers are also misleading:
- Setting up triggers can be automated easily.
- True, you only use 1 connection to the database, but you now operate this app. You could also run pgbouncer.
I wish I could do this, but I already have more than four billion As, so my A IDs need 64 bits. I use composite keys for my Bs (A ID + sequence number per A), which allows me to search on the pkey index using just the A column; but by the time I get to C, that gets unwieldy. So my Cs have UUIDs, my Bs also have UUIDs (a secondary unique ID, not the pkey), and my Cs have a foreign key pointing to their B, while also having an indexed A ID field.
I'd love to just have "A-B-C" as my Cs' IDs... but it'd only work for my use case (i.e. be performant) if it were running on a computer with 256-bit registers.
Curious why your primary keys are GUIDs?
Also, if the primary key of table C is made of 3 columns (A GUID, B GUID, C GUID), the index in PostgreSQL will not be that big with compression enabled, because the common prefixes will not be stored redundantly.
Also, having a single compound index on table C covering (A foreign key, B foreign key, C GUID) is much better than having multiple indexes on table C.
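To make the single-compound-index point concrete, here's roughly what I mean (a sketch with made-up table and column names); the leading columns of the one composite primary key also serve lookups scoped to an A or to an (A, B) pair, so no extra indexes are needed for those:

```typescript
import { Client } from "pg";

// One composite primary key on C instead of several separate indexes. Queries
// filtering on a_id, or on (a_id, b_id), can use the leading columns of the
// same b-tree. (Foreign keys to a and b are omitted to keep the sketch short.)
const ddl = `
CREATE TABLE c (
  a_id uuid NOT NULL,
  b_id uuid NOT NULL,
  c_id uuid NOT NULL,
  payload jsonb,
  PRIMARY KEY (a_id, b_id, c_id)
);
`;

async function migrate() {
  const client = new Client();
  await client.connect();
  await client.query(ddl);
  await client.end();
}

migrate().catch(console.error);
```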
Having the primary key generatable at the application level has performance benefits, especially on inserts. You can write all related entities in one go instead of waiting for the db to generate the pk and passing it to the dependent entities.
Also, the problem with clustered indexes on a GUID field is that the GUIDs are random, so when a new record is inserted, a significant portion of the data on disk has to be moved to insert the record into the middle of the table.
You can have IDs generatable at the application level without using GUIDs.
This is what Twitter does with "Snowflake".
It generates tens of thousands of IDs per second.
Those IDs fit in 64 bits.
And those IDs are sortable, meaning that if tweets A and B are posted around the same time, they should have IDs in close proximity to one another.
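A rough sketch of a Snowflake-style generator (the 41/10/12 bit layout matches Twitter's published scheme; everything else here, like the custom epoch and worker id, is just illustrative):

```typescript
// Snowflake-style 64-bit IDs: time-ordered, generatable per worker with no
// database round trip. Layout: 41 bits ms-timestamp | 10 bits worker | 12 bits sequence.
const EPOCH = 1577836800000n; // arbitrary custom epoch (2020-01-01)

class SnowflakeGenerator {
  private lastMs = -1n;
  private sequence = 0n;

  constructor(private readonly workerId: bigint) {
    if (workerId < 0n || workerId > 1023n) throw new Error("workerId must fit in 10 bits");
  }

  next(): bigint {
    let now = BigInt(Date.now());
    if (now === this.lastMs) {
      this.sequence = (this.sequence + 1n) & 0xfffn; // 12-bit sequence per millisecond
      if (this.sequence === 0n) {
        // Sequence exhausted for this millisecond: spin until the clock advances.
        while (BigInt(Date.now()) <= this.lastMs) {}
        now = BigInt(Date.now());
      }
    } else {
      this.sequence = 0n;
    }
    this.lastMs = now;
    return ((now - EPOCH) << 22n) | (this.workerId << 12n) | this.sequence;
  }
}

// IDs from one generator are strictly increasing, and IDs from different
// workers created around the same time sort close to one another.
const gen = new SnowflakeGenerator(7n);
console.log(gen.next().toString());
```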
I would consider this model "usually incorrect", in that it is quite likely you will eventually want to be able to move an entity to a different parent. I would therefore always advise that one "should" have primary keys for every entity type that are at least global for that entity type (if not truly global across all entity types, as there is a lot of power in that), and then implement the child relationship as a unique foreign key constraint, unless you have some very specific functionality or performance requirement that forces primary key conflation (and FWIW, "I want to use this WAL watcher" might count, but being unable to do sane reparenting seems like a steep price to pay).
I agree the child entity ID must already be unique without having to use the parent entity ID.
It's just that, in my experience, having the child table's primary b-tree sorted on ParentID then ChildID makes joins much more efficient, unless you can use table interleaving like in Google Spanner https://cloud.google.com/spanner/docs/schema-and-data-model#...
“Should” is pretty strong. The choice of PK has its own tradeoffs. But you are correct, if you go the prefix route, it makes the WAL approach trivial, at least while everything really is a tree and there are no many-to-many relationships.
This reminds me of a project I did a few years ago:
"Stockify is a live music service that follows the mood of the stock markets. It plays happy music only when the stock is going up and sad music when it’s going down."
https://vimeo.com/310372406
(it's of course a parody, but I made a functional prototype)
I know someone who uses tones as an easy way of knowing if there's a market move on some cryptocurrencies. Basically, there are different tones for a trade that went higher than the previous one and for a trade that went lower. Possibly pitch or volume or something was used to indicate the size of the trade, I don't quite remember. Either way, he could quickly hear if there was a major market move or something just by the amount and tone of the beeps. He used this as a way to know when he should look at the chart or whatever.
We spend a lot of time training digital computers to deal with analog information that has been converted into digital forms, and I wonder how much we could also gain by finding better ways to convert digital information into analog forms that our brains (as analog computers) can better parse.
I had a related idea, to make a running program produce an audible hum like a car engine would. And the hum would vary depending on what paths of the program are executing, so you get an idea what/how your code is "doing" by listening in.
(But: I think live db-activity rendered to music as a kind of monitoring mechanism is potentially way cooler than just getting db updates over a web socket.)
This is how you guarantee your child will become an ace coder. The fractal audio will transform key centers in the developing brain, and give them subliminal knowledge of all major repos of note
We have this in New York City to monitor the flow of traffic on quiet residential streets. If the flow rate is OK, it's nice and quiet. If the flow rate gets too low (or a traffic light turns red) then a chorus of very angry car horns erupts, and you are jolted out of your train of thought, wishing you had the guts / tolerance for prison time to run outside and smash every single one of the cars with a crowbar.
I think a recording of this is actually built into Pagerduty to use as a sound when you're getting paged. I went with the "golf ball hit into a flock of geese" one, though. Every time that goes off my first thought is "OH GOD I'M DYING HELP" but then I look and it's just GCP down again. Ironically therapeutic.
There is an implementation and a paper called peep, the network auralizer. I've been thinking of building something like this in rust or go as a learning project, but audio streaming and mixing is harder than I thought. If someone has good libraries or tutorials there, I'd be grateful.
You’ve got to throw in some uncompressed pcm audio streams just to keep it interesting.
When we were learning Linux a friend and I used to pipe /dev/hda into /dev/dsp for fun and when we hit some fragments of uncompressed piano recordings we joked that we must have hit the kernel source code. Good times.
If Fauna went open-source, I'd make the case to pay for their hosting. There's business value in just having the option to host it internally or make modifications.
I agree. I love Fauna, and I'd definitely use their hosting even if they offered a self-hosting option, but the closed source and vendor lock-in are real sticking points.
For 2 you could look at Meteor, which does a similar thing with Mongo. It's not relational, but with Supabase you also don't really get the advantages of a relational db, since the update events are about single tables.
I'm really curious to see how this ends up doing vs. Hasura.
PostgREST (which is basically what Supabase is, obviously with other value-adds) or Hasura, which basically exposes a GraphQL server that interfaces with Postgres.
Personally I prefer GraphQL, as there's more tooling around it compared to PostgREST, but it's interesting to see. In this case, if Supabase were GraphQL, you could just use a subscription.
I'd be curious to know why supabase didn't go with GraphQL.
We're not opposed to GraphQL at all. We use PostgREST because it's very low-level. It's a thin wrapper around Postgres. It uses Row Level Security and works nicely with stored procedures and Postgres extensions (like PostGIS).
Philosophically we want to focus more on database-centric solutions rather than middleware-centric solutions. This means a slower time-to-market, but it's beneficial because you have a "source" right at the bottom of your stack, and you can connect whatever tools you want to it (including Hasura!)
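To make the RLS point concrete, a policy like the one below keeps the access rule in the database itself, so PostgREST, Hasura, psql, or anything else that connects sees the same restriction. This is only a sketch: the table and column names are made up, and request.jwt.claim.sub is the per-request setting PostgREST uses to expose the JWT subject (the exact setting names vary across PostgREST versions).

```typescript
import { Client } from "pg";

// Row Level Security keeps the authorization rule in the database: each user
// only sees rows they own, no matter which API or client connects.
const policySql = `
ALTER TABLE todos ENABLE ROW LEVEL SECURITY;

CREATE POLICY todos_owner_only ON todos
  USING (owner_id = current_setting('request.jwt.claim.sub', true)::uuid);
`;

async function applyPolicy() {
  const client = new Client();
  await client.connect();
  await client.query(policySql);
  await client.end();
}

applyPolicy().catch(console.error);
```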
> I'd be curious to know why supabase didn't go with GraphQL.
I'm guessing because GraphQL is very limited compared to SQL.
From reading their docs [1] it seems they have a JS API for SQL with support for nested data a la GraphQL. Not sure if it's their own or comes from some other library.
With postgraphile you get the full power of postgresql exposed through graphql.
Using a few commonly-used add-ons, you can write a whole lot of sql in gql and it Just Works, translating your deeply-nested and richly-filtered gql query into a single, performant sql query.
I recently spent a week doodling around with postgraphile and postgrest. I really liked both, writing everything in SQL is super great. It does feel a bit weird at first but the amount of boilerplate it removes is fantastic and it really forced me to think about the correct schema for my business logic.
My only gripe with postgraphile is that it doesn't support column grants alongside RLS policies.[0] Instead they recommend splitting tables in two, having one-to-one foreign keys, and using grants on the full tables instead. It's a shame, because PostgREST dealt with column grants just fine, and I don't want to use postgraphile's specific "smart comments" in my database; they just seem like a really un-elegant solution to what is otherwise a super nice pattern: SQL everything.
I also considered Hasura but they have their own auth system and don't use RLS, which is a shame. Having access control in the database is super great when you have multiple APIs and different people with psql and different roles, all with different levels of access to the same DB.
[0]:
> Don't use column-based SELECT grants: column-based grants work well for INSERT and UPDATE (especially when combined with --no-ignore-rbac!), but they don't make sense for DELETE and they cause issues when used with SELECT. https://www.graphile.org/postgraphile/requirements/
So one big difference is this tails the bin log (as a replication slave), while Hasura polls. Hasura makes that polling efficient by multiplexing the polling queries transparently so they send one query to poll N queries on the same table with the same shape.
But once you're polling a large percentage of your whole data set, the bin log approach has a clear advantage.
I love it!
The only issue is that you can't consume the history of changes.
But this is good for caching a materialized view in the client:
1. Subscribe to events and buffer them.
2. Run the SQL query for your materialized view.
3. Apply all buffered events, then new events, to incrementally update the materialized view.
This way the (slow/expensive) query behind the materialized view doesn't need to be run periodically, and your cache is always fresh without needing to set a TTL.
If your websocket connection gets disconnected, drop the materialized view and repeat from step 1.
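A sketch of that pattern in client code (the event shape and names are made up, and the realtime subscription is abstracted down to a callback):

```typescript
// Client-side "materialized view" kept fresh by buffering realtime events
// while the initial snapshot query runs, then applying events incrementally.
type RowEvent = {
  type: "INSERT" | "UPDATE" | "DELETE";
  record: { id: string; [k: string]: unknown };
};

class LiveView {
  private rows = new Map<string, Record<string, unknown>>();
  private buffer: RowEvent[] = [];
  private ready = false;

  // Step 1: subscribe first so nothing is missed; buffer until the snapshot lands.
  onEvent(e: RowEvent) {
    if (!this.ready) this.buffer.push(e);
    else this.apply(e);
  }

  // Step 2: load the (slow/expensive) snapshot query once.
  loadSnapshot(rows: Array<{ id: string; [k: string]: unknown }>) {
    for (const r of rows) this.rows.set(r.id, r);
    // Step 3: replay everything buffered while the snapshot was loading.
    for (const e of this.buffer) this.apply(e);
    this.buffer = [];
    this.ready = true;
  }

  // On disconnect, throw the view away and start over from step 1.
  reset() {
    this.rows.clear();
    this.buffer = [];
    this.ready = false;
  }

  private apply(e: RowEvent) {
    if (e.type === "DELETE") this.rows.delete(e.record.id);
    else this.rows.set(e.record.id, e.record);
  }

  values() {
    return [...this.rows.values()];
  }
}
```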
They use websockets and a set of plug-ins to support Redis, RabbitMQ, and many other things. However, Apollo is planning to remove subscriptions in the next major release.
> As far as subscriptions itself goes, Apollo Server 2 has a superficial integration with subscriptions-transport-ws that doesn't tie in to the plugin system, federation, Studio reporting, etc. And subscriptions-transport-ws isn't even the best maintained package that solves this problem. So 3.0 will remove that integration. I do hope that we do the work to add subscription support to the Apollo server-side platform as a whole sometime soon, but I can't promise a timeline for that
That's a completely different answer: they're removing the old subscriptions-transport-ws package, which is not maintained anymore, and splitting everything into separate packages to make them work faster.
> As far as subscriptions itself goes, Apollo Server 2 has a superficial integration with subscriptions-transport-ws that doesn't tie in to the plugin system, federation, Studio reporting, etc. And subscriptions-transport-ws isn't even the best maintained package that solves this problem. So 3.0 will remove that integration. I do hope that we do the work to add subscription support to the Apollo server-side platform as a whole sometime soon, but I can't promise a timeline for that
Neat! Is the idea to subscribe to the websocket directly from frontend code or from an application server?
It sounds like it's the latter, which seems like an unusual use case for websockets. (or at least I can't think of anything else that does ws for a server-to-server API)
As far as getting cheap framing and bidirectional communication over an HTTP port goes, there's nothing to stop you using websockets server-to-server; for example, many crypto exchanges publish their feeds to users this way. They're a good fit when you:
- can make no assumptions about the client's networking other than port 443 being available, even if only via an HTTPS proxy
- can make no assumptions about the client's software except they can speak an extremely common protocol
- want security to work the same way it does for the rest of your services
"web-scale" doesn't come into it. Websockets are inherently difficult to scale because they're stateful, but in return you get the lowest latency in both directions the underlying network can provide with extremely reasonable (2-4 bytes) overhead compared to raw TCP or SSL
For me, this looks like the perfect implementation of a very scalable config service, where the read RPS can get very high but the config values are rarely changed. The consumers of the config service can be either frontend or backend applications, so websockets are a great choice.
Yeah! Front-end apps can skip the server and get data from the db directly. There's OAuth2 setup, and you can authorize users to have CRUD access to tables, or even as granular as rows.
Sure can! I contributed on this project a few months ago. Hoping to use it soon on a project.
I implemented a portion of CRUD with a Vue.js app. It's a pretty neat setup. Just authorize with OAuth2 and then, if you have permissions, you're good to go!
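Roughly what that looks like with the supabase-js client (this is the v1-era API as I remember it, so treat it as a sketch; the project URL, key, and table are placeholders):

```typescript
import { createClient } from "@supabase/supabase-js";

// Placeholder project URL and anon key; permissions are enforced by the
// database (grants / row level security), not by this client code.
const supabase = createClient("https://your-project.supabase.co", "public-anon-key");

async function demo() {
  // Read the rows the signed-in user is allowed to see.
  const { data, error } = await supabase.from("todos").select("*");
  if (error) throw error;
  console.log(data);

  // Insert a row; the database rejects it if the user lacks permission.
  await supabase.from("todos").insert([{ task: "buy milk" }]);

  // Subscribe to changes on the table over the realtime websocket.
  supabase
    .from("todos")
    .on("INSERT", (payload) => console.log("new row:", payload.new))
    .subscribe();
}

demo().catch(console.error);
```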
Do you know if there is something similar for SQLite? I've been looking for a while and wonder if maybe listening to the WAL is the best option. There are hooks to tie into but the database I'm looking at isn't mine (it's local on my machine for another application) so idk if I should be futzing around with it.
Litestream author here. Litestream monitors and replicates based off the WAL and there are some tricks it has to do to make that work. However, it doesn't currently decode those WAL pages and determine the changed data.
Looks to me like it uses the same trick as Debezium (and similar projects) - listening to the replication stream and turning it into usable information - but it's a totally separate implementation built on top of Elixir: https://github.com/supabase/realtime/blob/master/server/lib/...
For a project that small you can use NoSQL without issue because you can always scan the whole thing when needed. For larger data sets you want the relational model unless you're very sure you don't need it (and most people who think they're in this category are just wrong and will find out later the hard way.)
This is something I have an immediate need for (and was about to build myself) to use with a PostgreSQL/TimescaleDB instance. Is it possible to have new subscriptions get up to an hour of historical data before streaming starts, or even better, to supply a parameter to receive historical data from a timestamp?
To be clear, it's not really stream history that I'm after. Just the result of a database query upon connection ( `select all items from table X with time > (now() - 1hr)` )