We have layered Hasura over an existing set of SQL Server databases to provide a public facing API for our product. [1]
Overall the experience has been fantastic. The performance and the authorization scheme are both very good. It has allowed us to wash our hands of writing bespoke endpoints for our enterprise customers with complex integration requirements (for the most part... waiting on mutations!).
One thing I wish were handled differently is Same Schema, Different Database support.
We have multiple multi-tenant databases as well as many single tenant databases. All share the exact same table structure. As it stands, we have to maintain a separate Hasura instance for each of these databases as the table names conflict and there is no way to rename or reference them differently. That leaves us with the minor annoyance of needing to instruct users on their appropriate server prefix (server1.fast-weigh.dev/graphql vs server2.fast-weigh.dev/graphql... etc). Yes, we could proxy in front of these and route accordingly. But that's just one more layer to deal with and maintain.
It sure would be nice to have a single instance to maintain that could handle database availability based on the role of the incoming request.
Even with the minor inconvenience of multiple instances, I 10/10 would recommend. It's a huge timesaver assuming you've got the data access problems it seeks to make easy.
> We have multiple multi-tenant databases as well as many single tenant databases. All share the exact same table structure. As it stands, we have to maintain a separate Hasura instance for each of these databases as the table names conflict and there is no way to rename or reference them differently.
We have the exact same scenario and solved it with the exact same workaround. As things stand, spinning up the Hasura instance is currently the last piece of the process we need to automate before we are fully able to onboard new clients without manual ops action.
Hasura V2, currently in alpha, is supposed to support multitenancy as its flagship new feature. However, the "same object name in different database" issue is still open and on the roadmap, so presumably it's a _very_ early alpha.
I'm overall very impressed with Hasura, but have found it cumbersome to work with if using UUID primary keys because it won't map them as the GraphQL ID type [0]. There are plenty of people using Hasura successfully in production environments, so I'm curious how others handle it. I'm hoping the answer isn't "just use int PKs", but it'd be helpful to know if it is.
Security through obscurity only sweeps the problem under the rug instead of addressing it, IMO. I don't know what Parler is or was, but I don't think sequential int IDs would be the major factor leading to a website being scraped.
There are generally two types of information in applications, "public" and "privileged". The former has the IDs and such discoverable by an index or explore page, the latter requires authentication and has user-specific permissions.
For both cases, if hiding IDs is the only access control on the backend, it's fundamentally flawed. And for both cases, if the access control is implemented well (in addition to rate limiting), the choice between integers and UUIDs makes no difference.
No one will argue that opaque identifiers are a sufficient security control, but they're (slightly) better than nothing; at least they would've stopped a naive enumeration approach to scraping everything. Don't get distracted by that part, though: the main reason is that many companies don't want to publicly post a dashboard of how many users/entities/widgets they have, and exposing sequential identifiers basically does that.
This _should_ be part of a multi-layer security plan, though. You don't depend on it as a primary source of security, but why would you expose more internal information than needed? If something does go wrong with another layer, that obscurity _helps mitigate the damage_.
In the case that there IS an index page that enumerates all entities you are correct. However, many systems don't provide such an index page.
In many cases it's useful for a page to be publicly accessible yet not indexable.
This is why sites like YouTube have an "unlisted" permission level; UUID keys are a convenient way to implement that level of access control.
UUID keys are very useful for distributed systems where a local machine wants to generate a unique key locally, and then later upload it to a centralized store. It's especially helpful in third normal form databases where often you'll need to create objects that reference each other via the primary key.
We built a messenger app, the client can generate the message row locally and cache it immediately upon submitting the row without waiting for the response from the server. As the primary key can be generated client side (it’s a UUID, it’s quasi guaranteed to be unique), there’s no clash with existing IDs server side. By using the optimistic response pattern the message appears in the frontend immediately. Once the response for inserting the row comes back, the message can be updated in case the server decided to set additional columns — all with the same message ID which the server gladly accepted. Wonderful.
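The flow described above can be sketched in a few lines. The `Message` shape, the `cache`, and the function names here are illustrative, not any particular client library's API:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical message row; in the real app this mirrors a database table.
interface Message {
  id: string;
  body: string;
  sentAt?: string; // a column the server may fill in later
}

// Stand-in for the client-side cache (e.g. an Apollo-style normalized store).
const cache = new Map<string, Message>();

// Generate the primary key locally so the row can be cached and rendered
// immediately, without waiting for the server round trip.
function sendMessage(body: string): Message {
  const msg: Message = { id: randomUUID(), body };
  cache.set(msg.id, msg); // optimistic insert: the UI shows it right away
  return msg;
}

// When the insert mutation responds, merge any server-set columns back in
// under the same ID — the server accepted the client-generated key as-is.
function reconcile(serverRow: Message): void {
  cache.set(serverRow.id, { ...cache.get(serverRow.id), ...serverRow });
}
```

Because a v4 UUID is quasi-guaranteed unique, the client never has to wait for the server to hand back an ID before referencing the row elsewhere.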
Can't speak for all dbs, but many use a clustered index on the primary key. In this case, the physical rows are stored in the order of the index, rather than just pointers to the rows.
If you insert non-sequential keys into a clustered index, inserts land at random positions, causing page splits and fragmentation. UUIDs are not sequential, so at scale you will experience performance issues if your primary key is a UUID and the PK index is clustered.
You won't notice this until significant scale, however. You can still use a unique identifier alongside an incrementing primary key, and you can choose a more compact format than a UUID: 8 base32 characters give over a trillion combinations and are nowhere near as unsightly in a URL.
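As a sanity check on that arithmetic, 32^8 = 2^40 ≈ 1.1 trillion. A minimal generator for such compact IDs might look like the sketch below; the Crockford-style alphabet and the `shortId` name are my own choices, not something from the comment:

```typescript
import { randomBytes } from "node:crypto";

// Crockford-style base32 alphabet (drops I, L, O, U to avoid lookalikes).
// Any 32-character alphabet gives the same keyspace: 32^8 = 2^40 IDs.
const ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Generate a compact, URL-friendly identifier from random bytes.
// The per-byte modulo introduces no bias here because 256 is a multiple of 32.
function shortId(length = 8): string {
  const bytes = randomBytes(length);
  let out = "";
  for (const b of bytes) out += ALPHABET[b % 32];
  return out;
}
```

This would live alongside an auto-incrementing integer PK: the integer keeps the clustered index happy, while the short ID is what gets exposed in URLs.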
> Can't speak for all dbs, but many use a clustered index on the primary key.
AFAIK, only MySQL (with the InnoDB engine) and SQL Server do it by default (always for MySQL/InnoDB; for SQL Server, by default unless you create a different clustered index before adding the PK constraint, and even then you can specify a nonclustered PK index).
PG doesn't have clustered indexes at all; DB2 has a thing called clustered indexes which isn't quite the same thing; Oracle calls having a clustered index on the PK an "index-organized table", and it's a non-default table option; and SQLite has what seems equivalent to a clustered index ONLY for INTEGER PRIMARY KEY tables not declared as WITHOUT ROWID.
> You can still use a unique identifier alongside an incrementing primary key, and you could choose to use a more compact format than the UUID.
A key point of using a UUID is distributed generation, avoiding lock contention on a sequence generator, and that is defeated by using both. Just "don't use a clustered index where distributed key generation is important" seems a better rule, even if it precludes MySQL/InnoDB use.
Also, most DBs that explicitly handle UUIDs store them compactly as 128-bit values. If you want to transform them into something other than the standard format for UI reasons [0], that doesn't preclude using UUIDs in the DB.
That's not the same as a clustered index; it just does a one-time rearrangement of the table's current contents. You have to rerun it after each change affecting the index to simulate a clustered index (with stable PKs and the PK index, that would be after each insert, I think).
You get closer to a clustered index (at the cost of more storage, but the benefit that you can have more than one on a table) with an index using INCLUDE to add all non-key columns in the index.
I guess it's just my default setup now. Beyond not wanting to leak data or have people mess around with URLs, I've had much better luck evenly sharding DB instances with UUIDs. Hasura ostensibly supports both (the option exists to create PKs using UUIDs), but then treats them entirely differently at the GraphQL level.
I suppose with a dynamic language, it may not be much of a problem. But, with a statically typed language you end up with a situation where your IDs are not GraphQL ID types and can't be without type coercion. Since the uuid type isn't standard, you need to add some sort of serialization to and from the uuid type. It results in a lot of unnecessary ceremony just to work with IDs.
I could be mistaken, but I think it also creates issues with client libraries that want to cache around an object ID, as the type is not a GraphQL ID.
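One way to contain that ceremony in a statically typed client is a branded type with a single validating coercion at the boundary. This is a hypothetical sketch of the pattern, not a Hasura or GraphQL library API:

```typescript
// Hasura exposes UUID columns as a custom `uuid` scalar rather than the
// built-in GraphQL ID type, so a typed client needs an explicit mapping.
// A branded string type keeps UUIDs from mixing with ordinary strings.
type UUID = string & { readonly __brand: "uuid" };

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Validate once at the deserialization boundary; everything downstream
// can then trust the UUID type.
function toUUID(raw: string): UUID {
  if (!UUID_RE.test(raw)) throw new Error(`not a uuid: ${raw}`);
  return raw as UUID;
}

// Serializing back out for a mutation's variables is just the raw string.
function fromUUID(id: UUID): string {
  return id;
}
```

The coercion lives in one place instead of being scattered through every query result handler, which is the "ceremony" the comment above is describing.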
I know this may not be exactly on topic, but do you know when version 2 will be released? My team has been waiting on a single feature from that release for what seems like months (slack in jwt verification time, right now the clocks on our hasura and keycloak servers are off and it is creating issues).
Hey! We released 2.0-beta.2 this week and we're planning on releasing 2.0-stable on Mon/Tue.
Sorry, I know it took a while, but we had to make sure we could make the upgrade path non-breaking given it was a massive release and this took longer than we expected!
Please do feel free to reach out if you need any help with the upgrade :)
I don't use it directly, our systems engineers maintain and configure it. They had some trouble getting clustering to work for a while, but not sure if that was really the fault of keycloak.
I've used Hasura for a couple of projects, but I feel that somehow, sooner rather than later, a requirement shows up that you can't really solve with Hasura. I don't mean "this is a bit awkward in Hasura" but rather "this needs to be fully custom code exposing its own GraphQL" (which Hasura would do an excellent job of stitching together). That being said, some of the operational challenges with Hasura, specifically metadata and migrations, are better solved by other solutions.
Prisma combined with Apollo, on the other hand, makes it easy to build a GQL handler, which can handle strange requirements while also avoiding Hasura-induced awkwardness.
The Hasura team seems very competent, however, and I hope they will work out these issues.
Hasura doesn't need to be your _only_ API. We use it as a time-saver for basic CRUD queries, but we do have a traditional REST webservice for everything else (in fact, Hasura was added later). Hasura just spares that backend code from being 95% soul-killing CRUD.
The space is def interesting and the product probably lowers the barrier to entry for development. My main question would be: why use it over plain/open-source PostgreSQL RBAC + a GraphQL server, or something like https://www.graphile.org/postgraphile/ ?
I considered PostGraphile for our company but picked Hasura. While PostGraphile has better extensibility, Hasura has a more polished out-of-the-box experience and a much lower learning curve. Time to market was crucial for us, as we needed to prove our MVP.
That's technically true, though Hasura is a well-funded startup with a $99+/mo cloud offering and Enterprise features while postgraphile is one main open-source developer who has support contracts available and accepts donations (the "pro" package is pretty minimal and costs just ~$25/mo).
You can get started with Hasura for free which is great, and you can run it on your own servers (if you want to manage a Haskell service) which is also great, but in practice choosing Hasura means you're relying on a company for your backend and choosing postgraphile means you're relying on an express plugin which you can get a support plan for.
I'm not saying one is the better choice than the other for this reason, many would prefer the funded company.
Nowadays most of these self-hosted apps run in Docker containers, which is a wonderful abstraction.
For example, at my company we have self-hosted Metabase on App Engine Flex. It is written in Clojure and runs on the JVM. I know nothing about those things, yet I was able to make it run with high availability.
You could also run it on Kubernetes or other similar options elsewhere.
That's a good point; I've always assumed this isn't common with Hasura but I honestly have no idea. $99/mo just seems like a great deal if your bandwidth usage fits within the included 20GB/mo.
IMO both PostgreSQL RBAC and Graphile make a big difference.
The work that has been done on Graphile is tremendous: very, very polished and well documented.
I have been using it on https://stack.lol with great results.
Edit: I forgot that I was trying a "learning repository" pattern where I put longform comments throughout the code. A little more difficult to discover than markdown, as I've learned.
In general, and for SQL Server specifically, our intent was to add great support (low footprint, works out of the box) in a way that doesn't need DDL or write access to the underlying database.
This becomes super useful for folks building new applications or new features against existing SQL Server systems (which is a rather large ecosystem beyond just the database, since so many products use SQL Server underneath too!)
If you're into Go, there is GraphJin, which can be used as a standalone service or a library. It works with Postgres and MySQL, with MSSQL coming soon. https://github.com/dosco/graphjin
I think the next generation of ORMs will be built on top of this approach. Being able to query and receive an entire object graph in a single round trip has changed how we develop apps at my company.
PostgreSQL's JSON tooling makes it much easier to build SQL queries that return different shaped data from different tables in a single query.
The row_to_json() function for example turns an entire PostgreSQL row into a JSON object. Here's a query I built that uses that to return results from two different tables, via some CTEs and a UNION ALL: https://simonwillison.net/dashboard/row-to-json/
with quotations as (
select 'quotation' as type, created,
row_to_json(blog_quotation) as row
from blog_quotation
),
blogmarks as (
select 'blogmark' as type, created,
row_to_json(blog_blogmark) as row
from blog_blogmark
),
combined as (
select * from quotations
union all
select * from blogmarks
)
select * from combined order by created desc limit 100
Even more interesting is what you can do with json_agg - it lets you combine results from other tables. Here's a demo that solves the classic problem of needing to include data from a table at the end of a many-to-many relationship (in this case the tags on entries on my blog): https://simonwillison.net/dashboard/json-agg-demo/
select
blog_entry.id, title, slug, created,
json_agg(json_build_object(blog_tag.id, blog_tag.tag))
from
blog_entry
join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
join blog_tag on blog_entry_tags.tag_id = blog_tag.id
group by
blog_entry.id
order by
blog_entry.created desc
limit 10
I rewrote a really slow GeoJSON query using this technique recently. It's very powerful and performant, but I would say it trades off some readability.
General support for inherited roles is one of the things I'm most excited about, because it makes a bunch of hard things around reuse and composition easy.
This improvement plays really well along with things like "role-based schemas" so that GraphQL clients have access to just the exact GraphQL schema they should be able to access - which is in turn composed by putting multiple scopes together into one role.
Also interesting is how well this could play with other innovations on the GraphQL client ecosystem like gqless[1] and graphql-zeus[2] because now there's a typesafe and secure SDK for really smooth developer experience on the client side.
Role inheritance will really simplify our permissions system, looking forward to seeing that go live.
Those client libraries are interesting. We are using introspection queries and graphql-codegen to generate React hooks and TypeScript types for our schema, and it's working really well.
I am now building on Hasura. Love the experience overall, but some aspects are frustrating. For example, setting up authentication for a React Native (Expo) app with Auth0 is quite cumbersome, and the docs are a bit out of date.
In your setup, Hasura is verifying a claim inside the JWT returned from Auth0 right? The only Hasura configuration I remember was setting the Auth0 public key and creating roles. I struggled with React/Cognito/AzureAD/Amplify but Hasura wasn't the pain point.
I built an Expo app on top of Hasura and rolled my own passwordless authentication mechanism and it was far far easier than using Auth0, imo. Auth was done using serverless functions to send emails with one-time-use tokens, to match the received one-time-use tokens with those generated in the database, and to return JWTs which the clients can use to auth later on and which grant access to different roles (it was an app with roles for managers, employees, etc.)
You can even implement refresh token / auth token with rotation relatively simply. I felt Auth0 makes these things complicated to implement, whereas implementing them took only a few days and the docs / help online on how to do so are very good these days.
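For reference, the JWT such a setup hands back has to carry Hasura's session claims under its documented `https://hasura.io/jwt/claims` namespace. A minimal HS256 signer using only Node's standard library might look like this (a sketch for illustration — `signHasuraJwt` is a made-up helper, and a vetted JWT library should do the signing in production):

```typescript
import { createHmac } from "node:crypto";

const b64url = (data: string): string =>
  Buffer.from(data).toString("base64url");

// Build and sign a JWT whose payload includes the x-hasura-* session
// variables Hasura reads from its documented claims namespace.
function signHasuraJwt(userId: string, roles: string[], secret: string): string {
  const header = { alg: "HS256", typ: "JWT" };
  const payload = {
    sub: userId,
    "https://hasura.io/jwt/claims": {
      "x-hasura-allowed-roles": roles,
      "x-hasura-default-role": roles[0],
      "x-hasura-user-id": userId,
    },
  };
  const signingInput =
    `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
  const sig = createHmac("sha256", secret)
    .update(signingInput)
    .digest("base64url");
  return `${signingInput}.${sig}`;
}
```

The serverless function returns this token after verifying the one-time-use code, and Hasura then enforces its role-based permissions from the embedded claims.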
+1 that's probably my favorite way of doing it - https://github.com/martin-hasura/blog-django-graphql-hasura Even Django REST Framework and Hasura works pretty well cause you get auth + then if you want to do functions you get that for 'free' as well
The mobile app it generates is React Native with Auth0 already included. It uses Hasura as a backend, and the Pulumi config will deploy it to ECS for you if you want.
Tbh, though, I don't see how this is a problem with Hasura rather than just an Auth0/RN issue.
If anyone at Hasura is reading this: are there any plans to make poll frequency configurable on a per-query basis?
We have different queries with dramatically different latency requirements (e.g. 1 second vs 5 minutes vs 1 hour). Currently we are only using Hasura for the low-latency queries, and are falling back to polling for other things. But it would simplify our development model if we could just subscribe to these changes with a lower frequency.
If we could additionally have some kind of ETAG-style if-not-modified support when initialising connections, that would be extra amazing.
@nicoburns: Yes, this is something that we've been thinking about, but we haven't put a spec together yet.
Could you open a github issue with a rough proposal of how you'd like to specify this information?
For example: at a subscription level (with a directive or header), or via metadata built-on query collections [1] (what Hasura uses underneath for managing allow-lists).
This feature is available with WunderGraph. You can configure Auth, caching, live query polling etc. at a global level and override it for each individual query:
Looks like you are associated with WunderGraph (based on your posting history here). Would you mind making that clear in your post, given you are posting on a competitor's HN post?
We've built and launched our community platform on Hasura + PG RDS. Row-level permissions have been a time-saver for launching a product quickly. Stability has been great: our Hasura container hasn't crashed or restarted in the last 9 months.
There are downsides (it has proven frustrating for us to implement the authentication part for a role that is not user or admin), but I would definitely recommend Hasura to experienced developers.
Hasura is great if you want to get a CRUD GraphQL API. Prisma, on the other hand, is an ORM that you import into your application code, giving you more flexibility in its potential use cases.
I think Hasura supports this? Or do you mean native Postgres users with Postgres RLS? The Hasura team has explained somewhere why they don't want to do this; IIRC it had something to do with caching and/or subscriptions.
Is caching not available if you don't pay? It seems that Hasura Cloud is a paid offering. It also doesn't seem like it works with the self-hosted offering. Any plans for that?
Caching isn't discouraged or disabled in any way in the community edition of Hasura.
In terms of query-plan caching, which provides a lot of benefits for speeding up query execution, that's already enabled by default as part of the Hasura engine.
Response caching is a little more complicated and requires a separate service outside of the main GraphQL server to keep the solution generalized (ex. redis, memcached, lots of other options).
We're definitely looking at ways to have some more examples as to how someone could go about rolling their own caching solution for self-hosted instances.
In the case of hosted solutions for caching, there's Hasura Cloud, which pairs the cache with monitoring and some other usability and security niceties. You could also use a service like GraphCDN (there are a couple of others as well) in front of your Hasura instance, which helps set up response caching.
[1]: https://fast-weigh.com / https://docs.fast-weigh.dev