Hasura GraphQL Engine and SQL Server (github.com/hasura)
182 points by tango12 on June 18, 2021 | 83 comments



We have layered Hasura over an existing set of SQL Server databases to provide a public facing API for our product. [1]

Overall the experience has been fantastic. The performance and the authorization scheme are very good. It has allowed us to wash our hands of writing bespoke endpoints for our enterprise customers with complex integration requirements (for the most part... waiting on mutations!).

One thing I wish was handled differently would be Same Schema, Different Database support.

We have multiple multi-tenant databases as well as many single tenant databases. All share the exact same table structure. As it stands, we have to maintain a separate Hasura instance for each of these databases as the table names conflict and there is no way to rename or reference them differently. That leaves us with the minor annoyance of needing to instruct users on their appropriate server prefix (server1.fast-weigh.dev/graphql vs server2.fast-weigh.dev/graphql... etc). Yes, we could proxy in front of these and route accordingly. But that's just one more layer to deal with and maintain.

It sure would be nice to have a single instance to maintain that could handle database availability based on the role of the incoming request.

Even with the minor inconvenience of multiple instances, I 10/10 would recommend. It's a huge timesaver assuming you've got the data access problems it seeks to make easy.

[1]: https://fast-weigh.com / https://docs.fast-weigh.dev


> We have multiple multi-tenant databases as well as many single tenant databases. All share the exact same table structure. As it stands, we have to maintain a separate Hasura instance for each of these databases as the table names conflict and there is no way to rename or reference them differently.

We have the exact same scenario and solved it with the exact same workaround. As things stand, spinning up the Hasura instance is currently the last piece of the process we need to automate before we are fully able to onboard new clients without manual ops action.

Hasura V2, currently in alpha, is supposed to support multitenancy as its flagship new feature. However, the "same object name in different database" issue is still open and on the roadmap, so presumably it's a _very_ early alpha.


https://hasura.io/blog/announcing-hasura-graphql-engine-2-0/

Multi-tenancy seems like their main 2.0 push


I'm overall very impressed with Hasura, but have found it cumbersome to work with if using UUID primary keys because it won't map them as the GraphQL ID type [0]. There are plenty of people using Hasura successfully in production environments, so I'm curious how others handle it. I'm hoping the answer isn't "just use int PKs", but it'd be helpful to know if it is.

[0] -- https://github.com/hasura/graphql-engine/issues/3578


I don't have an answer for you but you do have my curiosity. Why did you choose UUIDs as primary keys?


With integer keys as an alternative?

UUID v4 keys don't give away information about the number of rows in a relation. You can use them directly in API responses.

In recent events, IIRC, Parler was so easy to scrape precisely because they were using int keys exposed in their API GET endpoints.
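As a minimal sketch of the alternative (assuming Postgres 13+, where gen_random_uuid() is built in; older versions get it from the pgcrypto extension):

    -- hypothetical table: a v4 key reveals nothing about row counts and is
    -- safe to expose directly in API responses
    create table posts (
      id uuid primary key default gen_random_uuid(),
      body text not null
    );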


You could start with a large, non-zero value for the initial key, to obfuscate the true number of records in the collection.
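A sketch of that idea, assuming Postgres identity columns:

    -- start the counter at a large offset so early IDs don't reveal the row count
    create table widgets (
      id bigint generated always as identity (start with 1000000) primary key,
      name text not null
    );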


The difference between IDs of multiple resources would still leak count information.

It also makes it too easy to paginate through relations for certain use cases where you may want obfuscation.


Security through obscurity only sweeps the problem under the rug instead of addressing it, IMO. I don't know what Parler is or was, but I don't think sequential int IDs would be the major factor leading to a website being scraped.

There are generally two types of information in applications, "public" and "privileged". The former has the IDs and such discoverable by an index or explore page; the latter requires authentication and has user-specific permissions.

For both cases, if hiding IDs is the only access control on the backend, it's fundamentally flawed. For both cases, if the access control is implemented well (in addition to rate limiting), integer vs UUID keys don't make a difference.

What do you think?


No one will argue that opaque identifiers are a sufficient security control, but they're (slightly) better than nothing; at least they would've stopped a naive enumeration approach to scraping everything. Don't get distracted by that part, though: the main reason is that many companies wouldn't want to publicly post a dashboard of how many users/entities/widgets they have for all to see, and exposing sequential identifiers basically does that.


> Security through obscurity

This _should_ be part of a multi-layer security plan though. You don't depend on it as a primary source of security, but why would you expose more internal information than needed? If something does go wrong with another layer, that obscurity _helps mitigate the damage_.


In the case that there IS an index page that enumerates all entities, you are correct. However, many systems don't provide such an index page.

In many cases it's useful for a page to be publicly accessible yet not indexable.

This is why sites like YouTube have an "unlisted" level of permission, UUID keys are a convenient way to implement that level of access control.

UUID keys are very useful for distributed systems where a local machine wants to generate a unique key locally, and then later upload it to a centralized store. It's especially helpful in third normal form databases where often you'll need to create objects that reference each other via the primary key.


We built a messenger app, the client can generate the message row locally and cache it immediately upon submitting the row without waiting for the response from the server. As the primary key can be generated client side (it’s a UUID, it’s quasi guaranteed to be unique), there’s no clash with existing IDs server side. By using the optimistic response pattern the message appears in the frontend immediately. Once the response for inserting the row comes back, the message can be updated in case the server decided to set additional columns — all with the same message ID which the server gladly accepted. Wonderful.
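Roughly, assuming a hypothetical messages table (uuid id, sender_id, body, and a server-set created_at), the server-side insert can look like:

    -- the client generated this uuid locally and already rendered the message;
    -- inserting with the same id means a retried request can't create a duplicate
    insert into messages (id, sender_id, body)
    values ('7f9c24e8-59d5-4f2b-8c3a-1d2e3f4a5b6c', 42, 'hello')
    on conflict (id) do nothing
    returning id, created_at;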


Why not use UUID as primary keys?


Can't speak for all DBs, but many use a clustered index on the primary key. In this case, the physical rows are stored in the order of the index, rather than just pointers to the rows.

If you are inserting non-sequential data into a clustered index, every insert results in a non-trivial rearrangement of the rows. UUIDs are not sequential, so at scale you will experience performance issues if you are using UUID primary keys and the PK index is clustered.

You won't notice this until significant scale, however. You can still use a unique identifier alongside an incrementing primary key, and you could choose to use a more compact format than the UUID. 8 base32 characters have over a trillion combinations, and are nowhere near as unsightly in a URL.
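That pattern might look like this (a sketch, assuming Postgres and a hypothetical accounts table):

    -- the sequential internal key keeps a clustered PK index append-only;
    -- only the random public_id is ever exposed in URLs and API responses
    create table accounts (
      id bigint generated always as identity primary key,
      public_id uuid not null default gen_random_uuid() unique
    );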


> Can't speak for all DBs, but many use a clustered index on the primary key.

AFAIK, only MySQL (with the InnoDB engine) and SQL Server do it by default (always for MySQL/InnoDB; for SQL Server, by default unless you create a different clustered index before adding the PK constraint, and even then you can specify a nonclustered PK index).

PG doesn't have clustered indexes at all, DB2 has a thing called clustered indexes which aren't quite the same thing, Oracle calls having a clustered index on the PK an "index organized table" and it's a non-default table option, and SQLite has what seems equivalent to a clustered index ONLY for INTEGER PRIMARY KEY tables not declared as WITHOUT ROWID.
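For SQL Server specifically, that opt-out is just a keyword (a T-SQL sketch with a hypothetical table):

    -- declare the PK nonclustered so random uuid inserts don't reorder pages
    CREATE TABLE accounts (
        id uniqueidentifier NOT NULL DEFAULT NEWID(),
        CONSTRAINT pk_accounts PRIMARY KEY NONCLUSTERED (id)
    );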

> You can still use a unique identifier alongside an incrementing primary key, and you could choose to use a more compact format than the UUID.

A key point of using a UUID is distributed generation that avoids lock contention on a sequence generator, which is defeated by using both. Just "don't use a clustered index where distributed key generation is important" seems a better rule, even if it precludes MySQL/InnoDB use.

Also, most DBs that explicitly handle UUIDs store them compactly as 128-bit values. If you want to transform them to something other than the standard format for UI reasons [0], that doesn't preclude using UUIDs in the DB.

[0] seems like bikeshedding, but, whatever.



That's not the same as a clustered index; it just does a one-time rearrangement of the table's current contents. You have to run it after each change affecting the index to simulate a clustered index (with stable PKs and the PK index, that would be after each insert, I think).

You get closer to a clustered index (at the cost of more storage, but the benefit that you can have more than one on a table) with an index using INCLUDE to add all non-key columns in the index.
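A sketch of that, assuming a hypothetical orders table (INCLUDE works in both SQL Server and Postgres 11+):

    -- covering index: carries the non-key columns so lookups on created_at
    -- never touch the heap, similar to what a clustered index would give you
    create index ix_orders_created on orders (created_at)
      include (customer_id, status, total);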


Can't uuids be time based?


It prevents good data clustering on disk. Probably less of an issue now that most DBs are running on SSDs.
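For what it's worth, time-based v1 UUIDs don't fully fix this either. A sketch, assuming the uuid-ossp extension:

    -- uuid_generate_v1() embeds a timestamp, but the RFC 4122 layout stores the
    -- low timestamp bits first, so consecutive values still don't sort together;
    -- SQL Server's NEWSEQUENTIALID() is essentially this with bytes rearranged
    create extension if not exists "uuid-ossp";
    select uuid_generate_v1();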


The best solution is to use int/long primary keys, with a UUID column that has a unique index. Then the UUID can be used with public-facing APIs.


I guess it's just my default setup now. Beyond not wanting to leak data or have people mess around with URLs, I've had much better luck evenly sharding DB instances with UUIDs. Hasura ostensibly supports both (the option exists to create PKs using UUIDs), but then treats them entirely differently at the GraphQL level.


We use UUIDs in production too with Hasura. I remember it was painful to make it work.


not really sure what the issue here is

we use UUIDs exclusively and it maps to a scalar uuid type without any issues

do you mind explaining a bit more?


I suppose with a dynamic language, it may not be much of a problem. But, with a statically typed language you end up with a situation where your IDs are not GraphQL ID types and can't be without type coercion. Since the uuid type isn't standard, you need to add some sort of serialization to and from the uuid type. It results in a lot of unnecessary ceremony just to work with IDs.

I could be mistaken, but I think it also creates issues with client libraries that want to cache around an object ID, as the type is not a GraphQL ID.


Ok that makes sense.

We are using typescript and graphql-codegen to generate our types and haven't run into that issue.


(Vamshi from Hasura)

We've been working on SQL server native support and we're happy to announce support for read-only GraphQL cases with new or existing SQL Servers.

Next up is adding support for stored procedures, mutations, Hasura event triggers [1] and more!

[1]: https://hasura.io/docs/latest/graphql/core/event-triggers/in...


I know this may not be exactly on topic, but do you know when version 2 will be released? My team has been waiting on a single feature from that release for what seems like months (slack in JWT verification time; right now the clocks on our Hasura and Keycloak servers are off and it is creating issues).


Hey! We released 2.0-beta.2 this week and we're planning on releasing 2.0-stable on Mon/Tue.

Sorry, I know it took a while, but we had to make sure we could make the upgrade path non-breaking given it was a massive release and this took longer than we expected!

Please do feel free to reach out if you need any help with the upgrade :)


How are you liking Keycloak?

Was evaluating it against Cognito and Auth0 last week.


I don't use it directly; our systems engineers maintain and configure it. They had some trouble getting clustering to work for a while, but I'm not sure if that was really the fault of Keycloak.


I've used Hasura for a couple of projects, but I feel that somehow, and sooner rather than later, a requirement shows up that you can't really solve with Hasura. I don't mean "oh, this is a bit awkward in Hasura" but rather "this needs to be fully custom code exposing its own GraphQL" (which Hasura would do an excellent job of stitching together). That being said, some of the operational challenges with Hasura, specifically metadata and migrations, are better solved by other solutions.

Prisma combined with Apollo, on the other hand, makes it easy to build a GQL handler, which can handle strange requirements but also makes it easy to avoid Hasura-induced awkwardness.

The Hasura team seems very competent, however, and I hope they will work out these issues.


Hasura doesn't need to be your _only_ API. We use it as a time-saver for basic CRUD queries, but we do have a traditional REST webservice for everything else (in fact, Hasura was added later). Hasura just spares that backend code from being 95% soul-killing CRUD.


The space is def interesting and the product probably lowers the barrier to entry for development. My main question would be: why use it over plain/open-source PostgreSQL RBAC + a GraphQL server, or something like https://www.graphile.org/postgraphile/ ?


I considered PostGraphile for our company but picked Hasura. While PostGraphile has better extensibility, Hasura has a more polished unboxing experience and a much lower learning curve. Time to market is crucial for us, as we needed to prove our MVP.



That's technically true, though Hasura is a well-funded startup with a $99+/mo cloud offering and enterprise features, while PostGraphile is one main open-source developer who has support contracts available and accepts donations (the "pro" package is pretty minimal and costs just ~$25/mo).

You can get started with Hasura for free, which is great, and you can run it on your own servers (if you want to manage a Haskell service), which is also great. But in practice, choosing Hasura means you're relying on a company for your backend, and choosing PostGraphile means you're relying on an Express plugin that you can get a support plan for.

I'm not saying one is the better choice than the other for this reason, many would prefer the funded company.


> (if you want to manage a Haskell service)

Nowadays most of these self-hosted apps run on Docker containers as a wonderful abstraction.

For example, in my company we have self-hosted Metabase on App Engine Flex. It is written in Clojure and runs on the JVM. I know nothing about these things, yet I was able to make it run with high availability.

You could also run it on Kubernetes or other similar options elsewhere.


That's a good point; I've always assumed this isn't common with Hasura but I honestly have no idea. $99/mo just seems like a great deal if your bandwidth usage fits within the included 20GB/mo.


IMO both PostgreSQL RBAC and Graphile make a big difference. The work that has been done on Graphile has been tremendous - very, very polished and well documented. I have been using it on https://stack.lol with great results.


A couple years ago I built toy backends for the same app using both technologies so they could be compared as-directly-as-possible. The code* takes you through the architectural decisions: https://github.com/sastraxi/great-bear-postgraphile and https://github.com/sastraxi/great-bear-hasura

Edit: I forgot that I was trying a "learning repository" pattern where I put longform comments throughout the code. A little more difficult to discover than markdown, as I've learned.


In general, and for SQL Server specifically, our intent was to add great support (low footprint, works out of the box) in a way that doesn't need DDL or write access to the underlying database.

This becomes super useful for folks building new applications or new features against existing SQL Server systems (which is a rather large ecosystem beyond just the database, since so many products use SQL Server underneath too!)


If you're into Go, there is GraphJin, which can be used as a standalone service or a library. Works with Postgres, MySQL, and MSSQL soon. https://github.com/dosco/graphjin


It’s pretty cool how Hasura uses/abuses Postgres JSON support to be able to run many queries in the same “command”


I think the next generation of ORMs will be built on top of this approach. Being able to query and receive an entire object graph in a single round trip has changed how we develop apps at my company.


Shameless plug: this is what we do in EdgeDB.



The json_agg approach to preventing the cartesian-product problem is AMAZING.


Can you elaborate on what they are doing?


I can take a guess.

PostgreSQL's JSON tooling makes it much easier to build SQL queries that return different shaped data from different tables in a single query.

The row_to_json() function for example turns an entire PostgreSQL row into a JSON object. Here's a query I built that uses that to return results from two different tables, via some CTEs and a UNION ALL: https://simonwillison.net/dashboard/row-to-json/

    with quotations as (
      select 'quotation' as type, created,
      row_to_json(blog_quotation) as row
      from blog_quotation
    ),
    blogmarks as (
      select 'blogmark' as type, created,
      row_to_json(blog_blogmark) as row
      from blog_blogmark
    ),
    combined as (
      select * from quotations
      union all
      select * from blogmarks
    )
    select * from combined order by created desc limit 100
Even more interesting is what you can do with json_agg - it lets you combine results from other tables. Here's a demo that solves the classic problem of needing to include data from a table at the end of a many-to-many relationship (in this case the tags on entries on my blog): https://simonwillison.net/dashboard/json-agg-demo/

    select
      blog_entry.id, title, slug, created,
      json_agg(json_build_object(blog_tag.id, blog_tag.tag))   
    from
      blog_entry
        join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
        join blog_tag on blog_entry_tags.tag_id = blog_tag.id
    group by
      blog_entry.id
    order by
      blog_entry.created desc
    limit 10
(Both these demos use https://django-sql-dashboard.datasette.io/ )


This. json_agg fixes the cartesian-product problem.

Imagine having a table with large rows, and another table to join with small data but many rows.

Normally you'd do an inner join of some sort, and the data from the "large rows" table would be duplicated many, many times - json_agg simply fixes this.

You can actually do the full table without json_build_object, too. You can do something like:

    select json_agg(t.*) from (select * from table) as t

This makes joining multiple tables together very, very easy, and performant.


I rewrote a really slow GeoJSON query using this technique recently. It's very powerful and performant, but I would say it trades off some readability.


Happy customer on Postgres who just signed a large contract that needs SQL Server support.

Love the product and the team, keep up the great work.

Out of curiosity is support for multiple roles in the works?


Yes! We announced experimental support earlier and here's the new spec we're implementing that will support all databases and remote schemas too.

https://github.com/hasura/graphql-engine/issues/6991

General support for inherited roles is one of the things I'm most excited about, because it makes a bunch of hard things around reuse and composition so easy.

This improvement plays really well with things like "role-based schemas", so that GraphQL clients have access to exactly the GraphQL schema they should be able to access - which is in turn composed by putting multiple scopes together into one role.

Also interesting is how well this could play with other innovations on the GraphQL client ecosystem like gqless[1] and graphql-zeus[2] because now there's a typesafe and secure SDK for really smooth developer experience on the client side.

[1]: https://github.com/gqless/gqless [2]: https://github.com/graphql-editor/graphql-zeus


Role inheritance will really simplify our permissions system, looking forward to seeing that go live.

Those client libraries are interesting. We are using introspection queries and graphql-codegen to generate React hooks and TypeScript types for our schema, and it's working really well.


I am now building on Hasura. Love the experience overall, but some aspects are frustrating. For example, setting up authentication for a React Native (Expo) app with Auth0 is quite cumbersome, and the docs are a bit out of date.


In your setup, Hasura is verifying a claim inside the JWT returned from Auth0, right? The only Hasura configuration I remember was setting the Auth0 public key and creating roles. I struggled with React/Cognito/AzureAD/Amplify, but Hasura wasn't the pain point.


Thanks!

Could you point us to the docs you were looking at for the auth0 integration?


I built an Expo app on top of Hasura and rolled my own passwordless authentication mechanism, and it was far, far easier than using Auth0, IMO. Auth was done using serverless functions to send emails with one-time-use tokens, match the received tokens with those generated in the database, and return JWTs which the clients can use to auth later on and which grant access to different roles (it was an app with roles for managers, employees, etc.).

You can even implement refresh-token/auth-token rotation relatively simply. I felt Auth0 makes these things complicated to implement, whereas implementing them took only a few days, and the docs/help online on how to do so are very good these days.
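The database side of that can be a single table plus one atomic query; a sketch with hypothetical names, where $1 is the token from the emailed link:

    create table login_tokens (
      token uuid primary key default gen_random_uuid(),
      email text not null,
      expires_at timestamptz not null default now() + interval '15 minutes'
    );

    -- consuming a token is one atomic delete: a non-empty result means the
    -- token was valid and unused, so the caller can mint and return the JWT
    delete from login_tokens
    where token = $1 and expires_at > now()
    returning email;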


I rolled a very similar solution with django+graphene+hasura


+1, that's probably my favorite way of doing it - https://github.com/martin-hasura/blog-django-graphql-hasura Even Django REST Framework and Hasura work pretty well together, because you get auth, and then if you want to do functions you get those for 'free' as well.


you should take a look at https://create-full-stack.com/

The mobile app it generates is React Native with Auth0 already included. It uses Hasura as a backend, and the Pulumi config will deploy it to ECS for you if you want.

TBH though, I don't see how this is a problem with Hasura rather than just an Auth0/RN issue.


If anyone at Hasura is reading this. Are there any plans to make poll frequency configurable on a per-query basis?

We have different queries with dramatically different latency requirements (e.g. 1 second vs 5 minutes vs 1 hour). Currently we are only using Hasura for the low-latency queries, and are falling back to polling for other things. But it would simplify our development model if we could just subscribe to these changes with a lower frequency.

If we could additionally have some kind of ETAG-style if-not-modified support when initialising connections, that would be extra amazing.


@nicoburns: Yes, this is something that we've been thinking about, but we haven't put a spec together yet.

Could you open a github issue with a rough proposal of how you'd like to specify this information?

For example: at a subscription level (with a directive or header), or via metadata built-on query collections [1] (what Hasura uses underneath for managing allow-lists).

[1]: https://hasura.io/docs/latest/graphql/core/api-reference/sch...


This feature is available with WunderGraph. You can configure Auth, caching, live query polling etc. at a global level and override it for each individual query:

https://github.com/wundergraph/polyglot-persistence-postgres...

https://github.com/wundergraph/wundergraph-demo/blob/906f72c...


Looks like you are associated with WunderGraph (based on your posting history here). Would you mind making that clear in your post, given you are posting on a competitor's HN post?


We literally had to rebuild an app because this support was so bad and they sold us on it being ready. Yikes.

It also needs its own PG db to function in order to support SQL Server.

PG usage was pretty good. Auth sucked.

Usage in CI pipelines is hot garbage. Command line tooling does not work well with it at all.

I'd probably take the risk again for a toy...maybe.


We've built and launched our community platform on Hasura + PG RDS. Row-level permissions have been a time-saver for launching a product quickly. Stability has been great - our Hasura container hasn't crashed / restarted in the last 9 months.

There are downsides (it has proven frustrating for us to implement the authentication part for a role that is not user or admin), but I would definitely recommend Hasura to experienced developers.


Do you mind explaining your role issue?

We've been building out a pretty complex RBAC system and I might be able to help.


Hasura + CLJS w/ Re-frame + WebSockets realtime subscriptions to GraphQL as EDN: https://github.com/Vetd-Inc/vetd-app/blob/cdac4d55f771b1928e...


How does this compare to Prisma?


Check out this page in the Prisma docs:

https://www.prisma.io/docs/concepts/overview/should-you-use-...

Hasura is great if you want to get a CRUD GraphQL API. Prisma, on the other hand, is an ORM that you import into your application code, giving you more flexibility in the potential use cases for it.


How about built-in authorization for PostgreSQL?


That already works! :)

Postgres was the first database we added support for!

https://hasura.io/docs/2.0/graphql/core/databases/postgres/i...


I think Hasura supports this? Or do you mean native Postgres users with Postgres RLS? The Hasura team has explained somewhere why they don't want to do this. IIRC it had something to do with caching and/or subscriptions.
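For reference, the native-Postgres approach they decided against would look roughly like this sketch (hypothetical orders table and session setting):

    -- plain Postgres RLS: rows are filtered by a per-request setting
    alter table orders enable row level security;

    create policy tenant_isolation on orders
      using (tenant_id = current_setting('app.tenant_id')::uuid);

Hasura instead compiles its permission rules into the SQL it generates, which (IIRC) is part of what lets it plan/cache queries and multiplex subscriptions.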


How exactly is caching handled?


Query-plan caching is built-in and here's a docs link for response caching:

https://hasura.io/docs/latest/graphql/cloud/response-caching...


Is caching not available if you don't pay? It seems that Hasura Cloud is a paid offering. It also doesn't seem like it works with the self-hosted offering. Any plans for that?


Caching isn't discouraged or disabled in any way in the community edition of Hasura.

In terms of query-plan caching - which provides a lot of benefits for speeding up pre-query execution - that's already enabled by default as part of the Hasura engine.

Response caching is a little more complicated and requires a separate service outside of the main GraphQL server to keep the solution generalized (e.g. Redis, Memcached, lots of other options).

We're definitely looking at ways to have some more examples as to how someone could go about rolling their own caching solution for self-hosted instances.

In the case of hosted solutions to caching, there's Hasura Cloud, which pairs the cache with monitoring and some other usability and security niceties - but you could also use a service like GraphCDN (there are a couple of others as well) in front of your Hasura instance, which helps set up response caching.


In Greek, hasoura means loss.


I guess the name is a play on words: Haskell (which it's written in) + the Sanskrit Asura (meaning demon).


Yes, indeed - losing money :-)


presence for subscriptions



