ISO any articles/documents on scaling access control. For example: you have 100,000 users, 90k of them have access to some resource and 10k do not, and you can't use groups that your customer knows about. The obvious solutions are "where allowed_user_ids = ... big list" or "where disallowed_user_ids NE ... small list"; the latter isn't a solution, as you can't optimize this query with a normal tree-like index.
I suppose you could use some sort of bloom filter, or create/maintain groups behind the scenes somehow, but haven't seen many articles cover this.
In most cases (even with many 100s of thousands or millions of users), there are far fewer _roles_. So you can generally answer the question of "which users can access this resource" by answering the questions "which roles can access this resource", and "which users have those roles". If you're using SQL to store roles and role assignments, your query for all users becomes:
select * from users
join user_roles on user_roles.user_id = users.id
where user_roles.role_id in (... small list ...)
This can get tricky if you have 100s of thousands or millions of _roles_, and each of those roles can be dynamically assigned access to a significant percentage of your resources. But that might suggest you're structuring roles incorrectly in the first place (and you should be using fewer roles with more users per role).
All that being said, I think there's definitely more to be written here. Keep a lookout -- we might do some more writing about the topic.
> This can get tricky if you have 100s of thousands or millions of _roles_
If you use something like a bitmap index (e.g. Roaring Bitmaps), you can easily manage roles in-line on each user row if the role membership is typically sparse.
You can still maintain a separate Roles table as a canonical reference, but you would no longer need to join on it to determine who has what.
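As a rough sketch of the inline idea in plain Postgres (no Roaring extension, just an int[] of role ids per user with a GIN index; the table and column names here are made up):

    ALTER TABLE users ADD COLUMN role_ids INT[] NOT NULL DEFAULT '{}';
    CREATE INDEX users_role_ids_idx ON users USING GIN (role_ids);
    -- all users holding any of the roles that grant access to a resource
    SELECT * FROM users WHERE role_ids && ARRAY[42, 97, 103];

A real Roaring bitmap would compress large sparse sets far better; this just shows the access pattern.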
Yes and no: there is no bitmap SQL type, but at least MSSQL packs multiple BIT-columns on the same table into the same word/byte and then uses bit tests for filtering. I suspect Oracle and Postgresql can do the same, but I don't know for sure.
Disclosure: I'm a cofounder of Authzed, a YC company building a permissions database[0] inspired by Google's Zanzibar paper[1].
This is actually a really hard problem and depends on the systems with which you are integrating. We call this problem "ACL filtering"[2] and there are two general strategies: pre and post filtering.
We have a blog post[3] describing our API for pre-filtering, which can stream results that you can then use to build a SQL query or data structures like bloom filters/bitmaps. We currently have a proposal on GitHub[4] for an extension to that strategy adding a denormalization/caching layer (which applies a strategy similar to the Leopard indexing system used internally at Google).
You might also be surprised at the performance you can achieve with post-filtering by building an iterator in your programming language of choice that will batch together permission checks and amortize the cost of filtering those results from the set of all results that you pull out of your database.
Additionally, if you're interested in deeper database-specific integrations, there are hooks into various components such as Postgres's Row Level Security, but that typically means eschewing a cloud service that operates your database for you (e.g. RDS) so that you can install your own plugins.
The key insight is to define policies as purely virtual; a policy never refers to real users or real resources. Instead, it refers to virtual users (via "roles") or virtual resources (via "tags"), which can then be associated with real users/resources without modifying the policy.
Interesting, this neatly divides authorization checks into three-ish questions: 1. What roles are assigned to the user?, 2. What tags are assigned to the resource?, 3. Do the answers to 1 and 2 overlap? Each of which could have a cached denormalized index and should compress well.
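A minimal relational sketch of those three questions, with illustrative table names (a policies table linking roles to tags; access = any overlap):

    -- 1. which roles does the user have?  2. which tags does the resource have?
    -- 3. do they overlap via some policy?
    SELECT EXISTS (
      SELECT 1
      FROM user_roles ur
      JOIN policies p ON p.role_id = ur.role_id
      JOIN resource_tags rt ON rt.tag_id = p.tag_id
      WHERE ur.user_id = $1 AND rt.resource_id = $2
    ) AS allowed;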
Yeah I've built this kind of stuff before, was just hoping for some interesting information on the data access side - when you can't use tags/roles. But I always push for this approach when I can.
I think I'm missing something. Are you reframing your original constraint, that you can't use groups that your customer knows about, as you can't use tags/roles? You can certainly use tags/roles that your customer doesn't know about. This is a common issue in something like healthcare data security.
Could be a lot of reasons, but what if you have millions of tags/roles, and a user has a large subset of those roles? This actually happened at one job...
What if you sell reporting on location-based data with access control, and customers want fine-grained control over who sees what? Then you also can't always use groups. Example: I'm a healthcare provider with 15k locations. Manager X is responsible for 5k locations, so when we create reports we filter all the data by those 5k locations first. Same with viewing any data in the product.
Just curious how others approached this. You could obviously do grouping somehow.
I have a similar use case but even worse. Rather than locations, users are given access directly to the consumer/patient, and each resource belongs to some consumer x. So for any access, not just reporting, we have to filter by a huge list of accessible consumers, which can be as small as 50 but as large as 100K+, which is a massive challenge to scale.
We can obviously still manage this with careful optimization, but I'm not sure this approach scales indefinitely, say to 1M+ consumers. I wonder if this can only be handled by reframing business requirements (like the locations you are using) or whether someone has a better design/ideas.
Yeah, same problem, but I don't think I ever saw more than 50k resources per user.
I think the best solution potentially is to have the system monitor the "whale" users and have a special case where you turn their list of resources into a single group/tag behind the scenes and add this single identifier to all the relevant objects. It'd be hell to get right and keep consistent but it'd be fast.
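Something like this, relationally (all names hypothetical, and keeping the synthetic group consistent as resources churn is the hard part alluded to above):

    -- one behind-the-scenes group per "whale" user
    CREATE TABLE synthetic_groups (group_id INT PRIMARY KEY, user_id INT NOT NULL);
    ALTER TABLE resources ADD COLUMN group_ids INT[] NOT NULL DEFAULT '{}';
    CREATE INDEX resources_group_ids_idx ON resources USING GIN (group_ids);
    -- the whale's huge IN-list collapses to one indexed containment test
    SELECT * FROM resources WHERE group_ids @> ARRAY[$1];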
It's hard to name things. There are consumers whose data the system records; by resource I meant the data of the consumer (the bulk of the system); and then obviously there are users.
Users are authorized per consumer (a set of consumers plus a set of roles/actions they can perform). We have been able to move the role aspect to the app layer very cheaply: user A is doing X, say running a report; we know role R is needed for it; we then ask the system to run the report for the consumers user A has access to, and an index on consumer across all resources in the system allows a fast query. But if the consumer list is too large, efficiency falls.
We can't make automatic groups at the system level because adding a group column at the resource level and updating it would be too expensive. And there's just no natural large grouping: one user might have access to consumers 1..100K, while another has a random 100K out of 1..1M.
Have separate admin-facing and user-facing RBAC. Don't try to force them under one roof. You have an organization, and a product. Both have different authorization needs (many products don't have any), even if they may use the same authentication mechanism.
I tend to keep admin RBAC simple and group-oriented. It's mostly for direct data access permissions (per table, etc), but can add others too. Different groups for different departments. Mainly for internal CRUD apps at first (e.g. Django Admin).
OTOH, user-facing RBAC is project-oriented and uses the virtual policy system described by Tailscale. The users themselves, in a project-manager role, can manage their own policies, memberships, and resource tags. I like to provide sane defaults and support for modifying policies should they ever need it. Importantly, one user can have membership in different projects with different roles.
You'd create a project with the 15K resources, add the customer to it in a manager role, and then they choose the policies, tags, and roles to make it work for their organization.
Ofc, admins can also be users. Simply create one or more projects for internal use.
Seconded! I can get to a million authentication-related articles and services, but "authorization theory" seems to be very hard to find. (It doesn't help that the big text box in the sky treats the two words as somewhat interchangeable.)
"This paper presents the design, implementation, and deployment of Zanzibar, a global system for storing and evaluating access control lists. Zanzibar provides a uniform data model and configuration language for expressing a wide range of access control policies from hundreds of client services at Google, including Calendar, Cloud, Drive, Maps, Photos, and YouTube. Its authorization decisions respect causal ordering of user actions and thus provide external consistency amid changes to access control lists and object contents. Zanzibar scales to trillions of access control lists and millions of authorization requests per second to support services used by billions of people. It has maintained 95th-percentile latency of less than 10 milliseconds and availability of greater than 99.999% over 3 years of production use."
You could use hashed token claims to encode a role or other identifying info.
If you use a webhook you don't have to worry about this, since the user doesn't get to see what you forward as the authorization claims payload. But if you're using a JWT to hold state, then since JWTs are publicly viewable the payload has to be non-obvious and non-deterministic.
Maybe I'm not understanding. With a JWT you can verify the signature from when you originally signed that user's JWT. So even if I change some data in my JWT, on the backend it will not match the signature. So just reject it if it doesn't match, say when someone changes permission:read to permission:admin.
Roles are an approach towards query optimization, but the question placed a constraint on the user not knowing about them. Encoding the role information so it is not visible to the user is a potential solution.
By not knowing about them I mean that the customer/company never defines the roles as part of onboarding, etc. In this case they define the restrictions per-user.
> The obvious solutions are "where allowed_user_ids = ... big list" or "where disallowed_user_ids NE ... small list"; the latter isn't a solution, as you can't optimize this query with a normal tree-like index.
This is not exactly an answer, but it's worth remembering that the lower the selectivity of a condition, the more likely it is that brute force and ignorance will work out better: load data in batches and filter out the disallowed rows with a secondary positive index lookup (assuming you've pre-materialised your 'is disallowed' list). For a lot of cases that could easily beat trying to be more clever (see the sketch below).
The more general question is of course rather more complicated and interesting, but other people in other replies already have better answers there than I do.
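For the original 90k-allowed/10k-disallowed shape, a sketch of that (Postgres, with a pre-materialised disallowed table; names made up):

    CREATE TABLE disallowed_users (user_id INT PRIMARY KEY);
    -- scan the big set in batches; each row costs one cheap index probe
    -- against the small disallowed list
    SELECT u.*
    FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM disallowed_users d WHERE d.user_id = u.id);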
Sounds to me like you could have a huge table of authorizations and compile it silently into a list of roles whenever the contents of that table are modified. This recompilation step may be comparatively slow, but it needs to be executed quite seldom, and the roles/groups you end up with should be small in number and quite easy to index.
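A hedged sketch of that recompilation in Postgres (hypothetical authorizations table): bucket users by identical access sets, and each distinct set becomes a behind-the-scenes role.

    WITH user_sets AS (
      SELECT user_id,
             array_agg(resource_id ORDER BY resource_id) AS resource_set
      FROM authorizations
      GROUP BY user_id
    )
    -- every distinct resource_set is a candidate synthetic role
    SELECT resource_set, array_agg(user_id) AS members
    FROM user_sets
    GROUP BY resource_set;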
Sometimes it's customer enforced. Just based on various places I've worked.
One place for example sold location-based software, so users could have access to say 10k out of 100k locations, but those locations might not always be next to each other. Data would be stored with a location id, and to do things like "show the average star rating for the locations this user has access to", we'd do "aggregate star rating where location id IN [....]"
Oh okay, so you can store authorization data in a table mapping user_ids to location_ids, right? Then instead of interpolating `IN (big_list)` into every query (which ships a huge literal list each time and tends to optimize poorly at that size) you join on the mapping table (which can use indexes). Something like
CREATE TABLE locations (
location_id INT PRIMARY KEY,
rating INT
);
CREATE TABLE authorized_locations (
user_id INT REFERENCES users (user_id),
location_id INT REFERENCES locations (location_id)
);
CREATE INDEX authorized_locations_user_id_idx ON authorized_locations (user_id);
CREATE INDEX authorized_locations_location_id_idx ON authorized_locations (location_id);
SELECT AVG(rating) FROM locations JOIN authorized_locations USING (location_id) WHERE user_id = $1;
However, let's say you have an architecture where you have 100s of these "product" tables, since every piece of the app restricts access by location. So not only do you have Product but maybe you have FormResult{LocationID}, Feedback{LocationID} etc.
So for a service to find out what a user has access to, first they call the User service to get a list of locations, and then they pass the big list of ids to their own DB, whatever that DB is.
I guess this is one case where a monolith and good ol' SQL would win. :)
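And if you do end up shipping the list across services, passing it as one array parameter instead of a giant interpolated literal at least keeps the statement stable (Postgres syntax; a workaround, not a fix):

    -- one bind parameter carrying the whole id list
    SELECT AVG(rating) FROM locations WHERE location_id = ANY($1::int[]);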
> Oso’s Django and SQLAlchemy integrations turn partials from Polar into database queries... The SQL they produce relies heavily on nested subqueries
Live by ORM, die by ORM. This strikes me as particularly bad because these authorization queries may be running on every request. It's great to see Oso went direct to SQL to address this. And the asides about logical programming were fun as well.
Building bad queries with an ORM isn't necessarily the ORM engine's fault. It's really just a tool and, as such, its value really relies on one's proficiency with it.
But you can build any query with Sqlalchemy - even when using the ORM rather than the Core, you still have that control, right?
Don’t get me wrong, I’ve read stuff by these guys and I know they know what they’re talking about. Certainly, if you have to make it work for Django as well you’d be considering going straight to sql.
As an aside, I walked through their stuff a little while back and I'm very interested in what they're building.
agree this seems like the worst of both worlds between doing things application-side and doing things database-side. imo you're much better off embracing your db's security features
Well, most ORMs can't take advantage of CTEs, group concat, array types, JSON types, and whatnot when building queries from code, for obvious reasons. So they often can't return multi-dimensional data in a single query.
But ORMs are still useful to get an application off the ground fast.
do ORMs still target lowest common denominator ansi sql? I only know that EF doesn't - it has "providers" for each rdbms it supports and will liberally emit CTEs and db specific language features like "outer apply" or "join lateral" but then stupidly ignore powerful features like "for json" or "json_agg/json_build_object" which could eliminate the outrageously naive way it encodes 1:m and m:m results - it boggles how smart and stupid orms are at the same time.
fwiw, PostgreSQL has a built-in mechanism for filtering rows based on authorization rules: row-level security [0].
This can simplify your data-access layers quite a lot and pushes you towards better security practices like limiting the scope of permissions granted to your applications' role.
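A minimal sketch, reusing the authorized_locations mapping from upthread (the policy and setting names are arbitrary):

    ALTER TABLE locations ENABLE ROW LEVEL SECURITY;
    CREATE POLICY user_locations ON locations
      USING (EXISTS (
        SELECT 1 FROM authorized_locations al
        WHERE al.location_id = locations.location_id
          AND al.user_id = current_setting('app.user_id')::int
      ));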
If you like Polar but can't use it for whatever reason, RLS does a lot of what Polar does.
It doesn't, you can use any variable available. So for example in postgrest you get all variables in a JWT exposed as GUC (Grand Unified Configuration) variables and you can use that in your RLS checks.
The most common way is to create a role though, yeah. I find that creating a role is not a downside, since it is generally just a row in a table that postgresql's permission system also uses.
How does this work in terms of Postgres connections/pools? My understanding of this is that you'd do something like `SET SESSION user_id TO 123;` which sets the variable `user_id` for that connection. Subsequent queries could use that variable to do row-based authorization but they need to use the same connection. Is this how Postgrest does it? (guaranteeing the same connection for the lifecycle of a request that is)
IIRC postgrest starts a transaction for each request, and inside of that sets the variables. I've done some testing to make sure that variables do not leak between requests.
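Roughly this shape per request ('app.user_id' is an arbitrary custom GUC; SET LOCAL reverts at COMMIT, so nothing leaks back into the pool):

    BEGIN;
    SET LOCAL app.user_id = '123';      -- visible only inside this transaction
    SELECT avg(rating) FROM locations;  -- RLS policies read current_setting('app.user_id')
    COMMIT;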
The performance hit can be quite big. I had a query that went from 1s to 400ms by disabling RLS. The security policy was a simple where org = abc. When I encounter performance hits like these, I refactor the slow query into a function with SECURITY DEFINER and a huge warning that you're on your own regarding security. Besides that, it's nice not having to worry if your SQL is accessing stuff the current role isn't allowed to see.
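The escape hatch looks roughly like this (hypothetical names; a SECURITY DEFINER function runs with its owner's privileges, and if that owner is the table's owner, RLS doesn't apply by default):

    -- WARNING: bypasses RLS; you are on your own regarding security
    CREATE FUNCTION avg_rating_for_org(p_org INT) RETURNS NUMERIC
    LANGUAGE sql SECURITY DEFINER AS $$
      SELECT AVG(rating) FROM locations WHERE org = p_org;
    $$;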
I do and the answer is always it depends. I'm not being glib! There are a lot of variables at play that will affect performances.
It moves computation closer to the data, and in aggregate that generally offsets most increases in query time.
If you design your schemas carefully the performance cost is easy to swallow. As always analyze your queries under different table sizes and see what works for you.
The benefit is that your application code doesn't have to use any complex RBAC->SQL compilation. You can just `select foo, bar, baz from mytable;` and RLS will take care of making sure your application servers never see the data that the user doesn't have access to.
In my experience the performance hit is not much larger than having the permission check part of your SELECT/UPDATE/DELETE query, but I don't have hard numbers.
What happens when you have sweeping changes to existing policies? It seems like you have to chase down every other line of DSL and fix policies individually.
Are there any policy-language-libraries-backed-by-sql like Polar but that aren't based on logic programming languages? I don't really want to learn logic programming for this purpose nor do I want to require it on my coworkers.
I guess I'm just looking for a library + SQL shorthand that can easily interpolate request variables and session variables that gets declared in code where a route is declared. Just spitballing but something like `(blogs.id = $req.blogid).userid = $session.userid OR (users.id = $session.userid).isAdmin`.
This [0] is close but it doesn't have enough momentum to be well documented let alone usable as a library in every language you'd want (Go, C#, Python, Node.js, etc.).
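FWIW, the spitballed expression above would presumably compile down to something like this (hand-expanded, $1 = request blog id, $2 = session user id):

    SELECT EXISTS (SELECT 1 FROM blogs WHERE id = $1 AND userid = $2)
        OR EXISTS (SELECT 1 FROM users WHERE id = $2 AND isAdmin) AS allowed;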
Good pattern! At least a library that makes sense. Thanks.
One point about production usage: you may need to adapt this to a NoSQL backend, as we want to query authorization logic against a caching layer for performance reasons.
Well, you proposed converting SQL architecture to NoSQL with the goal of adding caching. So how would you add caching in the NoSQL case? What prevents you from doing the same with SQL?