OPA is a great tool for implementing a policy-as-code system. But if you're trying to use it for application authorization (e.g. fine-grained authz for B2B SaaS or a set of internal applications), you may find that its policy story is strong, but it doesn't really have a "data plane": you either store data in a data.json file and rebuild the policy any time that data changes, or make an http.send call out of the policy to fetch dynamic data.
Check out Topaz [0], which uses OPA as its decision engine, but adds a data plane that is based on the ReBAC ideas explored in the Google Zanzibar [1] paper.
Disclaimer: I work on the team [2] that builds and maintains the Topaz project.
Bundle servers provide a centralized "data plane" decoupled from the distributed component (OPA). You don't need to rebuild your policy any time data changes. Just push a new bundle with the data that changed, and OPA will fetch it as configured — either periodically or directly if long polling is configured.
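As a rough illustration of "push a new bundle with the data that changed": an OPA bundle is a gzipped tarball that can carry a data.json file and an optional .manifest. The sketch below only builds such an archive in memory. The role data, the revision string, and the "roles" root are made up for the example, and uploading the result to whatever bundle server you run is left out.

```python
import io
import json
import tarfile


def build_bundle(data: dict, revision: str, roots: list[str]) -> bytes:
    """Pack data.json and .manifest into a gzipped tarball."""
    files = {
        "data.json": json.dumps(data).encode(),
        ".manifest": json.dumps({"revision": revision, "roots": roots}).encode(),
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Hypothetical data: this bundle only claims ownership of the "roles" subtree.
bundle = build_bundle({"roles": {"alice": ["admin"]}}, revision="v2", roots=["roles"])
```

Only the data changes between revisions here; the policy files can live in a separate bundle that is rebuilt far less often.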
This feels very much like OpenFGA[0]. I've been evaluating authorization tools for one of my side projects, and honestly most tools feel like creating relationships in a graph-like database and querying to see whether there is or isn't a relationship between two entities. Is there more to this (besides the implementation details), or am I missing something from these tools?
On the first point, OPA is much older than OpenFGA. To really illustrate the point, OPA became a graduated project about a year before OpenFGA had their first code drop in the public GitHub repo. The OpenFGA people are aware of OPA and I'm sure they learned from the tradeoffs OPA made.
To the main point, what you described reflects the current trends of authorization. Define a data model, define data that adheres to that model, write declarative rules that consume that model, make a decision based on those rules.
Where things really start to differ is the kind of data they bind against and how you write rules. E.g. OPA is often used for either ABAC (attributes) or RBAC (roles), while OpenFGA is looking at ReBAC (relationships). Each has its own complexity tradeoffs, depending on the system being implemented. How easy or difficult a system makes these kinds of checks has a significant impact on how you write policies.
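To make the difference concrete, here is a toy sketch. The RBAC check looks up a role-to-permission table, while the ReBAC check walks relationship tuples (in the spirit of Zanzibar) to see whether a path links the user to the resource. All names and tuples are invented, neither OPA nor OpenFGA is involved, and real ReBAC engines evaluate relation-specific rules rather than accepting any path as this simplification does.

```python
# RBAC: permissions hang off roles.
ROLE_PERMISSIONS = {"editor": {"doc:read", "doc:write"}, "viewer": {"doc:read"}}
USER_ROLES = {"alice": {"editor"}, "bob": {"viewer"}}


def rbac_allowed(user: str, permission: str) -> bool:
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))


# ReBAC: access follows from (subject, relation, object) tuples.
TUPLES = {
    ("alice", "member", "team:eng"),
    ("team:eng", "viewer", "folder:specs"),
    ("folder:specs", "parent", "doc:design"),
}


def rebac_can_view(user: str, resource: str) -> bool:
    """True if some chain of tuples links the user to the resource."""
    frontier, seen = [user], {user}
    while frontier:
        node = frontier.pop()
        if node == resource:
            return True
        for subject, _relation, obj in TUPLES:
            if subject == node and obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return False
```

The RBAC version needs no knowledge of individual resources, whereas the ReBAC version needs the relationship data for every resource it is asked about, which is where the "data plane" question comes from.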
Yeah, that's what I've noticed too. Conceptually, they're more or less the same: each gives you some choice of RBAC, ABAC, or ReBAC, and each offers its own DSL (e.g. Oso, Ory Keto, etc.) and deployment strategy. It's been a bit hard to pick one, honestly, but I guess I'll just have to use them to find which one fits me.
Not sure why that matters, but OpenFGA is an implementation of Zanzibar, which isn't exactly new. There are many similar implementations to choose from should one want to model authorization via a graph database.
Topaz is essentially a combination of OPA (which is used as the decision engine, with full support for Rego), and a Zanzibar-style directory, which is fairly isomorphic to what OpenFGA has implemented.
The advantage is that it's a single container image (or Go binary, if that's how you want to run it), and supports a combination of RBAC, ABAC, and ReBAC. ABAC is accomplished via the Rego language, which is as "standard" as it comes in the cloud-native world.
My team is using OPA in a re-build of an application that we support. One of the main goals of the rebuild is to ensure we don't end up in a situation where every little rule change (including UAM changes) requires a full rebuild/deploy cycle of the app.
OPA replaces a complex, hard-coded, and largely inscrutable UAM model with a (still complex) but flexibly defined, independently testable, and easily inspectable single-responsibility model.
I like that OPA has built-in support for testing rulesets. The partial evaluation feature is amazing, and makes it easy to apply UAM filters to endpoints that return large sets of data (we have consistent query APIs across the app, so we could do this with a relatively simple OPA-aware proxy).
It's not all sunshine and roses, and the result might seem overly complex for a lot of use cases, but in our case I think OPA has provided a nice clean abstraction and enabled us to disentangle our UAM from the rest of our code and move more quickly overall.
Curious if you have any lessons learned worth sharing. Our journey went like this:
1: Yeah, we can use OPA to get rid of all this legacy spaghetti code!
2: Wow this PoC really proves out the idea!
3: Whoa we have three use cases now running in production!
4: Wait, these remaining 20 use cases are way more complex. To our surprise, all this legacy spaghetti code _exists for a reason_.
5: We now have 5 use cases in production but the Rego is now quite convoluted and our application logic has actually increased in complexity.
6: Red button: okay this is going horribly wrong. Back out this whole thing.
7: Recognition: the reason this has gone horribly wrong is because the spaghetti code combines pure logic and side effects in a way that did not map well with OPA.
8: Regroup: first step is to refactor all the legacy code and separate policy logic from side effects in a meaningful way.
9: Refactor: implement the above redesign. The policy classes all now map naturally to Rego for all 23 use cases! Let's do it!
10: Reality: we don't want to. Our codebase is well-structured now and we like it. Adding OPA now feels like an unnecessary layer, an additional potential for network timeouts etc to creep in, an extra thing to maintain, an extra special case to handle in our safe deployment pipeline, an extra language to train developers on. Now _maybe_ if we ever wanted other teams to write up and maintain their own Rego policies, then _maybe_ we'd consider going with it in the future, but for now the reality is our team would end up doing that work for them anyway, and it doesn't seem worth the tradeoff.
Anyway, lesson learned: don't expect it to magically clean up all the garbage in your existing code. You'll do it wrong and things will be worse than when you started. Clean that up first, and _then_ decide whether and how you want to adopt OPA for your remaining needs.
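A minimal sketch of the kind of separation described in steps 7 to 9, with an invented refund rule standing in for real business logic. The decision function is pure (no I/O), so it can be unit tested on its own or ported to Rego later; the side effects stay in the caller. The names, the rule, and the 100.0 limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def can_refund(role: str, amount: float, limit: float) -> Decision:
    """Pure policy logic: same inputs always give the same decision."""
    if role != "support":
        return Decision(False, "only support staff may issue refunds")
    if amount > limit:
        return Decision(False, "amount exceeds refund limit")
    return Decision(True, "within refund limit")


def handle_refund(role: str, amount: float, audit_log: list[str]) -> bool:
    """Side effects (an audit entry here; DB writes or emails in real code) live in the caller."""
    decision = can_refund(role, amount, limit=100.0)
    audit_log.append(f"refund {amount}: {decision.reason}")
    return decision.allowed
```

Once the code looks like this, swapping can_refund for a call to a policy engine is a local change, which is also why it may no longer feel necessary.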
We're currently evaluating OPA for adding RBAC to our open-source application [0]. We plan on using the Go API [1] and doing the policy eval directly in our app since our app is also written in Go.
The thinking is we'll have some basic built-in policies (like admins can do X, editors can do Y, etc.) but also allow users to configure their own policies if they want, by writing Rego and loading their policy rules at startup time (via config). We'd document the inputs that we pass to the evaluation call, such as request headers, IP, role, etc.
I'm curious if anyone has ever tried something like this or similar?
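For the "document the inputs" part, one option is to build the input document in a single place so that its shape is the contract that user-written policies rely on. The sketch below is only a guess at what such a document might contain; the field names are hypothetical and nothing here calls OPA.

```python
from typing import Mapping


def build_policy_input(method: str, path: str, headers: Mapping[str, str],
                       remote_ip: str, role: str) -> dict:
    """Assemble the document handed to the policy evaluation call."""
    return {
        "request": {
            "method": method.upper(),
            # Split the path so policies can match on individual segments.
            "path": [p for p in path.split("/") if p],
            # Lower-case header names so policies need not care about casing.
            "headers": {k.lower(): v for k, v in headers.items()},
        },
        "client_ip": remote_ip,
        "role": role,
    }
```

Keeping this function small and stable matters more than its exact fields, since every custom policy your users write will depend on it.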
I have found OPA to be a fairly reliable and performant system in production. We were able to build a scalable RBAC solution that used OPA as evaluators. We had around 40k OPA instances serving around 350K qps with p99.9 hovering around 6ms.
The policy bundles were sharded and cached on the client side, so the QPS itself was not much of an impressive data point on the OPA front. On the cache side we were seeing a lot more traffic: ~2B qpm (queries per minute) at daily peaks, with p99.9 around 20 µs.
OPA, or Rego? My experience working for Styra was that most people seemed to grok where OPA fit in fairly quickly, but struggled with Rego. It's a very powerful language and well worth learning I think, but it's an investment for sure.
Rego is a DSL and the main purpose of DSLs is to simplify things (compared to general purpose programming languages), so in my opinion Rego is not a good DSL.
I work in a highly regulated environment and evaluated using Cedar or OPA.
The biggest advantage to OPA was the flexibility. This enabled not just an authorization decision, but the why behind it. No more questions of why did this person/system gain (or was denied) access, combing through dozens of rules to find the matching statements. Just pull up the log and read the results… This is incredibly useful during audits.
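The idea of returning the "why" together with the decision can be sketched as follows. This is not OPA's decision log format, just an illustration of a structured result (the sort of object a Rego rule can return instead of a bare boolean). The check names and input fields are made up.

```python
from typing import Callable

Check = tuple[str, Callable[[dict], bool]]

# Hypothetical checks; each has a name so it can show up in the audit trail.
CHECKS: list[Check] = [
    ("user_is_active", lambda i: i["user"]["active"]),
    ("mfa_verified", lambda i: i["user"]["mfa"]),
    ("resource_in_same_tenant", lambda i: i["user"]["tenant"] == i["resource"]["tenant"]),
]


def decide(input_doc: dict) -> dict:
    """Run every check and report each result alongside the final decision."""
    results = {name: bool(check(input_doc)) for name, check in CHECKS}
    return {"allow": all(results.values()), "checks": results}
```

Logging the whole returned object, rather than only the allow flag, is what lets an auditor see which check failed without re-reading the rules.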
Cedar could not provide that level of detail (or so I was told by AWS representatives selling their hosted version).
It's a Cedar-related issue. I like to know every check that was run for a policy and the result. Cedar will only provide the name of the policy that granted/denied.
OPA is much more wide ranging. You can use it for permissions, sure, but also just about anything else you can imagine. I think that makes it much more compelling as a technological investment.
The benefit of Cedar mainly comes down to the language. Cedar was designed to sit in the middle of a runtime call, so it has reliably low latency (see comparison here: https://twitter.com/Sarah_Cecc/status/1766141060370329748) even at high scale. It's way more readable so it's easier to author and debug. And it's validated against formal methods proofs so certain properties of the language (like default deny) are mathematically proven.
More about the benefits of Cedar here: https://cedarland.blog/design/why-cedar/content.html
I tried to implement some simpler cases with OPA's policy language, Rego (https://www.openpolicyagent.org/docs/latest/policy-language/), and found it overly cumbersome. A simple check like "the user must be in group A and in group B, but must not be in group C" is hard to express in this language. It would be a trivial task in any somewhat decent programming language (e.g. JavaScript).
I understand why restricting the possibilities with an external DSL might be a good idea, but I consider Rego to be too restricted. I mean, in the end a policy is just a function saying basically "yes" or "no" (I know, it's not that simple with OPA, but it boils down to access yes/no anyway).
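For comparison, reading the rule as "in groups A and B, but not in C", the general-purpose-language version is a one-liner. This is shown in Python rather than JavaScript, and the group names are just the placeholders from the comment.

```python
def allowed(groups: set[str]) -> bool:
    """In group A and group B, but not in group C."""
    return {"A", "B"} <= groups and "C" not in groups
```

The subset test covers the two "must be in" conditions and the membership test covers the exclusion, so the whole policy is a single boolean expression.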
OPA and its derivative projects really brought the idea of decoupled authorization as a viable option. It is a very powerful tool which can be applied to many layers of the architecture - from Kubernetes Admission Controllers being based on it through to network level authorization and up the full stack.
One constrained and narrow use case is the actual application-level permissions, e.g. what a user can do inside of your service. Having hand-rolled this in various companies - and the inevitable rebuilds that were required as requirements change such as adding a new, product packaging updates etc - you do end up with a complex web of logic - either in your codebase or as Rego.
For these application-level permissions - where the requirements really come from the product/business rather than engineering - I always felt there could be a simpler way of defining these rules. Policies needed to be in a format a business user could understand, and enforcing them needs to be extremely responsive, as checks are in the blocking path of every request. This also needs to work at large scale, all whilst making every decision auditable to satisfy all the regulatory and compliance needs around access controls.
To this effect we began working on Cerbos[0] a few years ago, which initially targets that one specific use case. It models policy in simple YAML [1] (love it or hate it!) and takes a stateless approach, meaning it is infinitely scalable with none of the headache of synchronizing information about your users or resources to the authZ layer. Critically, it also generates that single audit log of decisions.
Disclaimer: I work on the team that builds and maintains Cerbos[2].
1. Define policies using the declarative language Rego.
2. Deploy OPA alongside your service as a sidecar in Kubernetes.
3. Have your service query OPA when it needs to make policy decisions, passing the current state/context as input.
4. OPA evaluates the policies written in Rego against the input and returns a decision (allow or deny) back to your service.
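Steps 3 and 4 can be sketched against OPA's REST Data API, which takes a POST to /v1/data/<policy path> with the context wrapped in an "input" key and answers with a "result" key. The sidecar address and the policy path "httpapi/authz/allow" below are assumptions for the example. The transport is passed in as a function so the request and response handling can be tried without a running OPA.

```python
import json
import urllib.request
from typing import Callable

# Assumed sidecar address and policy path; adjust to your deployment.
OPA_URL = "http://localhost:8181/v1/data/httpapi/authz/allow"


def build_request(input_doc: dict, url: str = OPA_URL) -> urllib.request.Request:
    """Wrap the context in an "input" key, as the Data API expects."""
    body = json.dumps({"input": input_doc}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_decision(raw: bytes) -> bool:
    """An undefined decision comes back without "result"; treat that as deny."""
    return json.loads(raw).get("result", False) is True


def is_allowed(input_doc: dict, send: Callable[[urllib.request.Request], bytes]) -> bool:
    return parse_decision(send(build_request(input_doc)))


def http_send(req: urllib.request.Request) -> bytes:
    # Real transport; needs a running OPA, so it is not exercised here.
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read()
```

In a service you would call is_allowed(input_doc, http_send); treating a missing result as deny keeps the service fail-closed if the policy path is wrong or the rule is undefined.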
I found it hard to convince everyone around me to use OPA/Rego and wrap it into a managed service. The main objection: wrapping another DSL (domain-specific language) is hard.
However, it was relatively simple to convince my team to use the feature-complete Go library Ladon: https://github.com/ory/ladon
All policies are loaded at app start, stored in memory (not a DB), and checked with the help of a small middleware which triggered the following function.
Very negligible performance hit. The code is very simple, hackable, and open to further optimisation.
Ladon is very fast. It's possible to run all user groups against all CRUD routes and get the basic permission matrix, or build some simple UI forms to test conditions for better control.
P.S. Feel free to ping me in private @reactima (github, telegram) if you want to discuss the edge cases for the above.
For application authorization, Oso is a compelling solution. (Disclaimer: I work for Oso). It provides a DSL and a prescriptive, but flexible data model that are capable of modeling RBAC, ReBAC, ABAC, or whatever else you'd like to model. Obviously I'm biased, but I think it strikes a great balance between opinion and flexibility.
One significant complication that all centralized authorization solutions share is that you end up needing to reproduce application data in the authorization system. We've been doing a lot of work in this area to simplify data management and have some beta functionality available. I'll include some links to the docs for those.
[0] https://www.topaz.sh
[1] https://research.google/pubs/zanzibar-googles-consistent-glo...
[2] https://www.aserto.com