
The Developer's Guide to RBAC: Part I

Authorization often takes a backseat to authentication, but it becomes critical as applications scale and require finer access control. This blog series covers the transition from basic role-based access control (RBAC) to more advanced fine-grained authorization (FGA), offering practical guidance for engineers implementing these systems.


Authorization is kind of like authentication’s scary cousin – it’s not something most developers worry about day 1, but you know it’s coming. Enterprises – and increasingly, smaller companies too – have rigid permissions requirements. They won’t even consider a product without granular access management. Building a robust, performant authorization system isn’t an if, it’s a when. But it’s complicated, opaque, and a lot of the science isn’t quite settled.

Like our guide to authentication, this series will walk through what developers need to know before implementing authorization, where it gets hard, and a sort of 201 perspective. This first installment covers authorization basics, and the second focuses specifically on working with identity providers (IdPs) via SCIM (or otherwise).

Why does my app need authorization in the first place?

Authentication is a Day 0 problem; you essentially cannot build a useful app that doesn’t have user management in some way, shape, or form. But authorization is more like a Day 5 problem – you don’t run into it until you start selling your product into more serious customers. Most apps start with all users having the same levels of access and permissions, until a deal you’re trying to close says that they need a separation between admin and user roles. At this point, congrats, you need to build authorization. 

There are two camps of philosophy for how to build authorization into your app:

  1. Role-based – users are assigned a role, and each role comes with a set of permissions. This is commonly acronym-ized as RBAC, or role-based access control.
  2. Resource-based – each user has individual relationships with resources (like a repository in GitHub, or a base in Airtable). This is commonly referred to as fine-grained authorization, or FGA.

At WorkOS we firmly believe that FGA is the most scalable, foolproof way to handle authorization. But it’s not always straightforward to implement it from day 1 (or 5) – instead, it’s useful to think of a company’s timeline from basic authorization all the way towards complex, resource-based FGA. 

Consider the following journey of a team of developers building a completely fictional source code management platform – affectionately named BitHub – where you can host your code in cloud-based repositories. Like auth, which can be as simple or complex as your customers require, authorization starts out pretty basic. 

Stage 1: no authorization

In the first version of your app (and most apps), there is no authorization at all. Every page is accessible by every user. This is a completely reasonable way of modeling reality, and doesn’t become a barrier until the organizations you sell to demand a permissions scheme. In fact, there are some smaller SaaS apps out there that to this day have successfully stayed in Stage 1 of authorization; it all just depends on who your customers are.

For our team of BitHub developers, this would mean that every member of a given organization has full access to every repository. Not ideal, but also not completely untenable for smaller companies.

Stage 2: admins, and everyone else

The most common initial authorization-related request from customers is to add the concept of an admin. In its simplest form, admins can view certain pages that non-admins cannot. 

The difference between having no authorization and having the “admin” concept is not massive. The easiest way to implement it is by adding an `is_admin` column into your users table, and then adding a check to the pages that you want to gate to admins only. As long as the only difference between these two roles is viewing an entire page or not, the logic in code remains relatively simple.

Though it should be common sense, a surprising number of companies forget to show anything more than a 500 error when an unauthorized, non-admin user tries to open a page they don’t have access to. Take the time to communicate to your visitor why they can’t see the page!
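To make Stage 2 concrete, here is a minimal sketch of the `is_admin` approach using SQLite through Python's standard library. Only the `is_admin` column comes from the description above; the rest of the `users` table, the `view_billing_page` handler, and the status/message pair it returns are hypothetical names chosen for illustration. The handler answers a non-admin with a 403 and a reason rather than a generic error.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, "
    "is_admin INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO users (id, email, is_admin) VALUES (?, ?, ?)",
    [(1, "ada@bithub.example", 1), (2, "bob@bithub.example", 0)],
)


def is_admin(user_id: int) -> bool:
    """Return True only if the user exists and has the admin flag set."""
    row = conn.execute(
        "SELECT is_admin FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row is not None and bool(row[0])


def view_billing_page(user_id: int) -> tuple[int, str]:
    """Hypothetical admin-only page: returns (HTTP status, body)."""
    if not is_admin(user_id):
        # Tell the visitor why they are blocked instead of failing opaquely.
        return 403, "Only organization admins can view billing. Ask an admin for access."
    return 200, "Billing settings"
```

As long as the gate is "see the whole page or not", one boolean column and one check per gated page is about all there is to it.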

Stage 3: n>2 roles

BitHub is taking off and getting in front of larger customers with more nuanced team structures. The time has come to move beyond `is_admin` and add other roles like repository owner, contributor, etc. Congrats, you have now entered the realm of permissions. A permission just means that a user is able to do a specific thing: it could be as simple as viewing a page, or as complex as editing a specific row in an Airtable base. Implementing permissions well is quite complicated and will require a completely new data model (more on this later).

Stage 4: the great beyond

The more customers you acquire, the more complex their needs are and the more adjustments you need to keep making to your authorization data model. Your largest customer has 17 different types of software engineers, all of whom need custom configurations of repo permissions. Your product surface area grows, and the number of “things” a user could conceivably need permission for grows exponentially. Instead of the rigidity of roles and permissions, what you really need is the ability to have each user define a different relationship with each resource (in our case, repo). At this point, doing role-based authorization just doesn’t make sense anymore.

So in summary: if your B2B SaaS app is successful enough, you will eventually end up needing fine-grained, resource-based authorization. With all of that in mind, let’s take a deeper look at both RBAC and FGA, how you might go about implementing each, and some of the 201 problems developers run into.

Role-based authorization: basics and not-so-basics

Role-based authorization is built on two major components:

  1. Each user is assigned a role, like admin or viewer. There are usually anywhere from 2-10 roles in your typical RBAC-using SaaS product.
  2. Each role has a set of permissions, or things that users with that role can and can’t do. Permissions can be as simple as the ability to view an entire page, or as complex as editing a specific row in a table.

A basic data model for an RBAC setup where each user can only have one role might look like this:
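One minimal sketch of a single-role model, using SQLite through Python's standard library. The table names (`roles`, `role_permissions`, `users`), the permission strings like `repo:read`, and the `permissions_for` helper are all assumptions for illustration, not a prescribed schema. Each user row points at exactly one role, and each role owns a set of permissions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roles (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE role_permissions (
    role_id    INTEGER NOT NULL REFERENCES roles(id),
    permission TEXT NOT NULL,
    PRIMARY KEY (role_id, permission)
);
-- One role per user: the role lives directly on the user row.
CREATE TABLE users (
    id      INTEGER PRIMARY KEY,
    email   TEXT NOT NULL,
    role_id INTEGER NOT NULL REFERENCES roles(id)
);
INSERT INTO roles VALUES (1, 'admin'), (2, 'viewer');
INSERT INTO role_permissions VALUES
    (1, 'repo:read'), (1, 'repo:create'), (1, 'repo:delete'),
    (2, 'repo:read');
INSERT INTO users VALUES
    (1, 'ada@bithub.example', 1),
    (2, 'bob@bithub.example', 2);
""")


def permissions_for(user_id: int) -> set[str]:
    """Resolve a user's permissions through their single role."""
    rows = conn.execute(
        "SELECT rp.permission FROM users u "
        "JOIN role_permissions rp ON rp.role_id = u.role_id "
        "WHERE u.id = ?",
        (user_id,),
    ).fetchall()
    return {permission for (permission,) in rows}
```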

If you want to allow for more than one role per user, you’d need a separate role-mapping table that might look like this:
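A minimal sketch of the multi-role variant, again in SQLite via Python's standard library. The `user_roles` join table and the `roles_for` helper are hypothetical names; the point is only that the role moves off the user row and into a many-to-many mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Mapping table: one row per (user, role) pair.
CREATE TABLE user_roles (
    user_id INTEGER NOT NULL REFERENCES users(id),
    role_id INTEGER NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);
INSERT INTO users VALUES (1, 'ada@bithub.example');
INSERT INTO roles VALUES (1, 'admin'), (2, 'contributor');
INSERT INTO user_roles VALUES (1, 1), (1, 2);
""")


def roles_for(user_id: int) -> list[str]:
    """List every role assigned to a user, sorted by name."""
    rows = conn.execute(
        "SELECT r.name FROM user_roles ur "
        "JOIN roles r ON r.id = ur.role_id "
        "WHERE ur.user_id = ? ORDER BY r.name",
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]
```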

For each part of your application that you’d want to restrict to specific roles, you need to add what’s called a “check” – some code that makes sure that the currently authenticated user has a role that allows them to access it. 
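Here is a rough sketch of what such a check can look like in Python, written as a decorator that wraps a handler. Everything in it is an assumption for illustration: the `ROLE_PERMISSIONS` mapping, the `User` and `Forbidden` types, the `requires` decorator, and the `create_repo` handler are not taken from any particular framework.

```python
from dataclasses import dataclass
from functools import wraps

# Hypothetical role -> permissions mapping; in a real app this would
# come from your database rather than a constant.
ROLE_PERMISSIONS = {
    "admin": {"repo:read", "repo:create", "repo:delete"},
    "viewer": {"repo:read"},
}


@dataclass
class User:
    id: int
    role: str


class Forbidden(Exception):
    """Raised when the authenticated user fails a check."""


def requires(permission: str):
    """Decorator that runs the check before the handler body executes."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(user: User, *args, **kwargs):
            if permission not in ROLE_PERMISSIONS.get(user.role, set()):
                raise Forbidden(f"role '{user.role}' lacks '{permission}'")
            return handler(user, *args, **kwargs)
        return wrapper
    return decorator


@requires("repo:create")
def create_repo(user: User, name: str) -> str:
    return f"created {name}"
```

Calling `create_repo(User(1, "admin"), "api")` goes through, while the same call with a `viewer` raises `Forbidden` before any repository logic runs.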

All of this is 101. But when you get into the details of how you’d actually build a lot of this stuff, teams vary pretty widely. Broadly speaking, there are two philosophies on how to handle both data storage and your permissions logic: centralized and decentralized.

Data storage: centralized vs. decentralized

The data about which roles each user has is obviously stored in your production database somewhere. But how does your application actually access it?

In decentralized systems, role information gets stored in whatever object you’re using for session management and authentication. If you’re using JWTs, you might store a user’s role (and in some cases, what permissions that role has) in the JWT itself. As long as the token is active, it’s super easy and fast to have your application logic check against it before allowing a user to take an action that you might want to be restricted. 

In centralized systems, you create some sort of service that queries your database every time you want to do a check. This is how Google Zanzibar works.

It’s obviously a lot more work and complexity to build a centralized service, which is why most teams start with a decentralized implementation. But there are a bunch of downsides to storing role information in a session token:

  • It’s already notoriously difficult to invalidate a JWT, so your system will not be real time when role changes are made.
  • If your stack is more complex and has several services, you need to recreate the ingestion logic for each service.
  • There’s a pretty hard limit on how much data you can actually store in these tokens.

On that last point, it’s worth reading Carta’s post on how they built a system based on Zanzibar. They started with decentralized, JWT-based authorization, but over time found that tokens were getting as big as 1MB (!) and taking a prohibitively long time to build. 

Permissions logic: centralized vs. decentralized

The logic that handles your checks (is the currently authenticated user allowed to do this?) can also be centralized or decentralized. 

In a decentralized implementation, checks are distributed across whatever part of your application they relate to. If BitHub has an endpoint for creating a new repository, that endpoint’s code would have a check to make sure the currently authenticated user has the right permission to be able to create a repository. As discussed above, the actual role or permission information might be stored in a session object, or it might require a database query.

In a centralized implementation, you have a separate service or module that does all of your authorization checks. You either import it or call it from whichever part of your application you want to restrict access to.
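A minimal sketch of the centralized style in Python: one `Authorizer` object owns every decision, and endpoints just call into it. The class, its `check` and `require` methods, the `ROLE_PERMISSIONS` and `USER_ROLES` data, and the two BitHub handlers are all hypothetical. In a larger system the same interface could sit in front of a network service instead of an in-process dictionary.

```python
from typing import Callable, Optional

# Hypothetical role -> permissions mapping, kept in one place.
ROLE_PERMISSIONS = {
    "admin": {"repo:create", "repo:delete"},
    "creator": {"repo:create"},
    "viewer": set(),
}


class Authorizer:
    """Single home for every authorization decision in the app."""

    def __init__(self, role_of: Callable[[int], Optional[str]]):
        # role_of hides where roles live (session, database, remote service).
        self._role_of = role_of

    def check(self, user_id: int, permission: str) -> bool:
        role = self._role_of(user_id)
        return permission in ROLE_PERMISSIONS.get(role, set())

    def require(self, user_id: int, permission: str) -> None:
        if not self.check(user_id, permission):
            raise PermissionError(f"user {user_id} lacks '{permission}'")


# Stand-in for a real role lookup.
USER_ROLES = {1: "admin", 2: "creator", 3: "viewer"}
authz = Authorizer(USER_ROLES.get)


def create_repo(user_id: int, name: str) -> str:
    authz.require(user_id, "repo:create")
    return f"created {name}"


def delete_repo(user_id: int, name: str) -> str:
    authz.require(user_id, "repo:delete")
    return f"deleted {name}"
```

The handlers no longer know how a decision is made, so changing the rules (or the storage behind `role_of`) touches one module instead of every endpoint.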

Decentralized checks are obviously much simpler and more straightforward to implement, but quickly become hard to manage (multiple code owners, yikes). So most teams usually start decentralized and then centralize things when the check sprawl becomes too burdensome.

Role Explosion

A very real problem that teams run into when using an RBAC system is called role explosion. At some point, you have too many customers with conflicting role requirements and it starts to degrade your system.

Back to our BitHub example: when we built our V1 of authorization, we started with some basic roles: admin and viewer. Great. But as we continue to add new customers, a few here and there want adjustments. One organization asks to add a “creator” role, so users can create repositories but not have admin rights. Easy enough. But then another organization asks for a “moderator” role, yet another asks for a “team lead” role, and one more has an unusual setup for their repos and needs a custom role that allows team members to manage only certain repositories. And so on and so forth…

The basic idea is that if you’re running multi-tenant SaaS, every time a customer asks for a specific new type of role, you’re faced with a choice:

  1. Implement the role as a sort of “override” just for that organization, which means you need to bifurcate your data model (bad), or
  2. Denormalize all of your data, have custom roles for each organization, and voila, you’ve got role explosion.

Once this gets hairy enough, many teams opt to give their customers the ability to create their own custom roles. The data model for this is essentially one giant roles table that needs to link out to some sort of permissions table:
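A rough sketch of that custom-roles model, using SQLite through Python's standard library. The `organization_id` column, the table names, the sample rows, and the `role_permissions` helper are assumptions for illustration. The key detail is that roles are scoped to an organization, so two customers can each define a "creator" role that means something different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE permissions (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Every organization owns its own rows in the roles table.
CREATE TABLE roles (
    id              INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL,
    name            TEXT NOT NULL,
    UNIQUE (organization_id, name)
);
CREATE TABLE role_permissions (
    role_id       INTEGER NOT NULL REFERENCES roles(id),
    permission_id INTEGER NOT NULL REFERENCES permissions(id),
    PRIMARY KEY (role_id, permission_id)
);
INSERT INTO permissions VALUES
    (1, 'repo:read'), (2, 'repo:create'), (3, 'repo:delete');
INSERT INTO roles VALUES
    (1, 100, 'creator'), (2, 200, 'moderator'), (3, 200, 'creator');
INSERT INTO role_permissions VALUES
    (1, 1), (1, 2),   -- org 100 creator: read + create
    (2, 1), (2, 3),   -- org 200 moderator: read + delete
    (3, 2);           -- org 200 creator: create only
""")


def role_permissions(organization_id: int, role_name: str) -> set[str]:
    """Look up what a role means inside one specific organization."""
    rows = conn.execute(
        "SELECT p.name FROM roles r "
        "JOIN role_permissions rp ON rp.role_id = r.id "
        "JOIN permissions p ON p.id = rp.permission_id "
        "WHERE r.organization_id = ? AND r.name = ?",
        (organization_id, role_name),
    ).fetchall()
    return {name for (name,) in rows}
```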

And then each organization’s rows can only be edited by that organization. By the time you have 1000 customers, this table already has 1M rows and starts to slow down all of your authorization checks. This is exactly the situation from our earlier story: once your RBAC setup becomes sufficiently complex, you’re basically building FGA. Speaking of which…

Resource-based authorization: FGA

Resource-based authorization is, well, resource based – instead of creating the abstraction of a role (with associated permissions), users relate directly to resources or objects. In our BitHub example, an FGA approach would look at individual relationships between users and repositories, while RBAC would focus on what general permissions a user with a type of role would have. 

At WorkOS we firmly believe that FGA is the most scalable, foolproof way to handle authorization. On a long enough timeline, every RBAC setup becomes too complex – especially as more and more apps become collaborative and host some kind of user-generated content.

So what does FGA actually look like? There are two main schools of philosophy.

Policy languages, like OPA

Policy languages are like DSLs (sort of) for specifying how users get access to resources. They’re kind of like a single interface for authorization checks (Carta’s wording). A popular open source implementation is Open Policy Agent, or OPA for short. A sample snippet from their docs, modified to our BitHub story, shows how you’d restrict deleting a repository to only users who have ownership over that repository:

      
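package application.authz

# Enables the `if` keyword used in the rule below.
import future.keywords

# Deny unless a rule below explicitly allows the request.
default allow := false

# Allow DELETE /repos/<repo_id> only when the requesting user owns the repo.
allow if {
	input.method == "DELETE"
	some repo_id
	# Matches a two-segment path and binds the second segment to repo_id.
	input.path = ["repos", repo_id]
	input.user == input.owner
}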
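If Rego is new to you, it can help to see the same rule as ordinary code. The sketch below is a rough Python rendering of the policy above, not how OPA actually evaluates it. The shape of the input document (`method`, `path`, `user`, `owner`) is read off the snippet, and the sample values are made up.

```python
def allow(input_doc: dict) -> bool:
    """Rough Python equivalent of the `allow` rule: default deny,
    allow only DELETE /repos/<repo_id> when user == owner."""
    path = input_doc.get("path", [])
    user = input_doc.get("user")
    return (
        input_doc.get("method") == "DELETE"
        and len(path) == 2          # ["repos", repo_id]
        and path[0] == "repos"
        and user is not None        # a missing user never matches
        and user == input_doc.get("owner")
    )


# Hypothetical requests, as your app might describe them to the policy.
owner_request = {
    "method": "DELETE",
    "path": ["repos", "42"],
    "user": "ada",
    "owner": "ada",
}
other_request = dict(owner_request, user="bob")
```

Notice that the caller has to supply `owner` along with the request: the policy decides, but it expects your application to bring the data.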
      
      

The thing about policy languages is that they don’t deal with the actual storage of your user and resource data – they just act as an interface between it and your app. Carta found that OPA didn’t work for their complex permission hierarchy:

This worked fine for simple permissions, but complex ones caused major issues for us. Complex permissions often queried several data models and added hundreds of milliseconds to response times.

For a more homegrown implementation story, check out this blog post by Figma engineering.

Full systems, like Zanzibar

In 2019, Google released a paper detailing how they built their internal authorization system, called Zanzibar. They didn’t include an official open source implementation, and since then we’ve seen several startups try to attack this problem with their own implementations.

Zanzibar is based on tuples that represent relationships between users and objects:

      
(user, object, relationship)

→

(user_id, repository_id, owner)
(user_id, repository_id, creator)
      
      

The simplicity of these relationships solves the common role explosion problems you see with complex RBAC: every user’s relationship with an object is individual. Some people also call this ReBAC (relationship-based access control), which is not confusing at all!

You could conceivably implement a naive version of this in a database table:
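A naive sketch of that table, using SQLite through Python's standard library. The `relationships` table name, the `user:ada` / `repo:api` identifier format, the extra index, and the `has_relationship` helper are assumptions for illustration; the three columns mirror the (user, object, relationship) tuples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationships (
    user_id      TEXT NOT NULL,
    object_id    TEXT NOT NULL,
    relationship TEXT NOT NULL,
    -- The primary key doubles as an index that starts with user_id.
    PRIMARY KEY (user_id, object_id, relationship)
);
-- Second index for the reverse question: who can touch this object?
CREATE INDEX relationships_by_object
    ON relationships (object_id, relationship);
INSERT INTO relationships VALUES
    ('user:ada', 'repo:api', 'owner'),
    ('user:ada', 'repo:web', 'creator'),
    ('user:bob', 'repo:api', 'creator');
""")


def has_relationship(user_id: str, object_id: str, relationship: str) -> bool:
    """Check whether one specific (user, object, relationship) tuple exists."""
    row = conn.execute(
        "SELECT 1 FROM relationships "
        "WHERE user_id = ? AND object_id = ? AND relationship = ?",
        (user_id, object_id, relationship),
    ).fetchone()
    return row is not None
```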

But that table would get very large, very quickly, and would only work if you have an index on user_id, etc. So a lot of the magic in the Zanzibar paper is the system they built to implement it, which includes compute, storage, indices, regular running jobs, and more.

The last thing to note about Zanzibar is that it’s a storage system, but it won’t make your decisions for you – in that sense, it’s kind of the opposite of something like OPA. Zanzibar is basically really good at telling you what a user’s relationship with a given object is, really fast; the rest is up to your application logic.
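That split between storing relationships and making decisions can be sketched in a few lines of Python. Here `relationship_of` stands in for the storage layer and `can` is the application logic layered on top. Both function names, the in-memory `RELATIONSHIPS` data, and especially the `ALLOWED_ACTIONS` mapping of what an owner or creator may do are assumptions made up for this example.

```python
from typing import Optional

# Stand-in for the relationship store: (user, object) -> relationship.
RELATIONSHIPS = {
    ("user:ada", "repo:api"): "owner",
    ("user:bob", "repo:api"): "creator",
}


def relationship_of(user: str, obj: str) -> Optional[str]:
    """Storage layer: answers 'how is this user related to this object?'"""
    return RELATIONSHIPS.get((user, obj))


# Application layer: what each relationship is allowed to do.
# This mapping is a product decision, not something the store knows.
ALLOWED_ACTIONS = {
    "owner": {"read", "push", "delete"},
    "creator": {"read", "push"},
}


def can(user: str, action: str, obj: str) -> bool:
    relation = relationship_of(user, obj)
    return action in ALLOWED_ACTIONS.get(relation, set())
```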

If all of this wasn’t already complicated enough, we left out one major detail: most of your enterprise customers will be running their identity through an IdP like Okta. How authorization works with IdPs and protocols like SCIM will be the subject of the second half of this series.

