Implementation challenges of a homegrown SCIM solution
SCIM provisioning is an important enterprise feature that provides user lifecycle management (ULM) and automated access control. Building this in-house means you must deal with fragmentation issues across onboarding, implementation, and triage, incurring significant engineering cost, delayed time to market, and potential security issues.
If you’re reading this, you’re probably one of the many B2B startups out there trying to build for (potential) enterprise customers. Directory Sync, usually implemented via the SCIM protocol, is standard, table-stakes functionality for these kinds of customers: it helps automate user onboarding, offboarding, access management, and other important use cases. But implementing SCIM yourself is an arduous, fragmented, and at times downright dastardly task.
In this post, we’ll run through some of the common challenges we see teams encounter when building their own SCIM integrations:
- Onboarding fragmentation – differences across IdPs mean you need custom documentation and flows for each during onboarding
- Differences in IdP implementation – IdPs interpret SCIM differently, and your system will require trees of logic to handle each unique case
- Triage and resolution fragmentation – when your directory gets into a bad state, you need to help your users triage (especially if it’s an IdP issue)
- Correctness – parallel and duplicate requests coming in from IdPs
- Scale – can your application handle large blocks of concurrent requests on initial synchronization from an IdP?
And if you want to avoid all of this headache, you can always use WorkOS to automate Directory Sync and get SCIM integrations to your customers faster.
Onboarding fragmentation
When it comes to Directory Sync, onboarding refers to the process of integrating your app with the customer’s Identity Provider (IdP, like Okta). This initial setup phase gets the two talking to each other, so the IdP can send requests later whenever a user is deprovisioned, changes groups, or has their profile updated. Here’s what a generic flow might look like:
- In your app, the user generates a URL (to send data) and an authentication token for the IdP
- In the IdP, the user creates a new app integration and adds the URL + token from your app
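The app-side half of this flow can be sketched in a few lines. This is a minimal illustration rather than a production design: `SCIM_BASE_URL`, `create_directory_connection`, and `token_is_valid` are hypothetical names invented for this example, and storing only a hash of the token is one reasonable choice among several.

```python
import hashlib
import hmac
import secrets

# Hypothetical base URL for this sketch; substitute your app's real SCIM host.
SCIM_BASE_URL = "https://app.example.com/scim/v2"


def create_directory_connection() -> dict:
    """Create the endpoint URL and bearer token an IT admin pastes into the IdP."""
    directory_id = secrets.token_hex(8)   # identifies this customer's directory
    token = secrets.token_urlsafe(32)     # shown to the admin once
    return {
        "directory_id": directory_id,
        "endpoint_url": f"{SCIM_BASE_URL}/{directory_id}",
        "bearer_token": token,
        # Persist only a hash so a leaked database does not leak usable tokens.
        "token_sha256": hashlib.sha256(token.encode()).hexdigest(),
    }


def token_is_valid(presented_token: str, stored_sha256: str) -> bool:
    """Check the token on an incoming IdP request using a constant-time compare."""
    presented = hashlib.sha256(presented_token.encode()).hexdigest()
    return hmac.compare_digest(presented, stored_sha256)
```

Every later request from the IdP would then carry the token in its `Authorization` header, and the app would look up the directory from the URL before calling something like `token_is_valid`.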
Your product can’t control what your IT admins do in their IdP. So as an app developer, you need to provide some sort of experience – whether it’s documentation or an in-app walkthrough – that guides your user through the above steps. You need the basics, like choosing an IdP, and then the in-depth steps for each one.
Without this kind of documentation, your users won’t be able to connect your app to their IdP at all. But authentication is only part of the picture – for Directory Sync, you need to map the existing roles and groups in the IdP to the roles and permissions that you have in your app. For example, if your customer’s IdP has team leaders in the “admin” group, and your app has the concept of an “owner” that you want to map them to, you need to direct your users towards how to create these mappings in their IdPs.
So there’s a lot of functionality to build here. But the real problem is that every IdP handles this app onboarding process slightly differently. The obvious place to start is that all of their UIs are different, so you need different guides, screenshots, and steps for each. But that’s only the beginning. For example, getting a user’s correct email address from EntraID (Azure) requires special logic depending on whether that user is cloud-managed or synchronized. OneLogin has several different ways to provision groups (and you’ll probably want to pick one). JumpCloud has a set of default attribute mappings that you need to work around. Most of these are small things, sure; but they add up.
To build this onboarding experience, your team will need to create test accounts in all of the IdPs you want to support, log in, go through these flows and grab screenshots for each phase. In WorkOS' sample SCIM integration guide, we have over 500 screenshots from various IdPs, just for directory sync. But some IdPs don’t even support the concept of a test instance for these screen grabs (Rippling is notoriously difficult on this front). So this development often happens alongside your first customer, working with them in their IdP to get this information.
Differences in IdP implementation
Onboarding isn’t the only area where differences in IdP details make things difficult. SCIM is just guidance – it’s a spec written by the IETF, not a protocol like TCP/IP. So IdPs can choose to implement it however they see fit, and in practice this leads to many small differences in implementation that you need to deal with in your app. And these details matter: SCIM touches on the core security posture of your app (who has access to what). Small lapses because you didn’t properly understand an IdP’s implementation can become major vulnerabilities. We call them SCIMcidents.
Let’s run through a few examples. Keep in mind that you can’t just read the SCIM RFC and find these things; many are undocumented and usually just encountered when you’re building this with your first customer.
Okta and user deactivation
The SCIM RFC specifies a `DELETE` endpoint for removing a resource. But Okta instead sends a `PUT` or `PATCH` request to set a user’s status to inactive. There’s good logic for this – they assume that the common case for a user is deactivation, and a full-blown delete is rarely what people actually want. Nevertheless, it’s not what you’d expect. Plus, EntraID (Azure) does use the `DELETE` endpoint. So if you want to support both, your application will need to account for this by bifurcating your syncing logic.
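One way to contain this bifurcation is to normalize every "remove access" signal into a single check before it reaches your business logic. The sketch below is an assumption-laden illustration, not any IdP's documented contract. `is_deprovision` is a hypothetical helper; the body shapes follow the SCIM `PatchOp` message format and the core `active` attribute; accepting string booleans and mixed-case `op` values is a defensive assumption.

```python
from typing import Any, Optional


def _as_bool(value: Any) -> Optional[bool]:
    """Interpret JSON booleans and string booleans such as "False"."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    return None


def is_deprovision(method: str, body: Optional[dict] = None) -> bool:
    """Return True when a request to /Users/{id} means "revoke this user's access"."""
    method = method.upper()
    body = body or {}
    if method == "DELETE":
        return True
    if method == "PUT":
        # PUT replaces the whole resource, so look at the top-level flag.
        return _as_bool(body.get("active")) is False
    if method == "PATCH":
        for op in body.get("Operations", []):
            if str(op.get("op", "")).lower() != "replace":
                continue
            value = op.get("value")
            if str(op.get("path", "")).lower() == "active":
                # Form 1: {"op": "replace", "path": "active", "value": false}
                if _as_bool(value) is False:
                    return True
            elif "path" not in op and isinstance(value, dict):
                # Form 2: {"op": "replace", "value": {"active": false}}
                if _as_bool(value.get("active")) is False:
                    return True
    return False
```

Whether a deactivation should then soft-delete or hard-delete the user in your app is a product decision; the point of the helper is only that both request styles reach the same code path.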
Handling unique users
The SCIM RFC says that the `externalId` attribute should be the unique identifier for a user between the IdP and your application, but the unfortunate reality is that this is not a hard requirement. In an ideal world, any updates (group membership, etc.) that an IdP sends to your application would refer to a user by that ID. But in practice, because the SCIM protocol does not require every identity to have a clearly defined and unambiguous ID, teams will often rely on permutations of a username and ID, which creates even more fragmentation.
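A lookup that tolerates this ambiguity might prefer `externalId` and fall back to `userName` only when needed. The following is a rough sketch under stated assumptions: `find_user` and the record keys (`directory_id`, `external_id`, `user_name`) are hypothetical, the in-memory list stands in for a database query, and the fallback policy is one possible choice rather than a rule from the spec.

```python
from typing import Optional


def find_user(users: list, directory_id: str, payload: dict) -> Optional[dict]:
    """Match an incoming SCIM user payload to a stored user record."""
    # Never match across directories: identifiers are only unique per customer.
    candidates = [u for u in users if u["directory_id"] == directory_id]

    external_id = payload.get("externalId")
    if external_id:
        for user in candidates:
            if user.get("external_id") == external_id:
                return user

    # Fallback: userName, compared case-insensitively.
    user_name = (payload.get("userName") or "").lower()
    if user_name:
        for user in candidates:
            if (user.get("user_name") or "").lower() == user_name:
                return user
    return None
```

The fallback trades safety for convenience: it avoids duplicate accounts when an IdP omits `externalId`, but it can also attach an update to the wrong person if usernames are ever recycled, so it is worth logging whenever it fires.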
Groups and group memberships
With Directory Sync, access management is handled via group memberships: for example, if you’re in the “engineering” group, you get write access to GitHub repositories. Every IdP handles group memberships slightly differently. An Okta example: imagine you go on sabbatical, and get promoted to engineering manager while you’re away. You’d expect the IdP to issue a request that says that you’re no longer part of the engineering group; but Okta doesn’t, since you’re on sabbatical. So when you come back, you may erroneously still have access to engineering resources. EntraID (Azure), on the other hand, does send a request in this situation.
Google Workspace
Google, which is the third most popular IdP on the planet after Okta and EntraID (Azure), does not support SCIM publicly. The only way to get directory information is to pull the information to your application, which is the opposite of the SCIM spec (where IdPs push). So to support Google Workspace in addition to another IdP, you now need to worry about creating a worker architecture with regularly scheduled jobs to poll this data, alongside your existing infrastructure for IdPs that do support SCIM.
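The core of such a polling worker is a diff between the last snapshot you stored and the one you just fetched, which turns pulled data back into the create, update, and delete events that SCIM would have pushed. Here is a minimal sketch, assuming each snapshot is a plain dict keyed by user ID. `diff_directory` and the event names are made up for this example, and the fetch from the directory API is deliberately left out.

```python
def diff_directory(previous: dict, current: dict) -> list:
    """Turn two polled snapshots (user id -> attributes) into change events."""
    events = []
    for user_id, attrs in current.items():
        if user_id not in previous:
            events.append(("user.created", user_id))
        elif previous[user_id] != attrs:
            events.append(("user.updated", user_id))
    # Anyone present last time but missing now has been removed upstream.
    for user_id in previous:
        if user_id not in current:
            events.append(("user.deleted", user_id))
    return events
```

A scheduled job would call this once per directory per polling interval and then persist `current` as the new baseline. One caveat of the approach is that a partial or failed fetch looks like a mass deletion, so a real worker should verify that the fetch completed before diffing.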
Custom attributes
A common use case for SCIM is adding custom information to a user profile in your application, like a user’s job title or profile picture. This is handled differently across IdPs:
- Entra ID (Azure)
  - requires adding proper schema extension as prefix `urn:ietf:params:scim:schemas:extension:enterprise:2.0:User` or `urn:ietf:params:scim:schemas:core:2.0:User` for attribute to show up
  - attributes will always come in as nested (can't do top level)
- Okta
  - attributes with `urn:ietf:params:scim:schemas:core:2.0:User` prefix do show up as top-level attributes
So as a developer, you need to understand how to parse these differently for each IdP.
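In code, the nested versus top-level difference comes down to where you look for a given attribute in the payload. The helper below is a small sketch, not a complete SCIM schema parser: `get_custom_attribute` is a hypothetical name, and it assumes extension attributes arrive as an object keyed by the schema URN while core attributes sit at the top level.

```python
ENTERPRISE_USER = "urn:ietf:params:scim:schemas:extension:enterprise:2.0:User"


def get_custom_attribute(payload: dict, name: str):
    """Find an attribute whether it arrives top-level or nested under a schema URN."""
    # Top-level form: {"title": "Manager", ...}
    if name in payload:
        return payload[name]
    # Nested form: {"urn:...:enterprise:2.0:User": {"department": "Sales"}, ...}
    for key, value in payload.items():
        if key.startswith("urn:") and isinstance(value, dict) and name in value:
            return value[name]
    return None
```

Searching every `urn:` key is a convenience for the sketch. If two extensions can define the same attribute name, you would want to look up a specific URN such as `ENTERPRISE_USER` instead.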
Triage and resolution fragmentation
A directory can get into a bad state, and resolving the state usually requires work in the IdP, not just your app.
Small issues like “a user is in the right group for your app in the IdP, but they aren’t showing up in the app” are commonplace. And if you go check the IdP’s logs, they might say something like “we couldn’t push this user for reasons x,y,z.” IT admins are rarely aware of this, and you (app developer) don’t have access to these logs. So resolving these kinds of issues is a manual back and forth with your customers. We often see teams asking a customer’s IT admin to do a force sync on a group, delete and re-enter an attribute, etc. We’ve even seen situations where IT admins are asked to delete the entire directory and rebuild it from scratch.
The reality is that the only way to build a resilient Directory Sync is to “simply” battle test your app by working with many customers across many IdPs. There are just so many small issues, exacerbated by fragmentation across different IdPs, that it’s impossible to know what they are in advance. The best you can hope for is documenting (or building functionality to handle) each when they come up.
Correctness
The whole point of SCIM is for your app to perfectly reflect the state that’s in the IdP. While things work the way you’d expect most of the time, IdPs can sometimes send concurrent requests. For example, imagine an IdP tells you:
- Add Dwight and Angela to the Engineering Management group
- Remove Dwight from the Engineering Management group
Whatever the ground reality is in the IdP that caused this to happen (could be a mistake), both of these requests can get issued at the same time. Depending on what order of operations you process these in, you can end up with a bad race condition because you don’t know which one of these requests should be processed first (in essence, should Dwight still be in the Engineering Management group or not?).
The important takeaway here is that handling SCIM in a serial, reliable way is extremely important, and because you have no control over how the information from an IdP is pushed to you, this is difficult to accomplish in practice.
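As a rough illustration of what "serial" means here, the sketch below funnels every group change for a directory through one lock, so each read-modify-write completes before the next begins. `DirectoryState` and its method names are invented for this example, and the approach assumes a single process. A lock also only prevents interleaving: it applies requests in arrival order and cannot tell you which order the IdP intended.

```python
import threading
from collections import defaultdict


class DirectoryState:
    """In-memory group memberships with one lock per directory."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()   # protects the lock table itself
        self.groups = defaultdict(set)         # (directory_id, group) -> member ids

    def _lock_for(self, directory_id: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[directory_id]

    def apply(self, directory_id: str, group: str, add=(), remove=()) -> set:
        """Apply one membership change atomically with respect to its directory."""
        with self._lock_for(directory_id):
            key = (directory_id, group)
            members = set(self.groups[key])    # read
            members |= set(add)                # modify
            members -= set(remove)
            self.groups[key] = members         # write
            return set(members)
```

With this in place, adding Dwight and Angela and then removing Dwight always leaves only Angela in the group when the requests are handled in that order, and concurrent requests can no longer overwrite each other's changes. Once the app runs on more than one process or machine, the same idea needs a lock or queue that lives outside the process.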
Scale
The SCIM RFC specifies that IdPs push data to your app, not the other way around. It doesn’t take an expert to realize that this opens up your app to major scale problems; a bad SCIM implementation could flood you with 1M requests in a few minutes and bring down your entire system. Most IdPs do not support creating rate limits!
A simple example of this is onboarding a very large organization. Imagine you sign Salesforce as a customer. Onboarding the entire company will lead to a massive influx of requests from the IdP, and your app needs to be able to scale to handle that. But it’s not just large organizations that can cause these scale issues: bad SCIM implementations can do it too.
For instance, popular IdPs follow the SCIM spec correctly by sending events serially, preventing the next request from being sent if the previous one fails.
However, a larger enterprise might implement their own SCIM IdP, and bugs in it could produce concurrent calls and race conditions. For example, imagine two concurrent calls (PUT /Groups), where the first adds 5 users and the second removes 2 users. If these don’t happen serially, you end up in a bad state.
WorkOS addresses such errant implementations by implementing DistributedLock on all SCIM endpoints as a safeguard.
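For the request-flood side of the problem, one common app-side defense is a token bucket per directory, so a misbehaving client is throttled instead of taking everything down with it. The sketch below shows the general technique under stated assumptions and is not a description of how WorkOS does it. `TokenBucket` is a hypothetical class; answering rejected requests with HTTP 429 is one possible policy, and whether a given IdP backs off and retries is not guaranteed.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, then a steady `rate_per_sec` requests."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)   # start full so initial syncs can burst
        self.clock = clock              # injectable for testing
        self.updated = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A request handler would keep one bucket per directory and check `allow()` before doing any work. Rejecting requests is risky for a sync protocol, because a dropped deprovisioning request is a security problem, so an alternative is to accept requests quickly into a durable queue and process them at your own pace.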
Dangers of building SCIM support in-house
If you’ve gotten this far, you’ve hopefully seen that implementing SCIM yourself, especially across multiple IdPs, is an incredibly difficult, trial-by-fire task. There is no established set of documentation, no list of issues you’ll encounter, and no standardization: the only way to build a resilient system is to experience and resolve bugs and issues yourself.
The WorkOS advantage
Instead of implementing SCIM yourself, which can take months of initial development and ongoing investment, you can use WorkOS Directory Sync. With easy-to-use APIs and clear documentation, you can finish the SCIM integration in one sprint.
It provides native support for all common providers and normalizes the various types of fragmentation that occur during onboarding, implementation, and triage.
Some of the fastest growing startups like Vercel, Loom, and Webflow use WorkOS Directory Sync to provide seamless user lifecycle management and automated access control for their enterprise customers.