May 4, 2026

The identity join problem: Linking SSO profiles to directory users

Email and IDP ID both fail as universal join keys. The fix is sensible defaults with real escape hatches.

A company uses Okta for SSO but Azure AD for SCIM provisioning. They need to link a user who authenticates via SAML to the same user who was provisioned via SCIM. Sounds trivial. It's the same human. But there's no reliable shared key between the two systems, and the naive approaches break in ways that are subtle, dangerous, and surprisingly common.

This is the identity join problem. Every team building directory sync hits it eventually, and the solutions that feel obvious at first are the ones that cause the most damage at scale. We've spent real time in the weeds on it: debugging customer escalations, questioning our own recommendations, and digging through the data. Here we want to lay out exactly why it's hard and what we think a good solution looks like.

The setup: Two identity records, zero guaranteed overlap

When you integrate SSO and SCIM, you end up with two distinct representations of the same person:

  1. The SSO profile, created when a user authenticates via SAML or OIDC. It contains the claims the identity provider sends in the assertion: email, name, and an IDP-specific user identifier.
  2. The directory user, created when the identity provider (or a separate directory provider) provisions the user via SCIM. It contains attributes like email, name, and an externalId that the provisioning source assigns.

Your application needs to join these two records to answer a fundamental question: is the person who just logged in the same person who was provisioned into this organization?

The naive answer is "just match on email." The slightly more sophisticated answer is "match on the IDP ID." Both break in production, and they break in ways that are hard to detect until a customer files a support ticket.

Why IDP ID matching doesn't hold up

The standard recommendation, including one we've given customers in the past, is to use the IDP ID as the linking identifier. The logic is straightforward: the identity provider assigns a unique identifier to each user, and that identifier should appear in both the SAML assertion (as the NameID or a custom attribute) and the SCIM externalId.

In practice, this breaks for several reasons.

  • There's no guaranteed uniqueness constraint. In many SCIM implementations the externalId field has no uniqueness constraint at the database level. You can end up with two SCIM users in the same directory carrying the same external ID. When that happens, your "unique" linking identifier silently becomes ambiguous, and an ambiguous match in an identity system can mean one user gets another user's access.
  • Customers use different IDPs for SSO and SCIM. You might assume that if a customer uses Okta for SSO, they're also using Okta for SCIM provisioning. That assumption is wrong more often than you'd expect. When SSO and SCIM run through different providers, the IDP IDs are issued by different systems entirely. Okta assigns its own internal user ID for the SAML assertion. Azure AD assigns a completely different ID for the SCIM payload. These IDs have no relationship to each other. They're opaque strings generated independently by two different identity platforms.
  • IDP migration breaks everything. Even when a company uses the same IDP for both protocols today, that won't necessarily be true tomorrow. Enterprise identity providers get swapped out during M&A, vendor consolidation, or security incidents. When the IDP changes, every previously working link goes stale overnight.

The IDP ID feels like the right answer because it's the most "identity-flavored" field available. But it's a provider-scoped internal reference, not a stable cross-system join key.
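
To make the first failure mode concrete, here is a minimal sketch of an audit that looks for external IDs shared by more than one directory user. The record shape (plain dicts with "id" and "externalId") and the function name find_ambiguous_keys are assumptions for illustration, not any particular platform's API.

```python
from collections import defaultdict


def find_ambiguous_keys(directory_users, key="externalId"):
    """Return {key value: [user ids]} for every value carried by 2+ users."""
    by_key = defaultdict(list)
    for user in directory_users:
        value = user.get(key)
        if value is not None:
            by_key[value].append(user["id"])
    # Only values that map to more than one user are ambiguous join keys.
    return {value: ids for value, ids in by_key.items() if len(ids) > 1}


# Hypothetical directory contents: two users share one external ID.
users = [
    {"id": "du_1", "externalId": "00u1abc"},
    {"id": "du_2", "externalId": "00u1abc"},
    {"id": "du_3", "externalId": "00u9xyz"},
]
ambiguous = find_ambiguous_keys(users)
print(ambiguous)
```

Any non-empty result means a join on that attribute cannot be trusted for the affected users until the duplicates are resolved.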

Why email alone isn't safe

Email is the next candidate. Unlike IDP IDs, email addresses are human-readable, universally present, and tend to be consistent across systems. If jane.doe@acme.com shows up in both the SSO profile and the directory user, that's the same person. Right? Wrong.

  • Email reuse. When Jane Doe leaves the company, her email address may eventually get reassigned to a different Jane Doe, or to John Doe who inherited her role. If your system linked records by email, the new employee now inherits the old employee's session, permissions, and data. This is a known class of vulnerability, sometimes called account takeover via identifier reuse, that security teams actively audit for.
  • Duplicate emails within a directory. In large organizations, especially those going through mergers, you can have duplicate email addresses within a single directory. Two subsidiary companies might both have a jdoe@parent-corp.com before email migration is complete. SCIM providers will happily push both records. If your join logic assumes email uniqueness, you'll either silently drop one user or corrupt both.
  • Email format inconsistency. One system stores Jane.Doe@acme.com, another stores jane.doe@ACME.COM. Case-insensitive matching helps, but doesn't solve the underlying problem: email is a communication address, not an identity primitive. It was never designed to be a primary key.
  • Timing windows. SSO profiles and directory users don't always arrive at the same time. A user might authenticate via SSO before the SCIM provisioning event fires. If you create a tentative link based on email at authentication time, and then the SCIM event arrives with a slightly different email format or an updated address, the link breaks or duplicates.

This is exactly the problem that prompted us to dig deeper. A customer escalation around duplicate email handling forced us to question whether the fix we were building, blocking duplicate emails at the SCIM layer, was treating the symptom instead of the disease. The real issue wasn't that duplicates existed. It was that we didn't give developers a reliable way to link SSO profiles to directory users in the first place.

How common is the split-IDP case?

The honest answer: it depends on how you measure.

Internal analysis across our customer base surfaces a tension. Some measurements suggest roughly 40% of customers use different identity providers for SSO versus SCIM, meaning their authentication and provisioning flows produce fundamentally unrelated identifiers. Other measurements show the opposite, with 90%+ of customers using the same IDP for both.

The discrepancy comes down to how you define "different." A company might use Okta for everything, but their Okta SSO configuration and their Okta SCIM configuration were set up at different times, by different admins, with different attribute mappings. It's technically the same IDP, but the identifiers and attribute shapes still diverge in practice.

Even in the best case (same IDP for both), you can't assume the identifiers will match. The join problem is universal, not an edge case.

What a real solution looks like: Configurable linking identifiers

Once you accept that no single identifier works universally, the design becomes clearer. You need two things:

  1. A sensible default that works for the majority of customers without any configuration.
  2. An escape hatch that lets IT admins and developers specify exactly which attribute to use for linking when the default doesn't fit.

The sensible default

For most customers, email matching with safeguards works. They use the same IDP for both SSO and SCIM, emails are unique within their organization, and the email field is consistently populated in both protocols. You don't need to ask these customers to configure anything. It should just work.

The safeguards matter, though. At minimum:

  • Normalize before comparing. Case-insensitive, whitespace-trimmed. A trailing space in a SCIM email attribute will break an exact-string match.
  • Handle the timing gap. SCIM provisioning and SSO login are asynchronous. A user might log in before they've been provisioned, or their SCIM record might update before their next login. Linking logic needs to be tolerant of records arriving in either order.
  • Fail explicitly, not silently. If a match is ambiguous (multiple candidates) or missing (no candidates), surface that to the developer. Don't guess. Don't silently create a new user. Return a clear signal that manual intervention or configuration is needed.
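
The three safeguards above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the names normalize_email, LinkResult, and link_by_email are hypothetical, directory users are plain dicts, and the timing gap is handled simply by re-running the lookup whenever either side changes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and fold case before comparing."""
    return raw.strip().casefold()


@dataclass(frozen=True)
class LinkResult:
    status: str  # "linked", "missing", or "ambiguous"
    directory_user_id: Optional[str] = None
    candidates: Tuple[str, ...] = ()


def link_by_email(sso_email: str, directory_users: list) -> LinkResult:
    target = normalize_email(sso_email)
    matches = [u["id"] for u in directory_users
               if normalize_email(u["email"]) == target]
    if len(matches) == 1:
        return LinkResult("linked", directory_user_id=matches[0])
    if not matches:
        # Possibly just the timing gap: the caller can retry when a SCIM event arrives.
        return LinkResult("missing")
    # More than one candidate: report it instead of guessing.
    return LinkResult("ambiguous", candidates=tuple(matches))


directory = [{"id": "du_1", "email": "Jane.Doe@acme.com "}]  # note the trailing space
first = link_by_email("jane.doe@ACME.COM", directory)    # linked despite case and space
early = link_by_email("new.hire@acme.com", directory)    # login before provisioning
directory.append({"id": "du_2", "email": "new.hire@acme.com"})
retried = link_by_email("new.hire@acme.com", directory)  # same lookup after the SCIM event
directory.append({"id": "du_3", "email": "JANE.DOE@acme.com"})
clash = link_by_email("jane.doe@acme.com", directory)    # two candidates now
print(first.status, early.status, retried.status, clash.status)
```

Returning a result object with an explicit status, rather than a user or None, is what forces the caller to decide what "missing" and "ambiguous" mean for their application.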

The escape hatch

For the remaining customers, the ones with split IDP configurations, ongoing M&A integrations, or non-standard directory setups, you need a configurable linking identifier.

This means exposing a setting (in the admin portal, in the API, or both) that lets the customer or the developer specify: "for this organization, link SSO profiles to directory users using attribute X from the SAML assertion and attribute Y from the SCIM record."

The configuration might look something like this:

  
{
  "organization_id": "org_01H...",
  "linking_strategy": {
    "sso_attribute": "custom_attributes.employee_id",
    "directory_attribute": "custom_attributes.hr_employee_id",
    "match_type": "case_insensitive_exact"
  }
}
  

Some concrete cases this enables:

  • Custom attribute matching. A customer stores an employee ID in a custom SAML attribute and as a custom SCIM extension attribute. They configure linking on employeeId instead of email.
  • M&A identity resolution. A customer going through a merger has users with two identities. They configure linking on a stable internal identifier that persists across the domain migration.
  • Cross-IDP linking. A customer uses Okta for SSO and Azure AD for SCIM. They configure linking on a shared attribute (like employee number) that both systems populate from the same upstream HR source.

The interface needs to be simple enough for an IT admin to configure in their dashboard, and flexible enough for a developer to set programmatically when onboarding a new enterprise customer.
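
As a rough sketch of how a linking_strategy like the one above might be evaluated, the following resolves the two dotted attribute paths and compares the values. The helper names (get_path, comparable, find_candidates) and the record shapes are assumptions; only the "case_insensitive_exact" match type appears in the example configuration, so that is the only one handled here.

```python
def get_path(record: dict, dotted_path: str):
    """Walk a dotted path such as 'custom_attributes.employee_id'; None if absent."""
    current = record
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def comparable(value, match_type: str):
    """Convert an attribute value into the form used for comparison."""
    if value is None:
        return None
    if match_type == "case_insensitive_exact":
        return str(value).casefold()
    raise ValueError(f"unsupported match_type: {match_type}")


def find_candidates(sso_profile: dict, directory_users: list, strategy: dict) -> list:
    """Return ids of directory users whose attribute matches the SSO profile's."""
    match_type = strategy["match_type"]
    wanted = comparable(get_path(sso_profile, strategy["sso_attribute"]), match_type)
    if wanted is None:
        return []  # the SSO profile does not carry the linking attribute
    return [
        u["id"] for u in directory_users
        if comparable(get_path(u, strategy["directory_attribute"]), match_type) == wanted
    ]


strategy = {
    "sso_attribute": "custom_attributes.employee_id",
    "directory_attribute": "custom_attributes.hr_employee_id",
    "match_type": "case_insensitive_exact",
}
# Hypothetical records for illustration.
profile = {"custom_attributes": {"employee_id": "E-1042"}}
directory_users = [
    {"id": "du_1", "custom_attributes": {"hr_employee_id": "e-1042"}},
    {"id": "du_2", "custom_attributes": {"hr_employee_id": "E-2000"}},
    {"id": "du_3", "custom_attributes": {}},
]
candidates = find_candidates(profile, directory_users, strategy)
print(candidates)
```

The function returns a list rather than a single user on purpose: zero or multiple candidates should be surfaced as "missing" or "ambiguous", the same way the default email strategy does.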

Uniqueness enforcement is a prerequisite

None of this works without uniqueness enforcement on the linking attribute. If two directory users can have the same email, or the same employee ID, or the same IDP ID, your join produces ambiguous results regardless of which attribute you pick.

The platform needs to enforce uniqueness contextually, per directory and per linking attribute. If an organization has configured email as their linking identifier, duplicate emails within that directory need to be rejected at ingestion time, not discovered at authentication time.

This is where duplicate email blocking comes in. It's a prerequisite for reliable identity joining, not a standalone feature. Without it, the linking identifier is a suggestion, not a guarantee.

The enforcement happens at the SCIM ingestion layer. When a SCIM provider pushes a user with an email that already exists in the directory, the platform should reject the event with an HTTP 409 Conflict (or surface it for admin resolution) rather than silently creating a duplicate. This is a breaking change for providers that have been happily pushing duplicates, so it requires careful rollout: per-organization enablement, monitoring for rejection rates, and clear error messages back to the SCIM provider.
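
Here is a minimal in-memory sketch of that ingestion check. The Directory class, its ingest_create method, and the generated ids are hypothetical; the error body follows the SCIM error response shape from RFC 7644, which defines a "uniqueness" scimType for 409 responses. A real implementation would enforce this with a database constraint rather than a Python dict.

```python
SCIM_ERROR_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:Error"


class Directory:
    """Toy directory that enforces uniqueness on one configured linking attribute."""

    def __init__(self, linking_attribute: str = "email"):
        self.linking_attribute = linking_attribute
        self._users = {}   # user id -> stored record
        self._index = {}   # normalized linking value -> user id

    def ingest_create(self, payload: dict):
        """Return (http_status, body) for a SCIM-style create request."""
        raw = payload.get(self.linking_attribute)
        if raw is None:
            return 400, {"schemas": [SCIM_ERROR_SCHEMA], "status": "400",
                         "scimType": "invalidValue",
                         "detail": f"{self.linking_attribute} is required"}
        key = str(raw).strip().casefold()
        if key in self._index:
            # Reject at ingestion time instead of discovering it at login.
            return 409, {"schemas": [SCIM_ERROR_SCHEMA], "status": "409",
                         "scimType": "uniqueness",
                         "detail": f"{self.linking_attribute} already in use "
                                   f"by {self._index[key]}"}
        user_id = f"du_{len(self._users) + 1}"
        record = {**payload, "id": user_id}
        self._users[user_id] = record
        self._index[key] = user_id
        return 201, record


directory = Directory(linking_attribute="email")
status_a, body_a = directory.ingest_create({"userName": "jdoe", "email": "jdoe@parent-corp.com"})
status_b, body_b = directory.ingest_create({"userName": "jdoe2", "email": " JDoe@Parent-Corp.com"})
print(status_a, status_b, body_b["scimType"])
```

Because the index is keyed on the normalized value, a duplicate that differs only in case or surrounding whitespace is rejected too, which keeps ingestion consistent with how the join compares values.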

What reliable identity joining enables

Once the join is reliable, several features that enterprise customers expect become possible:

  • Just-in-time deprovisioning. When a directory user is deactivated via SCIM, the corresponding SSO sessions need to be invalidated. That requires a reliable link between directory user and SSO profile.
  • Attribute-based access control. SSO authentication tells you "this person is who they claim to be." Directory sync tells you "this person is in the Engineering department and the Platform team." Connecting authentication identity to organizational identity is what makes fine-grained authorization possible.
  • Unified user lifecycle. Onboarding, role changes, offboarding. These events originate in the directory, but their effects need to propagate to authenticated sessions. Without a reliable join, lifecycle events are disconnected from access control.
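
The deprovisioning case shows why the link matters. In this sketch a SCIM deactivation revokes sessions only if the directory user can be resolved to an SSO profile. SessionStore, on_directory_user_deactivated, and the links mapping are hypothetical names, and sessions are kept in memory purely for illustration.

```python
class SessionStore:
    """Toy session store keyed by session id."""

    def __init__(self):
        self._sessions = {}  # session id -> SSO profile id

    def start(self, session_id: str, sso_profile_id: str) -> None:
        self._sessions[session_id] = sso_profile_id

    def revoke_for_profile(self, sso_profile_id: str) -> list:
        revoked = sorted(sid for sid, pid in self._sessions.items()
                         if pid == sso_profile_id)
        for sid in revoked:
            del self._sessions[sid]
        return revoked

    def active(self) -> list:
        return sorted(self._sessions)


def on_directory_user_deactivated(directory_user_id: str, links: dict,
                                  sessions: SessionStore) -> list:
    """Revoke sessions for the SSO profile linked to a deactivated directory user."""
    sso_profile_id = links.get(directory_user_id)
    if sso_profile_id is None:
        # Without a link there is nothing to revoke; the sessions stay alive.
        return []
    return sessions.revoke_for_profile(sso_profile_id)


sessions = SessionStore()
sessions.start("sess_a", "sso_1")
sessions.start("sess_b", "sso_1")
sessions.start("sess_c", "sso_2")
links = {"du_1": "sso_1"}  # directory user id -> SSO profile id; du_2 was never linked

revoked = on_directory_user_deactivated("du_1", links, sessions)
unlinked = on_directory_user_deactivated("du_2", links, sessions)
print(revoked, unlinked, sessions.active())
```

The second call is the failure mode to watch for: the directory says the user is gone, but with no link their session survives.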

The broader pattern: Defaults with escape hatches

The pattern of sensible default plus configurable override comes up constantly when building for enterprise customers. The temptation is to either pick one approach and force everyone into it, or to make everything configurable from day one and overwhelm the majority who don't need it.

Ship the obvious default, make it work well, and build the escape hatch for the customers who need it. But you have to actually build the escape hatch. "Contact support and we'll handle it manually" is not an escape hatch. It's a bottleneck.

The lesson

The identity join problem is a reminder that enterprise identity is a distributed systems problem, not just a data modeling problem. The data looks simple: two records, same human, match them up. The systems producing those records are independent, asynchronous, and governed by different administrators with different configurations.

The solution isn't a clever matching algorithm. It's giving the people who understand their own identity infrastructure (IT admins and the developers integrating with them) the tools to specify how their systems should be linked. Smart defaults for the common case, explicit configuration for the rest, and uniqueness enforcement underneath it all.

The hard part was never the code. It was accepting that there's no single identifier that works everywhere, and designing accordingly. Once you stop looking for the universal key and start building a system that's honest about the ambiguity, the solution space opens up.

Every team building directory sync eventually discovers this the hard way. Hopefully this saves a few of them the trip.

If you'd rather not build this from scratch, WorkOS can handle both SSO and Directory Sync for you. Sign up today.
