By Celestine Kao, Product Engineer
If you’re here, you’ve likely been tasked with building SAML-based SSO as a requirement for an enterprise deal. If you’re just diving into the problem space of SSO / SAML, we’d first suggest checking out The Developer’s Guide to SSO. Otherwise, buckle up for a brief but titillating foray into why XML-based authentication is... challenging.
The attack surface for SAML authentication is extensive, mostly due to the fact that SAML is XML-based. XML is a semantic-free meta-language - it’s hard to form, hard to read, and hard to parse. Combined with the high complexity of the SAML specification and the number of parties involved in establishing authentication, we get what often feels like a big ball of mud and all the accompanying implications. Be prepared to tackle a steep learning curve, lots of bugs, high maintenance costs, attack vectors galore, and an absurd spread of edge cases.
Most SAML SSO security vulnerabilities are introduced by Service Providers (SPs) improperly validating and processing SAML responses received from Identity Providers (IdPs). This happens because SAML SSO is typically not a core-value feature for an application, nor is the implementation common knowledge for most developers. Unknowns become even less likely to be identified and addressed when the pressure is on to just deliver something to unblock a high-value contract - as is oftentimes the case. However, building SAML SSO safely and securely in-house requires significant buy-in and investment by teams - on the scale of months, representing hundreds of thousands of dollars in developer time.
If not done right, you expose your application and your customers to potentially huge security risks. To drive that home, here are just a few recently published SAML-related vulnerabilities:
It should be evident by now that oversights in SAML implementations are ubiquitous and problematic, even among experienced engineering teams.
So let’s dive into some of the more common security pitfalls developers building SAML-based SSO should be aware of, as well as cover a few suggested countermeasures. Just to be clear, this guide is by no means comprehensive and is meant to provide a starting point for SAML security considerations as well as some follow-on resources.
Let's say we're integrating our application with Okta via SAML. Below is an example of an XML document we might get when attempting to authenticate a user, containing a simplified but valid SAML response:
Like we mentioned earlier, the SAML spec is complex and responses can get lengthy, so this example is comparatively quite terse. Keeping things to what you should know before we go into SAML vulnerabilities, let’s walk through what the response (<saml2p:Response>) is communicating:
For the purposes of readability, the SAML 2.0 XML snippets in the remainder of this blog will be simplified, use shorthand, and be stripped of nodes that would otherwise be required in reality but are not relevant to what’s being illustrated. We’ll use a mythical IssueTracker, ContractManager, and PayrollService as hypothetical SPs that have implemented SAML authentication, which you should think of as placeholders for your application or other SAML SSO-enabled apps.
The first step in processing a SAML response is parsing the payload. Parsing and loading an XML document into memory is an inherently expensive set of operations, but can be unexpectedly costly due to a feature of XML that allows references to external or remote documents, i.e. Document Type Definitions (DTDs).
When a DTD is encountered, parsers will try to fetch and load the referenced document as well. If the referenced document is large enough or results in infinitely looping references, your server can be slowed or even brought down trying to complete the process. The same holds true if the payload itself is very large, DTDs or not.
Two low-hanging mitigations you should implement to prevent this kind of resource exhaustion are:
XML processing, and thus by extension SAML response processing, is vulnerable to resource-exhaustion attacks from other scenarios described later on in this post. And unfortunately, a service outage is among the mildest of the outcomes that exploiting XML DTDs makes possible - it is a dark and anxiety-inducing rabbit hole.
So, if you’re not writing your own XML parser (generally not suggested), it’s important to vet the XML parser(s) your application and its dependencies use - ensure they handle other exploits like Billion Laughs and Zip Bombs.
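As a rough illustration of the parsing precautions discussed above, here is a minimal Python sketch that caps payload size and refuses any document carrying a DOCTYPE before it is handed to the real parser. The names `MAX_SAML_BYTES`, `UnsafeXMLError`, and `parse_saml_xml` are our own inventions for this example, and the 256 KB cap is an assumed value, not a recommendation from any spec.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Assumed cap; tune it to the largest legitimate response you expect.
MAX_SAML_BYTES = 256 * 1024


class UnsafeXMLError(ValueError):
    """Raised when a payload is rejected before any SAML processing."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this when a <!DOCTYPE ...> declaration starts, before any
    # entity declared inside it can be expanded in the document body.
    raise UnsafeXMLError("DOCTYPE declarations are not allowed")


def parse_saml_xml(payload: bytes) -> ET.Element:
    """Size-check and DOCTYPE-scan a payload, then parse it."""
    if len(payload) > MAX_SAML_BYTES:
        raise UnsafeXMLError("payload exceeds size limit")

    # First pass: a throwaway expat parser whose only job is to bail out on
    # DOCTYPE declarations and malformed XML.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    try:
        scanner.Parse(payload, True)
    except expat.ExpatError as exc:
        raise UnsafeXMLError(f"malformed XML: {exc}") from exc

    # Second pass: the document is known to be DTD-free at this point.
    return ET.fromstring(payload)
```

Parsing twice is wasteful but keeps the sketch dependency-free. In practice you would more likely configure your production parser to disable DTD loading and entity resolution directly, and enforce the size limit at the HTTP layer as well.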
The primary security mechanism in the SAML handshake is the cryptographic validation of XML Signatures (XML-DSig) - which establishes the trust chain between IdPs and SPs. XML-DSig validation should always be done prior to executing business logic; however, the separation between signature verification and operating on the rest of a SAML payload opens up SAML authentication to vulnerabilities exposed by what are called XML Signature Wrapping (XSW) attacks. These attacks have numerous permutations which can result in outcomes such as (but not limited to):
Original response (pre-XSW):
Modified response (post-XSW):
The broadest countermeasure to XSW attacks is validating the schema of the SAML XML document. Payloads for SAML responses of any given IdP should have a deterministic standard schema that can be used as a reference in a schema compliance validation module, which should be executed prior to XML-DSig verification. Here are example schemas used by OneLogin’s python3-saml package to perform XML schema validation. Schemas should be vetted local copies as opposed to being fetched from 3rd party remote locations at runtime or on server start.
All of that being said, schema validation isn’t foolproof; there is room for error in the validation module logic itself, as well as in the syntactic rigor of the reference schema. A second low-hanging countermeasure to XSW attacks that should be employed for the sake of redundancy is to always use absolute XPath expressions to select elements in processes post-schema validation. Explicit absolute XPath expressions set an unambiguous expectation for the location of elements.
Here’s an example of a valid response that’s been modified in an XSW attack (specifically a signature exclusion attack, more on that later):
This modification also exploits the common, incorrect, but not unreasonable assumption that a well-formed SAML response will only ever have a single assertion. So while XML-DSig verification would succeed for the signature returned by doc.getElementsByTagName("Signature"), the assertion returned and processed by doc.getElementsByTagName("Assertion") would be the injected snek assertion. This attack would have been more likely to fail if the XPath expression "/Response/Assertion/Signature" was used in the assertion signature validation logic.
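To make the difference between a loose tag-name lookup and a fixed-path lookup concrete, here is a small Python sketch. The payload is a made-up, heavily simplified response (the IDs `forged` and `legit` are placeholders), and because the standard library's ElementTree does not accept absolute XPath on an element, we approximate "/Response/Assertion" by checking the root tag and then selecting direct children only.

```python
import xml.etree.ElementTree as ET

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
DS = "http://www.w3.org/2000/09/xmldsig#"
NS = {"samlp": SAMLP, "saml": SAML, "ds": DS}

# Hypothetical payload: a forged assertion is tucked inside Extensions,
# ahead of the legitimately signed one.
WRAPPED_RESPONSE = f"""
<samlp:Response xmlns:samlp="{SAMLP}" xmlns:saml="{SAML}" xmlns:ds="{DS}">
  <samlp:Extensions>
    <saml:Assertion ID="forged"/>
  </samlp:Extensions>
  <saml:Assertion ID="legit"><ds:Signature/></saml:Assertion>
</samlp:Response>
"""


def first_assertion_anywhere(root):
    """Loose lookup, comparable to getElementsByTagName: first match in document order."""
    return next(root.iter(f"{{{SAML}}}Assertion"))


def assertions_at_expected_path(root):
    """Strict lookup, comparable to /Response/Assertion: direct children of the root only."""
    if root.tag != f"{{{SAMLP}}}Response":
        raise ValueError("root element is not a SAML Response")
    return root.findall("saml:Assertion", NS)


root = ET.fromstring(WRAPPED_RESPONSE)
loose = first_assertion_anywhere(root)
strict = assertions_at_expected_path(root)

print(loose.get("ID"))                    # forged
print([a.get("ID") for a in strict])      # ['legit']
```

The strict lookup is not a complete defense on its own, since an attacker can also inject a sibling at the expected path. It only removes one source of ambiguity and should sit alongside schema validation and signature checks.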
This sounds obvious, but make sure to check that a SAML response is intended for your app. This is low-hanging fruit that can prevent attacks exploiting IdPs that use a shared private signing key for all integrated SPs of a given tenant, as opposed to issuing unique keys per application. The most common attack entails the unauthorized lateral movement by a malicious user across an enterprise’s IdP-integrated apps:
A second scenario would be a third party impersonating your app and gaining user access. The likelihood of this attack vector being exploited is pretty low because the malicious party would need to be in possession of the IdP’s private signing key (among other things) - but we’re mentioning it for the sake of completeness:
There are Service Providers that don’t bother to check if they’re the intended recipient, relying only on the validity of assertion signatures to prove the sender is a trusted party and that the response is valid. But as we’ve illustrated above, valid signatures aren’t enough to prevent unwanted access.
When dealing with security and authentication, stay paranoid my friend, and have some additional redundancies to catch edge cases. In this case, some easy-to-implement checks are:
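As one possible shape for intended-recipient checks, the Python sketch below compares the standard SAML 2.0 fields that name the target SP (the response's Destination attribute, the assertion's Audience, and the SubjectConfirmationData Recipient) plus the Issuer against values configured for the connection. The function name and its parameters are hypothetical, and the sketch assumes signatures have already been verified elsewhere.

```python
import xml.etree.ElementTree as ET

NS = {
    "samlp": "urn:oasis:names:tc:SAML:2.0:protocol",
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
}


def recipient_problems(root, *, idp_entity_id, sp_entity_id, acs_url):
    """Return a list of reasons this response does not look addressed to us."""
    problems = []

    if root.get("Destination") != acs_url:
        problems.append("Destination does not match our ACS URL")

    assertions = root.findall("saml:Assertion", NS)
    if not assertions:
        problems.append("no assertion found directly under Response")

    for assertion in assertions:
        issuer = assertion.findtext("saml:Issuer", default="", namespaces=NS).strip()
        if issuer != idp_entity_id:
            problems.append("Issuer is not the IdP configured for this connection")

        audiences = {
            (node.text or "").strip()
            for node in assertion.findall(
                "saml:Conditions/saml:AudienceRestriction/saml:Audience", NS
            )
        }
        if sp_entity_id not in audiences:
            problems.append("Audience does not include our entity ID")

        recipients = {
            node.get("Recipient")
            for node in assertion.findall(
                "saml:Subject/saml:SubjectConfirmation/saml:SubjectConfirmationData", NS
            )
        }
        if acs_url not in recipients:
            problems.append("Recipient does not match our ACS URL")

    return problems
```

An empty list means none of these particular checks failed. A response minted for ContractManager and replayed against IssueTracker would be flagged on all three recipient fields even though its signature is valid.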
Like we mentioned earlier, cryptographic validation of signatures is the primary mechanism for determining the authenticity of SAML payloads. It’s a good idea to read through the W3C specs for XML signature processing because it anchors SAML security, but the pithy statement to remember when handling SAML responses is this: only process entities that are validly signed.
There’s a class of attacks that exploit poorly implemented SP security logic known as signature exclusion attacks. These attacks will insert forged unsigned elements, banking on the possibility that the SP’s security logic will skip XML-DSig validation if no signature is found. Another common slipup is implementing validation logic that checks only the first assertion’s signature and then assumes remaining assertions are signed. Here are some rules to follow to avoid the most common oversights:
➞ The entire SAML response itself should be signed
➞ Every assertion should be signed
Something to note is that you should not assume a response will have only one assertion, and furthermore, each assertion should be signed in its entirety.
➞ Only accept signature algorithms from an explicitly defined set
JWT validation similarly can sometimes overlook this point, and in fact, there was a related Auth0 vulnerability exposed as recently as last year. If possible, we suggest hardcoding your validation logic to only accept RSA-SHA256 (known as RS256 in the JWT world) as the signature algorithm. Otherwise, verify Algorithm attribute values are from a recognized set of URIs.
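The three rules above can be sketched as a structural pre-check that runs before any cryptography. This Python example is only that: it confirms that signatures are present, that each one references its parent's ID, and that the declared algorithms are on an allowlist. It does not verify the signatures themselves, which still has to happen afterwards. The function names and the choice of allowed URIs are our assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
DS = "http://www.w3.org/2000/09/xmldsig#"
NS = {"samlp": SAMLP, "saml": SAML, "ds": DS}

# Assumed allowlists: RSA-SHA256 signatures over SHA-256 digests only.
ALLOWED_SIGNATURE_ALGS = {"http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"}
ALLOWED_DIGEST_ALGS = {"http://www.w3.org/2001/04/xmlenc#sha256"}


def _check_signature(owner, label, problems):
    """Check that `owner` carries its own Signature with allowlisted algorithms."""
    signature = owner.find("ds:Signature", NS)  # direct child only
    if signature is None:
        problems.append(f"{label} is not signed")
        return

    owner_id = owner.get("ID")
    reference = signature.find("ds:SignedInfo/ds:Reference", NS)
    if not owner_id or reference is None or reference.get("URI") != f"#{owner_id}":
        problems.append(f"{label} signature does not reference its own ID")

    method = signature.find("ds:SignedInfo/ds:SignatureMethod", NS)
    if method is None or method.get("Algorithm") not in ALLOWED_SIGNATURE_ALGS:
        problems.append(f"{label} uses a SignatureMethod outside the allowlist")

    digest = signature.find("ds:SignedInfo/ds:Reference/ds:DigestMethod", NS)
    if digest is None or digest.get("Algorithm") not in ALLOWED_DIGEST_ALGS:
        problems.append(f"{label} uses a DigestMethod outside the allowlist")


def signature_policy_problems(root):
    """Return policy violations; an empty list means 'ok to attempt verification'."""
    problems = []
    _check_signature(root, "Response", problems)

    # Look at every assertion in the document, not just the first one.
    assertions = list(root.iter(f"{{{SAML}}}Assertion"))
    if not assertions:
        problems.append("Response contains no assertions")
    for index, assertion in enumerate(assertions):
        _check_signature(assertion, f"Assertion[{index}]", problems)
    return problems
```

Returning a list rather than raising on the first failure is a design choice made for the sketch; it keeps audit logs informative. A real SP would reject the response outright whenever the list is non-empty.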
If you’re planning on using a third party library to do SAML processing and XML-DSig validation, be sure to vet what it does under the hood - especially if it does XML parsing and processing as well. As much as possible, try to avoid libraries that depend directly or indirectly on xmlsec1 or its dependency libxml2 - many SAML and XML libraries are just language-specific bindings for xmlsec1 and libxml2, and as a result, inherit all the same security vulnerabilities.
We highly suggest doing XML-DSig validation in native code - in fact, at WorkOS we built our SAML library from scratch for greater control, and so we can respond immediately to newly discovered SAML-related vulnerabilities.
XML canonicalization is the process of transforming an XML document into a semantically identical but normalized representation (more on this later). The CanonicalizationMethod specifies which canonicalization algorithm to apply - the most commonly occurring one we’ve seen is xml-exc-c14n, which strips XML comments during transformation. This generally wouldn’t be a problem, except for the fact that most SAML libraries perform canonicalization prior to doing XML-DSig validation on the canonicalized assertions. Why is this a concern? Here’s what can happen if the library’s underlying XML element text extraction logic doesn’t consider inner comment nodes.
Suppose I’m a disgruntled developer who would dearly like a substantial raise from my company that uses Okta + PayrollService (this is all fictional, I’m not disgruntled). I used to work at PayrollService and so am pretty confident this exploit I’m about to attempt will work, because patching it never got prioritized in favor of feature work, and because no one external has noticed anything amiss, yet... Anyway.
I know that WorkOS IT always uses [email protected] as an administrative account for every app used within the organization (we don’t actually). So equipped with this knowledge and using my personal domain, I create a PayrollService account for a user [email protected] and set up SSO with Okta. Self-service free trials FTW.
Here’s a simplified SAML assertion authenticating me as a PayrollService user:
Now I can modify the SAML assertion by adding a comment:
This modified assertion doesn’t invalidate the signature because canonicalization will strip comments before XML-DSig verification - it will have the same canonical representation as the unmodified assertion. Great!
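The claim that both versions canonicalize identically is easy to check for yourself. The Python sketch below uses the standard library's `ET.canonicalize` (Python 3.8+), which implements C14N 2.0 rather than exclusive C14N, so treat it as a stand-in that shares the relevant behavior: comments are dropped unless explicitly requested. The email addresses are placeholders on reserved example domains, not the ones from the story above.

```python
import xml.etree.ElementTree as ET

# Placeholder identifiers; the second one has a comment injected mid-value.
ORIGINAL = "<NameID>admin@corp.example.attacker.example</NameID>"
TAMPERED = "<NameID>admin@corp.example<!-- injected -->.attacker.example</NameID>"


def canonical(xml_text: str) -> str:
    """Canonicalize without comments, as comment-stripping C14N variants do."""
    return ET.canonicalize(xml_text, with_comments=False)


# The raw strings differ, but their canonical forms are the same, so a digest
# computed over the canonical form cannot tell them apart.
same_canonical_form = canonical(ORIGINAL) == canonical(TAMPERED)
print(ORIGINAL == TAMPERED, same_canonical_form)
```

Since the signature's digest is computed over the canonical form, anything that canonicalization discards is invisible to XML-DSig verification.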
So now, believing the assertion is authentic, PayrollService checks to see which user is being authenticated. Its SAML library grabs the user identifier from the NameID element, but it, incorrectly, only reads the inner text of the element’s first node, i.e. [email protected]. Then PayrollService determines that [email protected] is indeed a user, and just like that, I’m in and ready to approve raises for myself and all my friends!
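The extraction bug described above is straightforward to reproduce with a DOM-style API. This Python sketch uses `xml.dom.minidom`, which keeps comment nodes in the tree; `first_node_text` mimics the flawed logic and `full_text` is one possible fix. Both function names and the addresses are hypothetical.

```python
from xml.dom import Node, minidom

# Placeholder addresses on reserved example domains.
TAMPERED = "<NameID>admin@corp.example<!-- injected -->.attacker.example</NameID>"


def first_node_text(element):
    """Flawed: reads only the first child node, stopping at the comment."""
    return element.firstChild.data


def full_text(element):
    """Safer: concatenates every text and CDATA child, skipping comments."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


name_id = minidom.parseString(TAMPERED).documentElement
print(first_node_text(name_id))  # admin@corp.example
print(full_text(name_id))        # admin@corp.example.attacker.example
```

A stricter alternative is to reject any identifier element that contains a non-text child at all, on the theory that a legitimate IdP has no reason to put comments inside a NameID.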
Duo Labs discovered this vulnerability back in 2018, and while some of the more commonly used open source SAML libraries have since addressed it, there undoubtedly remain many internal or open source libraries that haven’t. So echoing recommendations from before, vet your SAML and XML libraries.
Ideally, comments wouldn’t be purged prior to XML-DSig validation, so that injected comments would indeed cause validation to fail - but that’s unrealistic or inadvisable to try to enforce for a couple reasons, which we’ll leave for another time. Instead, you’ll want to make sure that:
Replay attacks occur when a SAML response is captured and re-sent to the Service Provider for duplicate processing, which can lead to outcomes like denial of service for your users or, if the SP charges per API request, exhausted request quotas. The most robust countermeasure against replay attacks is preventing the capture of SAML responses in the first place - which can be accomplished by using HTTPS (should be a given already) and never exposing the SAML response to the browser. Here’s what the authentication flow could look like:
However, very few IdPs actually support the Artifact Resolution Protocol, a requirement of back-channel SAML authentication. As a result, most SAML implementations rely entirely on the browser to relay SAML payloads between the SP and IdP:
Because the SAML response is exposed to the user agent, it becomes trivial to capture (by inspecting the dev console, XSS, or with malicious browser plugins) and replay a response. So another approach to mitigating replay attacks is to maintain a cache of previously seen assertion IDs, immediately rejecting responses containing any assertion with an ID that already exists in the cache. A cache item could have a TTL equal to the expiry datetime of the originating assertion, for example:
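A minimal in-memory version of such a cache might look like the Python sketch below. The class and method names are made up for this example, timestamps are plain Unix seconds, and the dictionary stands in for whatever shared store you actually run, since a per-process cache does nothing when requests are spread across several servers.

```python
import time
from typing import Dict, Optional


class SeenAssertionCache:
    """Remembers assertion IDs until the originating assertion expires."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def check_and_remember(
        self, assertion_id: str, expires_at: float, now: Optional[float] = None
    ) -> bool:
        """Return True if the assertion is fresh, False if it must be rejected."""
        now = time.time() if now is None else now

        # Drop entries whose assertions have expired; they can no longer be
        # replayed successfully, so there is no need to keep them around.
        self._expires_at = {
            seen_id: expiry
            for seen_id, expiry in self._expires_at.items()
            if expiry > now
        }

        if expires_at <= now:
            return False  # already expired, reject regardless of the cache
        if assertion_id in self._expires_at:
            return False  # seen before: replay
        self._expires_at[assertion_id] = expires_at
        return True
```

For a response with several assertions, call this for every assertion ID and reject the whole response if any call returns False. In a multi-process deployment the check and the write need to be a single atomic operation in the shared store, which this sketch does not attempt.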
A third much less robust but much faster to implement countermeasure (which should be implemented regardless) is logic that strictly enforces the validation window for assertions.
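Enforcing the validation window comes down to comparing the current time against the NotBefore and NotOnOrAfter attributes on the assertion's Conditions element. Here is a small Python sketch; the helper names are ours, and the 60-second clock-skew allowance is an assumed value you should tighten or loosen to suit your IdPs.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed tolerance for clock drift between the IdP and our servers.
ALLOWED_SKEW = timedelta(seconds=60)


def parse_saml_instant(value: str) -> datetime:
    """Parse a SAML UTC timestamp such as 2024-01-01T12:00:00Z."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def within_validity_window(
    not_before: str, not_on_or_after: str, now: Optional[datetime] = None
) -> bool:
    """True only if now falls inside [NotBefore, NotOnOrAfter), give or take skew."""
    now = now or datetime.now(timezone.utc)
    start = parse_saml_instant(not_before) - ALLOWED_SKEW
    end = parse_saml_instant(not_on_or_after) + ALLOWED_SKEW
    return start <= now < end
```

If either attribute is missing from the assertion, the safest reading of "strictly" is to reject the assertion rather than treat the missing bound as open-ended.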
One last thing to note is that most SPs that implement SAML SSO use 3rd party open source SAML libraries for speed to value, yet are not protected against replay attacks because the strongest countermeasures require additional architectural changes.
As with most software engineering, building SAML SSO for enterprises follows the 90-90 rule. There’s a hill to climb to get to an MVP, and an entirely different hill if you’d like to sleep at night. SAML-based authentication is rife with sleeping dragons, of which this guide only introduces a very small subset - but hopefully it has been useful in helping you avoid some of them. If product requirements allow, try to avoid integrating with IdPs using SAML; a more modern, safer, and simpler alternative protocol is OpenID Connect. And if you’re thinking twice about building SAML SSO yourself in-house, then consider using a 3rd party vendor that makes it their business to provide a safe, performant, highly available, and super fast to integrate SSO API... like WorkOS!