If your application is generating data that’s of interest to your customers (i.e. you’re doing something right), you’re going to get requests for webhooks at some point. But there’s not a ton of standard guidance for how to build them, especially on the security side. This post will walk through the basics of how to send out webhooks from your app, manage authentication, handle security, and provide a smooth developer experience to your customers.
Webhooks are reverse APIs, so they need non-standard infrastructure. The starting point for building webhooks is that your app is generating data that your customers want. Generally, you’d expose that via an API, authenticate your users with an API key, etc. - but the difference with webhooks is that your customers want to be proactively notified of what’s happening in your app. Your API is built to receive and respond to requests, while webhooks actively send out data to other systems based on internal triggers. That requires you to persist information on where you’re supposed to be sending data to, and the status of those endpoints.
In practice, this ends up looking like:
As with anything, this can get as complicated (a Kafka topic with a webhook consumer) or as simple (Lambda) as you want it to be. More on that later.
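To make the moving parts concrete, here is a minimal sketch of the fan-out step: given an internal event, look up who subscribed and send to each of them. The `Subscription` fields and the `dispatch` function are hypothetical names for illustration, and `send` is injected so you can plug in whatever HTTP client (or queue) you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class Subscription:
    # Hypothetical record: where to send, which events, and whether it is usable.
    url: str
    event_types: Set[str]
    active: bool = True


def dispatch(
    event_type: str,
    payload: dict,
    subscriptions: Iterable[Subscription],
    send: Callable[[str, dict], None],
) -> List[str]:
    """Send payload to every active subscription registered for event_type.

    Returns the URLs that were sent to, in subscription order.
    """
    delivered = []
    for sub in subscriptions:
        if sub.active and event_type in sub.event_types:
            send(sub.url, payload)
            delivered.append(sub.url)
    return delivered
```

In a real system `send` would make a POST request or enqueue a job; here it is just a callable, which also makes the logic easy to test.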
Finally, before we dive in, it’s important to define terminology, since things can get confusing:
Slack uses the “outgoing” vs. “incoming” terminology, but the idea is the same.
Handling authentication with webhooks is slightly trickier than with an API, because you’re sending data to an endpoint without receiving anything back - there are a lot of ways that can go wrong (spoofing the endpoint, infiltrating the network, etc.). That’s just from your end - the consumer also needs to verify that the data coming into their webhook endpoint (the URL that accepts webhook events) is actually from your app, and hasn’t been spoofed / corrupted in transit.
The first thing you need to do is verify that the developer signing up to subscribe to your webhook actually owns the endpoint they’re giving to you. Standard practice is to send a test event to the endpoint, and ask the developer to verify they’ve received it - either by returning a 200, or by including a “challenge” that the endpoint needs to echo back. For example, Dropbox verifies webhook endpoints by sending a GET request with a “challenge” param (a random string) encoded in the URL, which your endpoint is required to echo back as a response.
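A challenge-style check along those lines might look roughly like the sketch below. This is not Dropbox’s implementation: the `challenge` parameter name follows the description above, while `build_challenge_url`, `verify_endpoint`, and the injected `http_get` callable (which takes a URL and returns the response body as a string) are assumptions made for the example.

```python
import hmac
import secrets
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_challenge_url(endpoint_url: str, challenge: str) -> str:
    """Append a challenge query parameter, keeping any existing query string."""
    parts = urlsplit(endpoint_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("challenge", challenge))
    return urlunsplit(parts._replace(query=urlencode(query)))


def verify_endpoint(endpoint_url: str, http_get: Callable[[str], str]) -> bool:
    """Return True only if the endpoint echoes back a fresh random challenge."""
    challenge = secrets.token_urlsafe(32)
    body = http_get(build_challenge_url(endpoint_url, challenge))
    # Compare as bytes in constant time; strip whitespace some servers append.
    return hmac.compare_digest(body.strip().encode(), challenge.encode())
```

You would only mark the endpoint as active (and start sending real events) once `verify_endpoint` returns `True`.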
But verifying the endpoint doesn’t solve the whole problem, because endpoints can still be spoofed, networks are uncertain, etc. There are basically two ways to tackle the auth problem:
This is the easiest way to avoid problems, and is the approach that Dropbox takes. When an event happens in their system (e.g. a document gets updated), they’ll send out a webhook that says something along the lines of “the document with an ID of 1234 has been updated by user 1234” - this information is completely useless by itself, so you’ll need to follow up by making requests to the API that translate those IDs into whatever information you need to take action on the webhook’s information. But it also means that if a third party gets ahold of the webhook’s payload, they can’t do anything with it.
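As a rough illustration of this “thin notification” pattern, the sketch below builds a payload that carries only identifiers, and shows a consumer that turns those identifiers into real data through a separate lookup. The field names and the `fetch_resource` callable (standing in for an authenticated API call) are hypothetical.

```python
import json
from typing import Callable


def build_thin_notification(event_type: str, resource_id: str, actor_id: str) -> str:
    """Provider side: include IDs only, never the underlying data."""
    return json.dumps(
        {"type": event_type, "resource_id": resource_id, "actor_id": actor_id},
        sort_keys=True,
    )


def handle_notification(raw_body: str, fetch_resource: Callable[[str], dict]) -> dict:
    """Consumer side: use the ID to fetch details over the authenticated API."""
    notification = json.loads(raw_body)
    return fetch_resource(notification["resource_id"])
```

The security of the sensitive data then rests on your existing API authentication rather than on the webhook itself.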
The other, more labor-intensive way to handle auth is to actually... handle auth. You need to approach this from two directions: authenticating yourself as the sender, and authenticating the consumer (endpoint) you’re sending data to.
→ Signing your webhook
To verify to your webhook consumers that you are who you say you are (and that the data you’re sending via your webhook is legit), you can sign your webhook payload with a secret key. It’s easiest to do this symmetrically, with a secret shared between you and the consumer, but you can also use asymmetric (public/private key) signatures if you want. Stripe signs their webhook payloads with a symmetric secret key, sends the signature in a request header, and gives users access to that key in their dashboard so they can verify the signature at their endpoint.
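Here is a minimal sketch of symmetric signing using HMAC-SHA256 from Python’s standard library. It is a generic illustration rather than any provider’s exact scheme: the `timestamp.payload` message format, the five-minute tolerance, and the function names are assumptions. The timestamp is included so that a captured request can’t be replayed indefinitely.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_payload(secret: bytes, payload: bytes, timestamp: int) -> str:
    """Provider side: HMAC-SHA256 over "<timestamp>.<payload>", hex encoded."""
    message = str(timestamp).encode() + b"." + payload
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_signature(
    secret: bytes,
    payload: bytes,
    timestamp: int,
    signature: str,
    tolerance_seconds: int = 300,
    now: Optional[int] = None,
) -> bool:
    """Consumer side: reject stale timestamps, then compare in constant time."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > tolerance_seconds:
        return False
    expected = sign_payload(secret, payload, timestamp)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The consumer should run the check against the raw request body, before any JSON parsing, since re-serializing the payload can change the bytes and break the signature.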
→ Protecting your webhook payload
Once you’ve verified yourself to your consuming endpoints, you’ll want to think about how to verify the consuming endpoints themselves, and how to make sure the sensitive data in your webhook payload can’t be read or tampered with in transit. There are two ways to approach this:
(A) Encrypt the entire payload
This is fairly uncommon among major webhook providers (Dropbox, Stripe, Twilio, etc.) and requires some extra work on both your and your consumers’ end, but it ensures pretty tight security.
(B) Certificate pinning
This is the most common way to handle payload security: only send data over HTTPS (this should be obvious by now), and require your consumer to provide the specific certificate they’re using. For example, Twilio won’t send webhook data to HTTPS endpoints with self-signed certificates.
You’ve probably realized by now that there’s an impossible tradeoff here between developer experience and security. Sending no useful information in webhooks minimizes security risk, but requires a lot more work for the consumer. Including info in the webhook payload makes for a smooth developer experience, but is hard to do perfectly securely. That, and the fact that security is part of the developer experience, means you’ll need to weigh the risks and choose what’s best for your application.
Your webhook system will not be a perfect message queue, and you shouldn’t try to make it one - even companies like Stripe guarantee almost no integrity around ordering, number of events sent, and other ergonomics that you’d expect as a consumer. The general rule - and expectation from your consuming developers - is that you’ll send events at least once, but that’s about it.
→ Error handling and backing off
When you send your POST requests to the endpoints in your database, some of them will inevitably fail (DNS issues, incorrect routing, etc.). You want to retry to some degree, but not constantly and not forever. General best practices:
Ideally, all of your consuming endpoints should be returning 200s to all of your POST requests. If they’re not, that’s what you can use to determine retries / failure.
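One common way to retry “to some degree, but not constantly and not forever” is capped exponential backoff. The sketch below assumes a `send` callable that returns an HTTP status code (or raises `OSError` on a network failure); the specific delays, the cap, and the function names are illustrative choices, not a recommendation from any particular provider.

```python
import time
from typing import Callable, List, Sequence, Tuple


def backoff_schedule(
    retries: int = 5, base_seconds: float = 1.0, cap_seconds: float = 3600.0
) -> List[float]:
    """Delays before each retry: base, 2x base, 4x base, ... capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * (2 ** i)) for i in range(retries)]


def deliver_with_retries(
    send: Callable[[], int],
    delays: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """Try once immediately, then once after each delay.

    Returns (succeeded, attempts_made). Any 2xx status counts as success;
    other statuses and network errors trigger the next retry.
    """
    attempts = 0
    for delay in [0.0, *delays]:
        if delay > 0:
            sleep(delay)
        attempts += 1
        try:
            status = send()
        except OSError:
            continue  # DNS failure, connection refused, timeout, etc.
        if 200 <= status < 300:
            return True, attempts
    return False, attempts
```

When `deliver_with_retries` returns `False`, that is the point at which you might mark the endpoint as failing and notify its owner. In production you would usually schedule retries through a job queue rather than sleeping in-process.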
Webhook providers typically do not guarantee that events will make it to consuming endpoints in order. See, for example, Stripe:
Webhook providers typically also do not guarantee how many events they’ll send via webhooks, so consumers will need to make their endpoints idempotent to some degree. See, for example, Dropbox:
There’s overlap between events being out of order, concurrency, and duplicates (as you can see in the above screenshot). In general, you can spend time on improving your webhook system to try and avoid some of these issues, but it’s pretty rare to see providers offer stronger guarantees in the wild.
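On the consuming side, “idempotent to some degree” often comes down to remembering which event IDs you have already processed. Here is a minimal sketch, assuming each event carries a unique `id` field (an assumption, though most providers include one). The in-memory set is a stand-in for something durable, like a unique constraint in your database.

```python
from typing import Callable, Set


class IdempotentHandler:
    """Wraps an event handler so duplicate deliveries are processed only once."""

    def __init__(self, handle: Callable[[dict], None]) -> None:
        self._handle = handle
        self._seen: Set[str] = set()  # stand-in for durable storage

    def receive(self, event: dict) -> bool:
        """Process the event unless its ID was seen before.

        Returns True if the event was processed, False if it was a duplicate.
        """
        event_id = event["id"]
        if event_id in self._seen:
            return False
        self._handle(event)
        # Record the ID only after the handler succeeds, so a failed attempt
        # can still be retried by the provider.
        self._seen.add(event_id)
        return True
```

Either way, the endpoint should still return a 2xx for duplicates, so the provider stops retrying.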
As you start working on building a webhooks system, there are a couple of things you can set up early that will make things smoother (beyond your fancy Vim setup).
1) Testing with live URLs
Sending POST requests locally - especially as you’re debugging auth - won’t work, as a local dev server isn’t available to the public internet, and so it won’t be able to receive webhooks from the provider service. You’ll need to test against a public URL – you can set up a simple server as a test endpoint, or just use something like ngrok to tunnel to your localhost.
2) A sample events library
It’s worth investing time up front to build a library of sample events you’d want to send out via webhooks. Otherwise you’ll get stuck needing to trigger things in your app, or worse, via external providers like Okta (if you want to send out a webhook when a user authenticates, etc.).
3) The database
As mentioned above, you’ll need some sort of database to store all of the endpoints you’re sending webhooks out to. The schema should look something like:
There’s really no good reason to use anything other than a simple relational database for this to start, as it’s unlikely this table will scale to anything that will give you problems.
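As one possible shape for that table, here is a sketch using SQLite from Python’s standard library. Every table name, column, and status value here is an assumption chosen for illustration; the idea is just to store where to send, the signing secret, and whether the endpoint has been verified.

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per (customer, endpoint URL).
SCHEMA = """
CREATE TABLE webhook_endpoints (
    id          INTEGER PRIMARY KEY,
    customer_id TEXT NOT NULL,
    url         TEXT NOT NULL,
    secret      TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'active', 'disabled')),
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (customer_id, url)
);
"""


def create_tables(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def register_endpoint(conn: sqlite3.Connection, customer_id: str, url: str, secret: str) -> None:
    """New endpoints start as 'pending' until ownership is verified."""
    with conn:
        conn.execute(
            "INSERT INTO webhook_endpoints (customer_id, url, secret) VALUES (?, ?, ?)",
            (customer_id, url, secret),
        )


def mark_verified(conn: sqlite3.Connection, customer_id: str, url: str) -> None:
    with conn:
        conn.execute(
            "UPDATE webhook_endpoints SET status = 'active' WHERE customer_id = ? AND url = ?",
            (customer_id, url),
        )


def active_urls(conn: sqlite3.Connection, customer_id: str) -> List[str]:
    """Only verified, non-disabled endpoints should receive events."""
    rows = conn.execute(
        "SELECT url FROM webhook_endpoints WHERE customer_id = ? AND status = 'active' ORDER BY id",
        (customer_id,),
    )
    return [row[0] for row in rows]
```

In a real deployment you would likely encrypt the `secret` column at rest and add an index on whatever you query by most often.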
4) Logging
It’s helpful to log every time a webhook gets sent out (along with the payload, time sent, etc.) for debugging (and compliance) purposes down the road.
5) Separating events from webhooks
This is more of a high-level architectural note, but it’s worth noting that there should be a layer of separation between what’s happening in your business systems ("an event") and the actions that you take based on that event (like sending out a webhook). If something goes wrong with your webhooks, you don’t want that to impact other pieces of your application.
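A simple way to picture that separation is a queue sitting between the two: business code only records that an event happened, and a separate worker turns events into webhook deliveries. The sketch below uses an in-process `queue.Queue` purely for illustration; the function names are hypothetical, and a real system would more likely use a durable queue or a jobs table.

```python
import queue
from typing import Callable, List, Tuple


def record_event(event: dict, jobs: "queue.Queue[dict]") -> None:
    """Called from business logic: enqueue and return, never touch the network."""
    jobs.put(event)


def drain(
    jobs: "queue.Queue[dict]", deliver: Callable[[dict], None]
) -> List[Tuple[dict, Exception]]:
    """Worker side: deliver queued events, collecting failures instead of raising.

    A failing delivery never propagates back to the code that produced the event.
    """
    failures = []
    while True:
        try:
            event = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            deliver(event)
        except Exception as exc:  # isolate webhook problems from the rest of the app
            failures.append((event, exc))
    return failures
```

The failures returned by `drain` are what you would feed into your retry logic.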
Stripe provides webhooks whenever an event happens (customer created, card charged, etc.). You can add your endpoint(s) via the UI (below), or through Stripe’s webhooks API (yes, an API for configuring Stripe webhooks).
If you’re a consumer, you can accept all of these webhooks on one endpoint, or set up multiple endpoints (one for each event) - in the latter case, the API is probably more useful. Here’s what a sample POST request to add a new webhook endpoint looks like (from Stripe’s docs):
Like we covered above, Stripe does not guarantee ordering of events, and events may show up in duplicates as well. Their system expects your endpoint to return a 2xx when the webhook gets sent out - if it doesn’t, they’ll retry in increasingly sparse increments until they eventually mark your endpoint as broken, and email you about it.
Predictably, the documentation is excellent, especially this best practices list.
We’ve outlined a few of the best practices for implementing your own webhook sending system, but as you grow, the work doesn’t stop here. As your service gets more popular and more and more users consume your webhooks, you’ll likely need to find ways to scale and deliver more and more events without additional latency.
A good start is taking a look at event streaming systems like Kafka (technically a pub / sub system rather than a database) or AWS Kinesis, with multiple worker processes doing the actual webhook sending. If you’re on the other side, running a system that is consuming webhooks, you can scale your webhook ingestion the same way you’d horizontally scale for regular web traffic - by using a load balancer or reverse proxy in front of your web servers. And if you eventually grow past that, you may even want to investigate a non-webhook based event streaming solution like AMQP or Tibco.
In the meantime, check out the WorkOS Docs to see how we implement common payloads and event types for making your app enterprise-ready.