The Developer’s Guide to Audit Logs / SIEM
Our guide will walk you through the audit log basics that every developer should know: why audit logs are important, event formats, SIEM tools, retention best practices, and more.
App security means app visibility, and audit logs are becoming a key requirement of both enterprise customers and fast-growing startups. Building audit logs into your app will help land those larger deals and give your customers confidence in your product’s security profile – but actually choosing which events to log, deciding on payloads, and exposing events via the right APIs is far from easy. Our guide will walk you through the basics that every developer should know: why audit logs are important, event formats, SIEM tools, retention best practices, and everything else you didn’t realize you needed to know.
The basics: what audit logs are and why you should care
CIOs and IT admins (and really anyone involved in security) need full visibility into the apps their teams are using: who’s logging in, who’s resetting their password, and who’s accessing what data. Let’s imagine we’re an internal IT admin at Kellogg’s – and we want visibility into how our organization is using Salesforce. A few things we might be curious about:
- Has someone tried and failed to log in 10 times in the past hour?
- Who accessed PII (personally identifiable information) in the past week?
- How many users accessed a sensitive report that was accidentally shared with the wrong team?
Audit logs answer these questions by providing a granular paper trail of what app users have done. Every time a user does anything of consequence – log in, reset password, create report, you name it – your system emits an event with useful metadata about that event. Companies will expose these events via a UI and/or an API so admins can analyze them at scale. Here’s a quick example of what an event might look like:
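As a rough illustration, here is a minimal Python sketch of a single event. The field names (`actor`, `group`, `action`, `timestamp`, `metadata`) and the IDs are assumptions made up for this example, not a standard:

```python
import json
from datetime import datetime, timezone


def build_event(actor_id, org_id, action, metadata=None):
    """Return one audit event as a plain dict (hypothetical field names)."""
    return {
        "actor": actor_id,       # who did it
        "group": org_id,         # which organization they belong to
        "action": action,        # what happened
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when, in UTC
        "metadata": metadata or {},
    }


event = build_event("user_123", "org_456", "password_changed", {"ip": "203.0.113.7"})
print(json.dumps(event, indent=2))
```

Everything an admin would want to filter on (who, what, when, which organization) is a top-level field, and anything event-specific goes into `metadata`.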
IT admins use the data in audit logs in a variety of ways. In specific industries such as finance, there are laws that require companies to store a “paper trail” of user behavior (such as messages) in an audit log, often referred to as data retention. These logs may also be used for digital forensics in the case of a breach, or “e-discovery” during a lawsuit. In most cases, however, admins primarily use audit logs to inspect user behavior during configuration and to set up alerts for potential malicious actors or account compromise.
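To make the alerting use case concrete, here is a small sketch of the "10 failed logins in the past hour" check from earlier. It assumes each event is a dict with `actor`, `action`, and a timezone-aware `timestamp`, and that failed logins are recorded under a hypothetical `authentication_failed` action:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def failed_login_alerts(events, now, threshold=10, window=timedelta(hours=1)):
    """Return actors with at least `threshold` failed logins inside `window`."""
    cutoff = now - window
    counts = Counter(
        e["actor"]
        for e in events
        if e["action"] == "authentication_failed" and e["timestamp"] >= cutoff
    )
    return sorted(actor for actor, n in counts.items() if n >= threshold)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [
    {"actor": "user_1", "action": "authentication_failed",
     "timestamp": now - timedelta(minutes=i)}
    for i in range(10)
]
# One old failure that falls outside the one-hour window.
events.append({"actor": "user_2", "action": "authentication_failed",
               "timestamp": now - timedelta(hours=3)})
print(failed_login_alerts(events, now))
```

In practice a check like this usually runs inside the customer's SIEM rather than in your app, but it shows why consistent action names and timestamps matter.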
Audit logs aren’t only a requirement for giant monolithic enterprises; the need for visibility is being pushed “downstream” to smaller companies as they use more and more applications. Audit logs will help your business move upmarket, but they’ll also help improve retention and close your existing target deals faster.
What kinds of events to emit and how
Building audit logs into your app isn’t technically difficult – if you’ve instrumented your product with something like Segment already, it’s very similar code. As with most enterprise features, the devil is in the details: what kinds of events to emit, what the payload should be, how to store events, and how to integrate these events with the wider enterprise toolkit.
1. What kind of events to log
The goal of audit logs is to provide a paper trail of important actions that help IT admins and security folks get a detailed picture of what’s happening in your app – that’s the driving logic for what kinds of events you should be writing.
Chances are you already emit events in a similar fashion, but your audit log events should not be the same as your analytics or observability events. This is a common mistake companies make - the kind of data a team needs internally for product analytics and general instrumentation will overlap with audit logs, but different use cases necessitate different data (especially when it comes to handling PII). The same is true of your observability data: if you’re sending events to Elasticsearch / Prometheus / Honeycomb / your monitoring system of choice, those events might overlap with audit logging use cases – but you’ll need to do a ton of scrubbing and cleaning in advance.
There’s no catalog of what exact events to instrument, but a few general guidelines might help:
- Login / auth - Anything related to signing up, signing in, resetting passwords, updating usernames, etc.
- Data access - Opening, updating, or deleting a document or spreadsheet (whatever the canonical unit of your product is).
- Billing - Updating billing or subscription info.
- Search / navigation - Using search, navigating around the product.
Conversely, here are a few examples of things that you shouldn’t be including in your audit logs:
- Performance data - How long a job took to run, CPU usage, etc.
- Internal tracing - Errors, exceptions, and stack traces.
If you’re not sure what kind of events to include, look to other products and talk to your customers - they might have requirements specific to their compliance needs.
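One way to keep audit events separate from everything else is an explicit catalog of audit actions. The sketch below is only an illustration: the action names and category labels are hypothetical, and your own catalog should come out of those customer conversations:

```python
# Hypothetical catalog: action name -> audit category.
AUDIT_ACTIONS = {
    "user_signed_in": "auth",
    "password_reset": "auth",
    "document_opened": "data_access",
    "document_deleted": "data_access",
    "subscription_updated": "billing",
    "search_performed": "navigation",
}


def split_events(events):
    """Separate cataloged audit events from everything else
    (performance data, internal tracing, and so on)."""
    audit, other = [], []
    for event in events:
        (audit if event["action"] in AUDIT_ACTIONS else other).append(event)
    return audit, other


audit, other = split_events([
    {"action": "user_signed_in"},
    {"action": "job_duration_recorded"},  # performance data, not an audit event
    {"action": "document_deleted"},
])
print([e["action"] for e in audit])
```

With an allowlist, a new analytics or tracing event never lands in the audit log by accident; someone has to add it to the catalog on purpose.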
2. Event payloads and formatting
There’s no widely accepted standard for audit log payloads yet, so things are kind of a mess. The general rule of thumb is to include enough information to be useful, but nothing that doesn’t fit the audit log use case (e.g. product data). The typical format looks something like this:
- An actor - The ID of the user that’s carrying out the action (can also be the system).
- A group - The permission group(s) the user is part of, usually an organization ID or domain.
- An action type - The name of the event (password_changed, authentication_successful).
- Timestamp - A... timestamp. Keep things in UTC, as always.
Unsurprisingly, there is at least one standard that’s gotten some traction: it’s called the Common Event Format (CEF), and it comes from ArcSight, a SIEM provider (more on that later). The standard is aimed at making it easier to communicate between security systems, and prescribes a library of standard event fields like sourceHostName and requestURL. However, in practice most audit log systems do not match the CEF.
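For a feel of what CEF looks like on the wire, here is a rough sketch that serializes an event into a CEF-style line. The layout (a pipe-delimited header followed by `key=value` extensions) and the short keys `suser`, `shost`, and `request` reflect our reading of the ArcSight spec, so check them against the official document before relying on them. The vendor, product, and input field names are placeholders:

```python
def escape_header(value: str) -> str:
    """Escape backslashes and pipes, which delimit CEF header fields."""
    return value.replace("\\", "\\\\").replace("|", "\\|")


def escape_extension(value: str) -> str:
    """Escape backslashes, equals signs, and newlines in extension values."""
    return value.replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")


def to_cef(event: dict, vendor="ExampleCo", product="ExampleApp",
           version="1.0", severity=3) -> str:
    # Header: CEF:0|vendor|product|version|event class ID|name|severity
    header = "|".join(
        ["CEF:0"]
        + [escape_header(str(part))
           for part in (vendor, product, version,
                        event["action"], event["name"], severity)]
    )
    # Extension: space-separated key=value pairs, skipping missing fields.
    extension = " ".join(
        f"{key}={escape_extension(str(event[source]))}"
        for key, source in (("suser", "actor"),
                            ("shost", "source_host"),
                            ("request", "request_url"))
        if source in event
    )
    return f"{header}|{extension}"


line = to_cef({
    "action": "password_changed",
    "name": "Password changed",
    "actor": "user_123",
    "source_host": "app.example.com",
    "request_url": "https://app.example.com/settings?tab=security",
})
print(line)
```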
Whether you follow the CEF or not, there’s one thing you can do: standardize internally. Keeping the same structure across all audit events will improve your user experience and make building your audit logs portal a lot easier. This is true across all product instrumentation, but doubly the case here - before building audit logs into your app, sit down with your stakeholders and customers to figure out which fields should be standard and required.
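A simple way to enforce an internal standard is to route every audit event through one type that refuses incomplete data. This is a minimal sketch, assuming the four fields listed above are the required set; your stakeholders may settle on a different list:

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str
    group: str
    action: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        # Reject events that are missing any required field.
        for name in ("actor", "group", "action"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")

    def to_dict(self):
        """Serialize with the timestamp normalized to UTC ISO 8601."""
        data = asdict(self)
        data["timestamp"] = self.timestamp.astimezone(timezone.utc).isoformat()
        return data


event = AuditEvent(actor="user_123", group="org_456", action="password_changed")
print(event.to_dict())
```

Because the class is frozen, an event cannot be modified after it is created, which fits the way audit events are meant to be used.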
3. Retention windows
If you’re supporting thousands of users, your audit logs will start to take up a good amount of space. They’ll also need to be indexed for fast search and filtering on various facets. Keeping audit logs forever isn’t always the right approach. Companies typically run a 90-day retention window and allow customers to pay for longer-term storage, such as 1 to 3 years. We’ve seen companies store these logs in relational stores like Postgres, search systems like Elasticsearch, and even as flat files in an S3 bucket. Another important note: audit logs should be immutable once they’re stored. That’s the whole point.
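Here is a small sketch of both ideas, a retention window and immutability, using SQLite from the Python standard library as a stand-in for whatever store you pick. The table layout, trigger name, and 90-day default are assumptions for illustration. Rows can be inserted and eventually purged, but any UPDATE is rejected:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS audit_events (
            id     INTEGER PRIMARY KEY,
            actor  TEXT NOT NULL,
            action TEXT NOT NULL,
            ts     REAL NOT NULL  -- seconds since the Unix epoch (UTC)
        );
        CREATE INDEX IF NOT EXISTS idx_audit_ts ON audit_events (ts);
        -- Stored events are immutable: reject every UPDATE.
        CREATE TRIGGER IF NOT EXISTS audit_no_update
        BEFORE UPDATE ON audit_events
        BEGIN
            SELECT RAISE(ABORT, 'audit events are immutable');
        END;
    """)
    return conn


def record(conn, actor, action, ts):
    conn.execute(
        "INSERT INTO audit_events (actor, action, ts) VALUES (?, ?, ?)",
        (actor, action, ts.timestamp()),
    )


def purge_expired(conn, now, retention=RETENTION):
    """Delete events older than the retention window; return how many."""
    cutoff = (now - retention).timestamp()
    return conn.execute("DELETE FROM audit_events WHERE ts < ?", (cutoff,)).rowcount


conn = open_store()
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
record(conn, "user_1", "password_changed", now - timedelta(days=120))
record(conn, "user_2", "authentication_successful", now - timedelta(days=5))
print(purge_expired(conn, now))
```

A customer on a longer retention plan would simply get a larger `retention` value. A database trigger is a fairly weak guarantee on its own, so treat it as a guard against accidents rather than a full tamper-proofing scheme.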
4. Building a frontend
Once you’ve set up the right system for emitting events on the backend, you’ll need to build a frontend to expose those events to your users. We’ve seen implementations as simple as a table with events in it, and as full-featured as admin panels with search, filtering, sorting, and exporting to SIEM solutions (more on that later). What you build depends on the use cases your customers have: detailed frontends are useful if your users are going to interact with audit logs directly in your app, but can be overkill if you’re dealing with much larger customers who will want to export these logs to a different tool anyway.
Once you’ve got your events system and frontend in check, you’ve got one more thing to worry about: integration.
Integrations: working with SIEM tools
When you’re serving hundreds or thousands of users, combing through audit logs manually to find the data you need isn’t going to work. Enterprise admins use specialized tools called SIEM (Security Information and Event Management) to ingest audit logs and provide search, grouping, and visualization. The relationship here is somewhat similar to Elasticsearch / Kibana, where the visualization tool is purpose-built for a specific use case - but SIEM tools go further down the stack to manage storage and search.
One of the best-known SIEM tools is Splunk – it’s a general-purpose aggregation and visualization tool with features built specifically for audit logs. Another good example is Sentinel, an Azure-specific SIEM from Microsoft. The general idea is that you get a dashboard that aggregates key events over time, as well as granular search and filtering for when you need to dive into specific patterns.
Enterprises will typically be using a SIEM tool like Sentinel, so your systems are going to need to integrate and export data, which, as you can probably guess, isn’t very much fun. As with SSO and SAML, each SIEM solution is going to have different ingestion requirements. Currently there are at least 35 popular SIEM providers out there, and many large enterprise admins have bespoke solutions too. You’re lucky if they only ask you to drop logs in an S3 bucket. We’ve even heard of IT admins wanting a streaming SSH tunnel of events.
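For the "drop logs in a bucket" case, the export itself can be quite small. The sketch below writes events as newline-delimited JSON, which we are assuming the customer's pipeline can ingest; confirm the expected format with each customer. The upload step is left out on purpose, since it depends on your storage client:

```python
import io
import json


def write_ndjson(events, out):
    """Write one compact JSON object per line; return the number written."""
    count = 0
    for event in events:
        out.write(json.dumps(event, sort_keys=True, separators=(",", ":")) + "\n")
        count += 1
    return count


buffer = io.StringIO()
written = write_ndjson(
    [
        {"actor": "user_1", "action": "password_changed"},
        {"actor": "user_2", "action": "authentication_successful"},
    ],
    buffer,
)
print(written)
print(buffer.getvalue(), end="")
```

Writing to any file-like object keeps the serializer independent of the destination, so the same function can feed a local file, an upload buffer, or a test.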
Walkthrough example: Slack
To put all of this together, let’s take a look at a company that has successfully moved upmarket through building great enterprise-ready features: Slack. If you’re on the Enterprise Grid plan, Slack provides access to audit logs via a read-only REST API that covers pretty much any event that would be of interest to an IT admin or security team. Here are a few examples of the kinds of audit events that Slack retains:
- `workspace_created` – Emitted when a new workspace is created.
- `organization_created` – Emitted when a new organization is created.
- `ekm_enrolled` – Emitted when an organization enrolls in Slack’s EKM.
- `user_channel_join` – Emitted when a user joins a channel.
Each event comes with a semi-standard payload: an actor (the user who carried out the action), an action (the event type), an entity and a context. An example payload from their docs:
Slack lets developers interact with audit logs via a REST API, which makes integrating with SIEM providers just a bit easier. Their docs also cover how to install external apps that interact with the audit logs API.
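To give a sense of what that integration can look like, here is a sketch of a paginated reader for an audit logs REST API. The endpoint URL, the `limit` / `action` / `cursor` parameters, and the `entries` / `response_metadata.next_cursor` response fields are our assumptions about Slack's API shape; verify them against Slack's documentation before use. The HTTP call is injectable so the paging logic can be exercised without a network:

```python
import json
import urllib.parse
import urllib.request

# Assumption: verify the endpoint and parameters against Slack's docs.
BASE_URL = "https://api.slack.com/audit/v1/logs"


def http_get_json(url, token):
    """Fetch a URL with a bearer token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_audit_entries(token, action=None, limit=100, get_json=http_get_json):
    """Yield audit entries page by page, following the cursor until it is empty."""
    cursor = None
    while True:
        params = {"limit": limit}
        if action:
            params["action"] = action
        if cursor:
            params["cursor"] = cursor
        page = get_json(f"{BASE_URL}?{urllib.parse.urlencode(params)}", token)
        yield from page.get("entries", [])
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break
```

A SIEM forwarder would call `iter_audit_entries(token, action="user_channel_join")` on a schedule and hand each entry to whatever export format the customer's tool expects.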
Beyond the REST API, Slack also provides a basic frontend for interacting with audit logs. On smaller plans, you just see a list of access logs:
If you’re curious about how Slack started landing larger deals via enterprise features, check out our profile here – we cover the other features Slack built like SSO, EKM, and RBAC.