Your docs have a new audience
AI agents are reading your documentation. Here's what WorkOS did to serve them clean markdown instead of unparseable HTML.
AI agents are now a significant audience for your developer docs, and most sites serve them unparseable HTML. We rebuilt our docs pipeline at WorkOS to serve clean, dynamic markdown via content negotiation — and learned some hard lessons about silent failures along the way.
I know, I know: "optimizing content for AI agents" sounds like SEO for robots — keyword-stuffing your docs with filler words so Claude ranks you higher in some imaginary search results. But the first action an agent often takes is hitting a URL to get information. If that URL returns HTML, the agent spends tokens parsing it. If your site is a SPA, the agent might not get the content at all.
A developer using Claude Code to integrate your SDK isn't going to copy-paste docs into the chat. The agent is going to curl your URL, get back a wall of <div>s and <script> tags, and do its best to extract signal from noise.
Sometimes it works. Often it doesn't.
Agents have the same problems with content that people do. The words you use, the way you order them, and how much you say all matter. Just like you wouldn't serve a mobile user a desktop-only layout, you shouldn't serve an AI agent a page full of React components and navigation chrome.
If you hand agents a 50,000-token page when they needed a 2,000-token answer, you've just blown their context window for no reason.
So we started asking ourselves: what if we actually served agents something useful?
Serve markdown, not HTML
Static files are fine for many use cases, but they have a fundamental limitation: they're static. Our docs have dynamic elements — code samples that change based on your selected language, conditionally shown sections, API responses that plug in real project data. We needed something adaptive.
When a request comes in with an Accept: text/markdown header, we serve markdown instead of HTML. Our Next.js middleware detects the header, rewrites the request to an internal API route, and that route reads the MDX source, runs it through our serialization pipeline with stack-specific content matching, and returns clean markdown. No JavaScript, no navigation chrome. Just content.
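The decision the middleware makes is small. Here's a sketch of just that decision logic, with the Next.js plumbing (NextRequest, NextResponse.rewrite) omitted and the /api/markdown route path made up for illustration:

```typescript
// Decide whether a request should be rewritten to the markdown route.
// This is only the decision; in Next.js middleware you would pass the
// returned path to NextResponse.rewrite(). The /api/markdown prefix is
// illustrative, not our actual route.
function markdownRewriteTarget(
  pathname: string,
  headers: Map<string, string>,
): string | null {
  const accept = headers.get("accept") ?? "";
  // Content negotiation: only rewrite when the client explicitly
  // asks for markdown.
  if (!accept.includes("text/markdown")) return null;
  return `/api/markdown${pathname}`;
}
```

Because this is a rewrite rather than a redirect, the client-facing URL never changes: the same docs URL serves HTML to browsers and markdown to agents.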
Try it yourself: curl any docs page with -H 'Accept: text/markdown' and compare it to the HTML you get without the header.
Render what React hides
Once we started serving markdown, we hit a problem: our most useful content lives inside React components. Interactive provider tables, multi-language code samples — in the browser, they're great. To an agent, they're just <PipesProvidersTable /> — a self-closing JSX tag with no content in the MDX source.
The fix: instead of treating the .md content as a string, we parse the MDX into an AST, walk the tree, strip the node types agents don't need, and serialize back to plain markdown. For components that carry structured data (like tables), we wrote component-specific renderers that produce proper markdown equivalents. AST parsing became the baseline that made the whole pipeline reliable.
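Our real pipeline leans on MDX and remark tooling, but the core idea fits in a dependency-free sketch. The node shapes and the PipesProvidersTable renderer below are simplified stand-ins for illustration, not our actual schema:

```typescript
// Dependency-free sketch of the AST pass. The node shape is a
// simplified stand-in for what an MDX parser produces.
type Node = {
  type: string;
  name?: string; // JSX component name, e.g. "PipesProvidersTable"
  value?: string; // literal markdown text
  children?: Node[];
};

// Component-specific renderers: turn a JSX node that carries structured
// data into a plain-markdown equivalent. The table contents here are
// made up for the example.
const renderers: Record<string, (node: Node) => string> = {
  PipesProvidersTable: () =>
    "| Provider | Status |\n| --- | --- |\n| Example | Supported |",
};

function toMarkdown(node: Node): string {
  if (node.type === "jsx") {
    const render = node.name ? renderers[node.name] : undefined;
    // Known component: emit its markdown equivalent.
    // Unknown component: strip it entirely; agents don't need the tag.
    return render ? render(node) : "";
  }
  if (node.value) return node.value;
  return (node.children ?? []).map(toMarkdown).filter(Boolean).join("\n\n");
}
```

The payoff of working on the tree rather than the string is that stripping, rendering, and stack-specific filtering all become node-type decisions instead of regex surgery.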
Handling agents in the real world
Remember earlier when I said we detect on Accept headers? Yeah, not every agent provides that. Not every agent sets a User-Agent string identifying themselves, either. Some AI agents just show up to our servers masquerading as clients making normal-looking requests.
To work around this, in addition to checking what the incoming request is Accepting, we added a little User-Agent detection as a fallback. We noticed, for example, that a lot of UA strings start with axios/. Axios is a popular HTTP client for Node, so we assumed these were programmatic clients asking for data. Besides, a plain fetch call, just like a curl call without the right Accept header, would otherwise get back the full HTML page. It seemed prudent to just give a client-side scraper the markdown content anyway:
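A minimal sketch of the combined check. Only the axios/ prefix comes from the observation above; the other prefixes are plausible additions for common HTTP clients, not a list from our production code:

```typescript
// Fallback detection for clients that don't set a useful Accept header.
// The prefix list is a heuristic and deliberately short; axios/ is the
// one we actually observed, the rest are illustrative guesses.
const PROGRAMMATIC_UA_PREFIXES = ["axios/", "node-fetch", "python-requests/"];

function wantsMarkdown(
  accept: string | null,
  userAgent: string | null,
): boolean {
  // Explicit content negotiation always wins.
  if (accept?.includes("text/markdown")) return true;
  // Otherwise, fall back to sniffing known programmatic clients.
  const ua = (userAgent ?? "").toLowerCase();
  return PROGRAMMATIC_UA_PREFIXES.some((prefix) => ua.startsWith(prefix));
}
```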
Is this a perfect heuristic? No. Will it catch every agent? Definitely not. But it catches the common ones, and combined with the Accept header check, it covers most real-world cases.
The sneaky routing bug
We shipped it, it worked, everyone celebrated. Then we discovered certain URLs were returning the wrong markdown — content from a completely different page, served with a 200 status code.
Our markdown rewriting middleware was running before our redirect logic. When the markdown API couldn't find an exact file match for a path, it walked back segments until it found something. So /authkit/react/check-permissions silently became /authkit. The agent got the AuthKit overview, assumed it was the permissions docs, and generated code based on it.
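For illustration, the fallback behaved roughly like this reconstruction (not our actual code):

```typescript
// Reconstruction of the buggy fallback, for illustration only.
// When no file matched the full path, it dropped trailing segments
// until something resolved, then returned that page with a 200.
function resolveWithFallback(
  path: string,
  exists: (p: string) => boolean,
): string | null {
  let segments = path.split("/").filter(Boolean);
  while (segments.length > 0) {
    const candidate = "/" + segments.join("/");
    if (exists(candidate)) return candidate;
    segments = segments.slice(0, -1); // silently walk back one segment
  }
  return null;
}
```

With only /authkit on disk, /authkit/react/check-permissions resolves to /authkit: the wrong page, served with a confident 200.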
One agent's response: "The WorkOS docs are frustratingly hard to scrape for this specific detail." Rude. But fair.
Reordering middleware so redirects resolve before markdown serving fixed it. The lesson stung, though: agents don't complain. They don't file support tickets. They quietly consume whatever you give them and try to make it work. Silent failures are the worst kind of failures.
LOL for LLMs.txt
Lots of folks have lots of opinions on llms.txt. It's a simple idea: generate a plain-text version of your entire site that agents can consume without navigating links.
Are they useful? Are they a waste of time?
Here’s my opinion: someone is getting use out of them. The effort to produce them is low, so we might as well just do it.
I know that’s not a terribly convincing argument. But, just like robots.txt, the system only works if we all agree to it. I can tell you that we get about 40 hits a day from OpenAI (because they have a descriptive User-Agent string!). If that means Codex users are getting the best information they can get from our docs, then by all means, it’s worth keeping around. (Interestingly, we get about 60 hits a day from generic Mozilla/5.0... type strings. Either we have humans preferring a plain text version of our site, or some bot is trying to hide. WE SEE YOU.)
What we learned
After four PRs, a rewritten build pipeline, a new API route, an AST rendering pipeline, and one genuinely sneaky routing bug, here's what I keep coming back to:
Agents are your fastest-growing audience, and they're the easiest to disappoint. They can figure out your nav, but it’ll take time. They can't easily scroll around until they find what they need. They get one shot at the content you serve them, and if it's wrong or bloated or missing the parts that matter, the developer on the other end has a bad experience and blames your product and the agent.
Start with a corpus, then go dynamic. llms.txt is a decent starting point to prove you can turn your site into plain text, but content negotiation gives you the ability to serve stack-specific content, render data from live sources, and adapt as your docs evolve.
Your components are invisible to agents. If you have important data locked up in interactive components, you need a strategy for decomposing that content into markdown.
Test with curl. If something looks wrong to an agent, it’s easy to verify yourself with a quick curl -H 'Accept: text/markdown'.
Find out how your users use agents. This is an obvious step, but often difficult to actually do. Fortunately, WorkOS creates a shared Slack channel with every customer, so we get direct feedback whenever issues arise. Because we respond quickly, customers tell us any time they see anything funky (thanks again, Sola!), and any agent mishaps get resolved ASAP.
This is an evolving problem. Six months from now, agents will probably behave differently. The goal isn't to build the perfect agent-optimization system — it's to build the awareness that your docs serve two audiences now, and that both of them deserve a good experience.
The web has always been about serving the right content to the right client — mobile devices, screen readers, slow connections. Agents are just the latest client we need to care about. Making your docs work better for agents makes them work better for everyone. It turns out that when you strip away all the chrome and just serve the words, you find out pretty quickly whether those words are actually any good.