October 30, 2025

Beyond the Hype: What Actually Works for Production AI Systems

Insights from the Developer Experience and AI panel at Enterprise Ready Conference 2025 on why good DX matters more than ever, what makes production AI systems successful, and how to raise the bar on API design, documentation, and team practices.

This post is part of our ERC 2025 Recap series. Read our full recap post here.

The developer tools world is wrestling with a provocative question: Is developer experience dead? With AI agents supposedly capable of navigating any API, some argue that careful API design and documentation no longer matter. The agents will just figure it out.

At Enterprise Ready Conference 2025, a panel featuring Dustin Schau (Head of API Client at Postman), Meagan Gamache (VP of Product at Render), and James Cowling (Co-Founder and CTO of Convex), moderated by Garrett Galow of WorkOS, delivered a unified response: This is dangerously wrong.

In fact, the bar for developer experience is rising, not falling—and the companies building successful production AI systems are the ones who understand why.

The Conceptual Clarity Problem

The panel pushed back hard against the notion that "agents can just use crappy APIs." The reality emerging from production systems tells a different story. AI systems are semantic engines that rely on semantic accuracy to make good decisions. When APIs use internal company jargon or conceptually unclear abstractions, agents struggle just as much as humans do—sometimes more.

One panelist noted that many SaaS APIs evolved using language that was really internal to the company, creating conceptual confusion that agents can't simply power through. The solution isn't to dumb things down or add more documentation prose. It's to invest in genuine conceptual clarity: APIs that make it simple to manage complicated things, with abstractions that are intuitive for both humans and machines.
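
As a hypothetical illustration (the resource names below are invented, not drawn from any panelist's product), compare an API shaped around internal jargon with one shaped around the concept a caller actually holds in their head:

```python
from dataclasses import dataclass

# Jargon-shaped: mirrors internal systems, so neither a new hire nor an
# agent can guess what a "TPR" is or why it carries a "bucket_code".
@dataclass
class TPRRecord:
    tpr_id: str
    bucket_code: int
    lob_flag: bool

# Concept-shaped: the same data, named for the idea the caller already has.
@dataclass
class Subscription:
    subscription_id: str
    plan: str          # e.g. "starter", "enterprise"
    auto_renews: bool
```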

This matters because the companies successfully deploying AI agents in production aren't the ones with the most advanced models. They're the ones with the cleanest primitives.

What Actually Works: The Fuzzy and the Concrete

A clear pattern emerged from panelists' experiences with customers building production AI systems: successful applications combine what LLMs do well (fuzzy, human-style problems like reading documentation and generating text) with what platforms do well (concrete problems like databases, scheduling, and transactions).

The magic word that kept appearing: workflows. Successful agent systems are built on workflow primitives—durable scheduling, checking system status, aggregating results from third parties, handling transactions. These aren't new problems, but they're the foundation that makes AI agents useful rather than just interesting.

One example stood out: a company building a smart spreadsheet where each cell kicks off an agentic workflow. The use case is narrow and specific: take a fuzzy operation like "look up information about these people and populate the next column" and execute it against concrete workflow primitives. It's a task that would take a human an hour of tedious work, but it can be automated to a high quality bar because the scope is clear.
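
Here is a minimal sketch of that split (the schema, the retry policy, and the call_llm stub are invented for illustration, not the company's actual stack): the platform owns the concrete parts, storage, retries, and commits, while the model handles only the fuzzy lookup.

```python
import sqlite3
import time

def call_llm(prompt: str) -> str:
    """Stand-in for the fuzzy half: in practice this wraps a real model API."""
    return f"(model-generated summary for: {prompt})"

def enrich_cell(conn: sqlite3.Connection, row_id: int, person: str) -> None:
    """One spreadsheet cell's workflow: fuzzy lookup, then a concrete, retried write."""
    for attempt in range(3):                      # retries belong to the platform, not the model
        try:
            summary = call_llm(f"Look up public information about {person} and summarize it.")
            conn.execute("UPDATE sheet SET enrichment = ? WHERE id = ?", (summary, row_id))
            conn.commit()
            return
        except Exception:
            time.sleep(2 ** attempt)              # back off and try again
    conn.execute("UPDATE sheet SET enrichment = '<failed>' WHERE id = ?", (row_id,))
    conn.commit()

# Minimal setup so the sketch runs end to end
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sheet (id INTEGER PRIMARY KEY, person TEXT, enrichment TEXT)")
conn.execute("INSERT INTO sheet (person) VALUES ('Ada Lovelace')")
enrich_cell(conn, 1, "Ada Lovelace")
```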

The systems that struggle are the ones that try to have the LLM do everything, including the platform-style tasks that traditional systems handle better. The platforms that succeed are the ones that make it easy to compose these two approaches.

Documentation That Actually Works for AI

If good APIs matter more than ever, so does good documentation—but not in the way many teams assume. The panel highlighted a critical shift: documentation now needs to be written for information density rather than page count.

When developers consume docs through chat interfaces or LLM-powered search, padding and fluff become actively harmful. If you feed two sentences of real information into an LLM that expands them into two pages of documentation, and those pages then feed into another LLM, you've still only conveyed two sentences' worth of signal. The high school essay approach of "how do I get to 10 pages" is dead.

The new standard: documentation should convey several key points with maximum clarity and minimum filler. Several panelists mentioned using LLMs to evaluate their own docs—asking models questions and seeing whether they can extract the right information. It's like running evals on your documentation: can an agent that reads this actually understand how to use your product?
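
A rough sketch of what such a doc eval can look like, assuming the OpenAI Python client as one possible provider (the questions, expected strings, and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Invented eval cases: questions a reader should be able to answer from the docs,
# plus a string the answer must contain to count as a pass.
DOC_EVALS = [
    ("How do I authenticate API requests?", "Authorization: Bearer"),
    ("What is the default rate limit?", "100 requests per minute"),
]

def eval_docs(docs_text: str) -> float:
    """Ask the model each question with only the docs as context; return the pass rate."""
    passed = 0
    for question, must_contain in DOC_EVALS:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Answer using only the documentation provided."},
                {"role": "user", "content": f"Documentation:\n{docs_text}\n\nQuestion: {question}"},
            ],
        )
        answer = response.choices[0].message.content or ""
        passed += must_contain.lower() in answer.lower()
    return passed / len(DOC_EVALS)
```

If the pass rate drops when you trim a section, that section was carrying real signal; if it doesn't, the section was filler.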

This also revealed a frustrating problem with cutting-edge development: when working on bleeding-edge features, the documentation might not even be in the training data yet. This is driving a new category of work called Generative Engine Optimization (GEO)—the AI equivalent of SEO, focused on making sure LLMs can discover and understand your product.

The MCP Reality Check

Model Context Protocol (MCP) came up as both an opportunity and a cautionary tale. While the technology enables powerful integrations, the panel was blunt about the risks of implementing it without discipline.

One panelist posed a thought experiment: if an intern asked for a button on their desk that could drop the production database—promising they'd only press it if really necessary—you'd obviously say no. Yet companies regularly ask to expose equally dangerous operations to LLMs through MCP.

The problem goes beyond security. Many MCP implementations are simply bad developer experiences—doing one-to-one mappings of internal APIs without thinking through actual use cases. The successful approach requires thinking in terms of user workflows and providing the right level of abstraction, not just exposing every possible operation.

This reflects a broader principle: systems need to encode guardrails. They should make it easy to do the right thing and very hard to do the wrong thing. As one panelist put it, "You can't just do a one-to-one mapping of all your APIs in the company. You have to think in terms of use cases."
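
As an illustrative sketch, assuming the Python MCP SDK's FastMCP helper (the billing tool, its parameters, and the $500 cap are invented for this example): expose one use-case-shaped tool with guardrails baked in, rather than a one-to-one mapping of internal endpoints.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("billing-support")

# One use-case-shaped tool with its limits encoded directly, instead of exposing
# raw "update_invoice" / "delete_customer" operations to the model.
@mcp.tool()
def issue_refund(invoice_id: str, amount_usd: float, reason: str) -> str:
    """Issue a partial refund on an invoice, capped and audited."""
    if amount_usd <= 0 or amount_usd > 500:   # hard limit lives in the tool, not the prompt
        return "Refunds over $500 require a human approver."
    # ... call the internal billing service and record who/what/why for audit ...
    return f"Refunded ${amount_usd:.2f} on {invoice_id} ({reason})."

if __name__ == "__main__":
    mcp.run()
```

The point is not this particular cap; it's that the constraint lives in the tool itself, so the model can't be talked out of it.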

The Team Impact: Accountability and Active Learning

The conversation shifted to how AI is changing internal team dynamics, and this is where the perspectives became more nuanced. Some panelists see AI as a way for people to uplevel in skills they don't know well—product managers writing better specs with AI assistance, engineers contributing to areas outside their core expertise.

But others raised concerns about what gets lost in the process. AI is excellent for passive learning (reading documentation, looking at examples, asking questions), but active learning—solving problems through hardship with no guidance and developing intuition—is getting harder. Shortcuts inhibit active learning, and the path from junior to senior engineer requires blood, sweat, and tears that you can't simply skip.

One panelist was direct about this in code reviews: never write "sorry, Claude wrote that" in a pull request. You are the author. You are accountable. Use whatever tools you want, but you're responsible for the work you produce. This accountability framework makes it easier to compose and scale AI usage within organizations without losing quality or learning opportunities.

The challenge is maintaining both sides of this equation: using AI to amplify expertise while ensuring people still develop that expertise in the first place.

Practical Advice for Getting Started

When asked how companies should actually start with AI, the advice was remarkably consistent: find something in your workflow that takes time and try to do it with AI. Don't aim for the moonshot first. Build familiarity with the tools by solving real problems you face daily.

Several panelists emphasized experimentation over planning. The companies making progress aren't the ones with elaborate AI strategies—they're the ones building narrow, specific, clear applications that solve concrete problems. Internal tools often make good starting points because you can iterate quickly and the stakes are lower.

One example: using an agent to query GitHub issues and summarize patterns for product planning. It's not revolutionary, but it saves hours each week and provides a foundation for understanding what these tools can actually do.
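
A rough sketch of that kind of internal tool, assuming the public GitHub REST API and the OpenAI Python client (the repo, model, and prompt are placeholders):

```python
import requests
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_issues(owner: str, repo: str) -> str:
    """Pull recent open issues and ask a model to surface recurring themes."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        params={"state": "open", "per_page": 50},
        timeout=10,
    )
    resp.raise_for_status()
    # The issues endpoint also returns pull requests; filter them out.
    titles = [issue["title"] for issue in resp.json() if "pull_request" not in issue]

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Group these issue titles into recurring themes for product "
                       "planning, most common first:\n" + "\n".join(titles),
        }],
    )
    return completion.choices[0].message.content or ""
```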

The other critical piece of advice: focus on what AI is genuinely good at rather than forcing it to do everything. The MIT study showing that 95% of agent projects don't deliver on their goals keeps coming up for a reason. Many "agents" are really just workflows that don't need AI at all—they need good understanding of the problem, the context, and the data.

The Real Standards Going Forward

Perhaps the most important takeaway from this panel is that building for AI doesn't mean lowering your standards. It means raising them.

Good APIs matter more because both humans and agents need conceptual clarity. Documentation matters more because it needs maximum information density. Security and guardrails matter more because the consequences of mistakes can be automated at scale. Active learning and accountability matter more because shortcuts make it easier to ship work you don't truly understand.

The companies that will succeed in this transition aren't the ones chasing every new AI feature or betting that agents will solve their design problems for them. They're the ones doing the hard work of building clean primitives, writing clear documentation, and maintaining discipline about what should and shouldn't be exposed to automated systems.

As the industry moves past the initial AI hype cycle, these fundamentals are what separate production systems from impressive demos. The bar is rising. The question is whether you're ready to meet it.

Want to learn more about building enterprise-ready applications? Check out WorkOS for enterprise features that work out of the box, or explore more insights from Enterprise Ready Conference 2025.
