January 14, 2026

Baseten is betting big on open source models

A conversation with Philip Kiely from Baseten at AWS re:Invent 2025.

Philip Kiely from Baseten has a simple pitch: "We do three things. We get the GPU, we put the model on the GPU, and we make it run really fast." Each of those things, he admits, takes years of effort and massive engineering teams.

Baseten is an inference infrastructure company powering AI workloads for companies like Cursor, Notion, Clay, and healthcare AI companies like Open Evidence and Ambience. If you've used AI-powered code generation or healthcare tools recently, there's a good chance Baseten was running the models underneath.

The open source shift

The most significant trend Philip sees is the rise of open source models. The gap between closed and open source models is maybe two or three months. That sounds significant in AI time, but it's a blink of an eye for most industries.

More importantly, the absolute capabilities of open source models keep crossing invisible thresholds. Three years ago, you couldn't build an AI-powered customer support chat or coding assistant at all. Closed source models unlocked those capabilities first, but open source models followed shortly after. And when they did, companies could switch to models that were faster, less expensive, more customizable, and more reliable at scale.

Philip draws an analogy to operating systems. IBM had its era, then Sun's Solaris and Windows, but eventually Unix, BSD, and Linux emerged. Today, Linux runs on more machines than anything else; even macOS is built on BSD. He sees the same pattern emerging with AI: open source foundation models as the base layer, with companies building delightful, customized experiences on top.

The international research collaboration

The best open source models are increasingly coming from China: Qwen, Kimi, DeepSeek, and GLM from Zhipu. Philip sees this as genuine international collaboration through open source. Mistral learns from DeepSeek's attention mechanisms, then DeepSeek builds on that work in its next release.

There are geopolitical complications. Export restrictions mean Chinese labs often target Hopper GPUs instead of Blackwell, which creates work for infrastructure providers like Baseten who need to port optimizations to newer architectures. But the research itself flows across borders.

Where the industry is heading

Baseten just launched a startup program for seed and Series A companies, and Philip described a structural shift that happens as AI companies scale. In the early days, paying per token and building on foundation models makes sense—the priority is finding product-market fit, not optimizing infrastructure costs.

But eventually, companies start spending $10,000, $20,000, $50,000 a month on inference. That's when they start thinking about bringing workloads in-house, controlling their own destiny, and building differentiated advantages at the model level.

Voice is emerging as a primary modality for Baseten—both voice-in (transcription) and voice-out (generation). Companies like Rime are building their own foundation models for voice, powering enterprise call centers. The use cases span from practical tools like the Whisper dictation app to World Labs' experimental text-to-3D world generation.

That World Labs demo, Philip admits, was the last thing that truly surprised him in AI. Though early and rough around the edges, it showed a future where immersive worlds could be created in real time for education, gaming, and storytelling. After four years in the industry, jaded as he felt, that demo still made him say "Oh, wow."

This interview was conducted at AWS re:Invent 2025.
