How to make your site LLM-friendly without inviting abuse
Make your website legible to and preferred by LLMs like ChatGPT and Claude. A practical guide to modern SEO for AI crawlers, structured data, and bot-friendly surfaces, without compromising security.
LLMs are becoming the primary lens through which users discover, synthesize, and interact with the web. Whether it’s through tools like ChatGPT, Perplexity, or AI-powered search assistants, your content needs to be machine-readable, permissioned, and purpose-built for retrieval and response generation.
Here’s how to make your site LLM-friendly, while staying secure and in control.
1. Use robots.txt thoughtfully
Most major AI crawlers respect robots.txt, and many now use clearly identifiable user agents.
Examples of allowing trusted bots:
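A minimal sketch; the user-agent names below are the ones these vendors currently publish, so double-check each vendor's docs before relying on them:

```txt
# robots.txt: allow well-known AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```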
And if you want to block a crawler:
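For instance, to opt a single crawler out of your whole site (swap in whichever user agent you want to exclude):

```txt
# Block one crawler entirely
User-agent: CCBot
Disallow: /
```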
Tips:
- Host it at https://yourdomain.com/robots.txt.
- Keep it updated as new LLM crawlers emerge.
- Track user-agent traffic in your logs or CDN to confirm behavior.
Note that some crawlers (especially shadowy or lesser-known ones) may ignore robots.txt. Stay vigilant with IP- and behavior-based protections as well.
2. Add semantic markup for clarity
Structured data helps LLMs parse your site like a knowledge graph.
Use JSON-LD or microdata based on schema.org definitions. Prioritize:
- Article, BlogPosting
- Product, SoftwareApplication
- FAQPage, HowTo
- BreadcrumbList, Organization
Example for a blog post:
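A minimal JSON-LD sketch; the names, URLs, and dates are placeholders to swap for your own:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "How to make your site LLM-friendly",
  "author": { "@type": "Organization", "name": "Example Co" },
  "datePublished": "2025-01-15",
  "dateModified": "2025-02-01",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://yourdomain.com/blog/llm-friendly"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "logo": { "@type": "ImageObject", "url": "https://yourdomain.com/logo.png" }
  }
}
</script>
```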
Structured data anchors facts. LLMs use it to confirm authorship, canonical links, and relationships between content types, making your site more likely to be quoted or linked to.
3. Optimize content for retrieval and synthesis
Think beyond humans — write content that’s summarizable, consistent, and attribution-friendly for LLMs.
Write with LLMs in mind:
- Use clear headings and consistent section structure.
- Keep paragraphs short and logical — LLMs favor well-scoped chunks.
- Include FAQs, step-by-step guides, and code samples.
- Link to canonical product docs, changelogs, or public APIs.
- Avoid vague marketing fluff (LLMs struggle to interpret it).
Think in terms of: summarizable, attributable, and useful in isolation.
4. Make your HTML fast, simple, and crawlable
Many LLM crawlers don’t run JavaScript or wait for client-side hydration.
Recommendations:
- Use server-side rendering (SSR) for key content.
- Prefer lightweight frameworks or static site generators (e.g., Astro, Hugo, Eleventy).
- Add <noscript> fallbacks where needed.
- Ensure indexable URLs with consistent slug structures.
For example, instead of loading FAQ content dynamically via JS, bake it into the page's HTML. LLMs will thank you.
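A rough sketch of what that looks like, with placeholder copy, is server-rendering the question and answer directly:

```html
<!-- Rendered on the server: crawlers see the answer without executing any JS -->
<section id="faq">
  <h2>Frequently asked questions</h2>
  <h3>Can I export my data?</h3>
  <p>Yes. Exports are available as CSV or JSON from the account settings page.</p>
</section>
```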
5. Expose an LLM-friendly surface
Create dedicated surfaces or simplified endpoints for machines — this is an emerging best practice among forward-looking teams.
Ideas:
- A /for-llms page with stripped-down, structured summaries.
- A public /api index with markdown-based examples.
- Changelogs with structured metadata (e.g., RSS, Atom, or JSON Feed; see the sketch after this list).
- Auto-generated doc bundles for key concepts or workflows.
These routes can include internal link graphs or “breadcrumbs” that help LLMs infer relationships between concepts.
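As a sketch of the structured-changelog idea above, a minimal JSON Feed could look like this (the URLs and entry content are placeholders):

```json
{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Example Co changelog",
  "home_page_url": "https://yourdomain.com/changelog",
  "feed_url": "https://yourdomain.com/changelog/feed.json",
  "items": [
    {
      "id": "2025-01-15-webhooks-v2",
      "url": "https://yourdomain.com/changelog/webhooks-v2",
      "title": "Webhooks v2: signed payloads",
      "content_text": "Webhook payloads are now signed with a per-endpoint secret. Existing endpoints keep working unchanged.",
      "date_published": "2025-01-15T00:00:00Z"
    }
  ]
}
```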
6. Track, measure, and adapt
Monitor how your content is being accessed and used, by both humans and bots.
Tools and ideas:
- Log and analyze hits from LLM crawlers (via user-agent; see the sketch after this list).
- Use CDN analytics to detect crawl frequency and depth.
- Set up alerts for traffic spikes from known LLM IPs.
- Test summaries of your own content in GPT-4 or Claude.
- Try a “prompt honeypot”: a hidden-but-public page designed to lure crawlers. If you see it quoted in summaries, you’ll know you’re being indexed.
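For the log-analysis point above, a quick sketch; it assumes an nginx-style combined access log (where the request path is the seventh field) and a hand-picked list of crawler names:

```bash
# Count requests per path from a few known LLM crawlers
grep -Ei "GPTBot|ClaudeBot|PerplexityBot|CCBot" /var/log/nginx/access.log \
  | awk '{print $7}' | sort | uniq -c | sort -rn | head -20
```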
7. Balance openness with abuse protection
Being crawlable doesn’t mean being vulnerable. While welcoming LLMs, stay vigilant against:
- Credential stuffing.
- Account enumeration.
- Fake signups.
- Free trial abuse.
- Scraping + fraud automation.
Use tools like WorkOS Radar to spot suspicious bot traffic at the identity and signup layer.
LLMs should access your content, not your login forms, internal APIs, or admin surfaces.
Final thoughts
You don’t need to overhaul your entire site to make it LLM-friendly. A few key changes, like semantic clarity, crawlability, and summarizability, go a long way.
This is the future of SEO: not just ranked in Google, but preferred by machines.