
What is llms.txt and How to Generate It for Your Site

llms.txt is the emerging standard that tells AI crawlers what your site is about and how to cite it. Learn what it is, why it matters for GEO, and how to generate one in minutes.

May 8, 2026·7 min read·By Indexa

llms.txt is a plain-text file, written in Markdown, that you place at the root of your website (at `/llms.txt`). It gives AI language models a structured, machine-readable summary of your site's content, purpose, and key pages.

Think of it as `robots.txt` for the AI era: instead of telling crawlers which pages to skip, it tells LLMs what your site is about and what content is most worth surfacing in AI-generated answers.

Why llms.txt matters for GEO

AI search engines — ChatGPT, Perplexity, Gemini, Bing Copilot — now answer questions directly rather than just linking to websites. To get cited, your site needs to be:

1. Crawlable by AI bots (GPTBot, PerplexityBot, ClaudeBot)
2. Understandable at a structural level, not just a page-by-page level
3. Explicitly positioned as an authority on specific topics

llms.txt addresses point three directly. It tells the model: "This site covers these topics. Here are the most important pages. Here is what we do and who we are."

Without it, an AI model must infer your site's purpose from content alone — which means competing with thousands of other pages for the same inference slot.

What goes inside llms.txt

A well-structured llms.txt file typically contains:

Site identity block: Who you are, what you do, and who you serve. This is the "above the fold" summary for AI systems.

Key pages section: Links to your most authoritative content, such as product pages, pillar articles, comparison pages, and use-case pages. These are the pages you want AI systems to cite.

Topic coverage list: Explicit enumeration of the topics your site covers authoritatively. This helps AI systems match your content to relevant queries without guessing.

Content freshness signal: A note on how frequently your content is updated. AI systems favour sources that are current.

```
# Indexa: AI Blog Writing Platform
Indexa is an AI content platform for SEO and GEO. We generate, optimise, and auto-publish articles that rank on Google and get cited by ChatGPT, Perplexity, and Gemini.

## What we cover

- AI content generation for SEO
- GEO (Generative Engine Optimisation)
- AEO (Answer Engine Optimisation)
- Multilingual content automation
- CMS publishing integrations (WordPress, Webflow, Framer)

## Key pages

- /features: Full feature list
- /pricing: Plans and pricing
- /languages: 150+ supported languages
- /blog: SEO and GEO guides

## Contact

hello@indexa.so
```

How to generate llms.txt

Option 1: Write it manually (recommended for small sites)

If your site has fewer than 50 key pages, writing llms.txt manually takes about 30 minutes and gives you full control over what AI systems see first.

Start with a one-paragraph summary of your site. Then list your 10-20 most important URLs with a one-line description of each. Add your topic coverage list. That's the core.

Option 2: Generate it from your sitemap

For larger sites, you can script the generation from your existing XML sitemap. Pull the URL list, filter to your most important pages (by traffic, by revenue importance, or manually), and generate descriptions using your existing meta descriptions.
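As a rough illustration of the sitemap approach, here is a minimal Python sketch. It assumes a standard sitemaps.org XML sitemap and a hand-curated mapping of URLs to descriptions (for example, copied from your meta descriptions). The function names, the `example.com` URLs, and the output layout are illustrative assumptions, not part of any official tooling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in the sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def key_pages_section(urls: list[str], descriptions: dict[str, str]) -> str:
    """Build a 'Key pages' section, keeping only URLs that have a curated description."""
    lines = ["## Key pages", ""]
    for url in urls:
        if url not in descriptions:
            continue  # the manual filter: pages without a description are left out
        path = urlparse(url).path or "/"
        lines.append(f"- {path}: {descriptions[url]}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data for illustration only.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/features</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/tag/misc</loc></url>
</urlset>
"""

DESCRIPTIONS = {
    "https://example.com/features": "Full feature list",
    "https://example.com/pricing": "Plans and pricing",
}

print(key_pages_section(sitemap_urls(SAMPLE_SITEMAP), DESCRIPTIONS))
```

Filtering by "has a curated description" is only one option. You could just as well filter by traffic or revenue data, as long as the final list stays short enough to review by hand.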

Option 3: Use a CMS plugin

WordPress users can use SEO plugins that now include llms.txt generation. Verify the output — many plugins generate generic files that miss the strategic positioning value.

Where to place it

Place the file at `/llms.txt` — at the root of your domain. Not in a subdirectory. Not on a subdomain. The root path is the emerging standard.

Also place a companion file at `/llms-full.txt` if you want to include a richer, more detailed version for AI systems that support extended context.
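To make "the root of your domain" concrete, the following small Python sketch derives both file locations from any page URL on a site. The helper name and the `example.com` URL are hypothetical; the only behaviour relied on is how the standard library's `urljoin` resolves a path that starts with `/`.

```python
from urllib.parse import urljoin


def llms_txt_locations(any_page_url: str) -> tuple[str, str]:
    """Return the root-level llms.txt and llms-full.txt URLs for the site serving any_page_url."""
    # A reference starting with "/" resolves against the host root,
    # so the page's own path and query string are discarded.
    return (
        urljoin(any_page_url, "/llms.txt"),
        urljoin(any_page_url, "/llms-full.txt"),
    )


print(llms_txt_locations("https://example.com/blog/some-post?ref=nav"))
# ('https://example.com/llms.txt', 'https://example.com/llms-full.txt')
```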

llms.txt and robots.txt: the difference

robots.txt controls access — it tells crawlers which pages to skip.

llms.txt controls positioning — it tells AI systems what your site means and what to cite.

Both files are needed. robots.txt without llms.txt means AI systems can crawl your content but have no structured signal for how to position you. llms.txt without a permissive robots.txt means AI crawlers can read your strategic summary but can't verify it against your actual content.

Verifying AI crawlers can access your llms.txt

After publishing your llms.txt, confirm the key AI crawlers are not blocked:

  • GPTBot (OpenAI/ChatGPT)
  • PerplexityBot
  • ClaudeBot (Anthropic)
  • Google-Extended (Google AI features)
  • Amazonbot

Check your robots.txt to confirm these user agents are explicitly allowed (or not blocked by a wildcard Disallow rule).
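One way to run this check is with Python's standard `urllib.robotparser`. The sketch below parses a robots.txt body you supply as a string and reports which of the listed user agents would be refused `/llms.txt`. The function name and the sample robots.txt are assumptions made for illustration, and the standard-library parser may interpret edge cases differently from each crawler's own implementation.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the list above.
AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended", "Amazonbot"]


def blocked_crawlers(robots_txt: str, path: str = "/llms.txt") -> list[str]:
    """Return the AI user agents that this robots.txt body disallows from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, path)]


# Hypothetical robots.txt: GPTBot is blocked outright, everyone else only from /admin/.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_crawlers(SAMPLE_ROBOTS))
# ['GPTBot']
```

An empty result means none of the listed agents is blocked from the file. To check a live site, you would pass in the text of your own robots.txt instead of the sample.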

*Indexa automatically generates and maintains your llms.txt as part of the GEO optimization layer — updated every time you publish new content.*
