Genesis
Large language models face a critical architectural limitation: their context windows are too small to process entire websites. Converting complex websites containing navigation, ads, JavaScript and CSS into LLM-friendly clean text is both difficult and imprecise.
Jeremy Howard, co-founder of Fast.ai and lecturer at the Universities of Queensland and Stanford, observed that most HTML on sites consists of menus, tracking scripts, repetitive sections and ads - elements that eat up valuable tokens in the LLM context window without contributing substantive value. Inspired by the simplicity of robots.txt, Howard created a standard that allows site owners to provide LLMs with structured, expert knowledge in a single, accessible location.
Technical specifications
The llms.txt standard defines a precise Markdown-based structure that combines human readability with programmatic parsing capabilities. The file must be located at the site's root path, /llms.txt, and include the following sections in a specific order:
Required components:
- H1 header - name of the project or site (the only mandatory section)
Optional but recommended items:
- Blockquote - a concise summary of the project containing the key information necessary to understand the rest of the file
- Descriptive sections - zero or more markdown sections (paragraphs, lists) without headers, containing detailed information about the project and how to interpret the supplied files
- H2 sections with file lists - zero or more sections separated by H2 headers, containing lists of URLs with additional details
- "Optional" section - a section with special meaning. The URLs it contains may be omitted when a shorter context is required
Link Specifications:
Each entry in a file list must contain a Markdown hyperlink [name](url), optionally followed by : and notes about the file.
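The structure described above can be read with a few lines of code. The following is a minimal sketch, not a reference parser: the function name `parse_llms_txt`, the regular expression and the sample file are illustrative assumptions, and free-text descriptive sections are simply skipped.

```python
import re

# Matches "- [name](url)" with optional ": notes" (assumed pattern, not from the spec).
LINK_RE = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)

def parse_llms_txt(text):
    """Split llms.txt text into title, summary, and H2 sections of links."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None and current is None:
            result["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(match.groupdict())
    return result

# Hypothetical sample file used only for illustration.
sample = """# Example Project
> A short summary of the project.

## Docs
- [Quick Start](https://example.com/quickstart): Step by step onboarding
- [API](https://example.com/api)

## Optional
- [History](https://example.com/history): Milestones
"""

parsed = parse_llms_txt(sample)
print(parsed["title"])
print([link["name"] for link in parsed["sections"]["Docs"]])
```

Keeping the sections in a dictionary keyed by H2 title makes it easy to treat the "Optional" section differently from the rest.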
The .md extension
The proposal also suggests that sites containing information useful to LLMs should provide a clean Markdown version of each page at the same URL with .md appended (or index.html.md for URLs without file names).
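As a rough illustration of this naming rule, the sketch below derives the expected Markdown URL from a page URL. The helper name `markdown_url` is hypothetical, and it assumes that "a URL without a file name" means a path ending in a slash (or an empty path).

```python
from urllib.parse import urlsplit, urlunsplit

def markdown_url(page_url):
    """Return the URL where a Markdown version of page_url would be expected."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: a trailing slash means the URL has no file name.
        path += "index.html.md"
    else:
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(markdown_url("https://example.com/docs/intro.html"))
print(markdown_url("https://example.com/docs/"))
```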
Example to implement
Below is a professional llms.txt template ready for customization and implementation:
# Project Name
> A concise description of your business, one or two sentences explaining the specialization, offering, or purpose of the project. This section helps LLMs understand the context of the remaining resources.
Key contextual information:
- First important note about the nature of the business or technology
- Second note specifying scope or limitations
- Third note clarifying the target audience
## Main Resources
- [Home Page](https://example.com): Introduction and latest announcements
- [API Documentation](https://example.com/api): Complete technical documentation with examples
- [Quick Start Guide](https://example.com/quickstart): Step by step onboarding for new users
- [Best Practices](https://example.com/best-practices): Proven patterns and recommendations
## FAQ and Support
- [Frequently Asked Questions](https://example.com/faq): Answers to the most common user questions
- [Troubleshooting](https://example.com/troubleshooting): Diagnostic guide for common issues
- [Contact](https://example.com/contact): Contact form and company details
## Developer Resources
- [API Reference](https://example.com/api-reference): Full endpoint documentation
- [Code Examples](https://example.com/code-examples): Practical implementations and case studies
- [Changelog](https://example.com/changelog): History of changes and updates
## Optional
- [Company History](https://example.com/history): Project evolution and milestones
- [Blog Archive](https://example.com/blog-archive): Older blog posts
- [Privacy Policy](https://example.com/privacy): Detailed information on data protection
Comprehensive guidelines can be found at: https://llmstxt.org
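Because the URLs under "Optional" may be omitted when a shorter context is required, a consumer of the file can drop that section before passing the text to a model. A minimal sketch follows; the function name `drop_optional` is hypothetical, and matching the section title case-insensitively is an assumption made for leniency.

```python
def drop_optional(text):
    """Return llms.txt text without the "## Optional" section."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # A new H2 starts: skip it only if it is the Optional section.
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"

# Hypothetical sample used only for illustration.
demo = (
    "# Demo\n> Summary\n\n## Docs\n- [A](https://example.com/a)\n\n"
    "## Optional\n- [B](https://example.com/b)\n"
)
print(drop_optional(demo))
```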
Important guidelines:
- File size should be limited to ~100 KB for optimal performance
- Encoding: UTF-8
- Format: pure Markdown without HTML
- All URLs should be absolute (e.g. https://example.com/url), not relative
- Links must lead to active resources (avoid 404 errors)
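Most of these guidelines can be checked offline before publishing. The sketch below is one possible approach, with assumed names (`check_guidelines`, `MAX_BYTES`) and simple heuristics: the HTML detection is a rough regular expression, and link reachability is not checked because that requires network access.

```python
import re

MAX_BYTES = 100 * 1024  # the ~100 KB guideline from the list above
LINK_TARGET_RE = re.compile(r"\]\(([^)\s]+)\)")  # URL part of [name](url)
HTML_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")   # heuristic, may over-match

def check_guidelines(data: bytes):
    """Return a list of guideline violations found in raw llms.txt bytes."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file larger than 100 KB")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    if HTML_TAG_RE.search(text):
        problems.append("file contains HTML tags")
    for url in LINK_TARGET_RE.findall(text):
        if not url.startswith(("http://", "https://")):
            problems.append(f"relative URL: {url}")
    return problems

print(check_guidelines(b"# Demo\n<div>x</div>\n- [A](/a)\n"))
```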
Optional llms-full.txt
The standard also provides for an optional file, llms-full.txt, which contains the full documentation in a single file. While llms.txt acts as a table of contents, llms-full.txt provides the complete content of all linked documents, allowing AI systems to access the entire knowledge base in a single request.
For a directory of existing llms.txt and llms-full.txt files to use as inspiration, visit: https://llmstxt.site
Validation and testing
Check:
- Location and reachability (HTTP 200, no redirect loops)
- HTTP headers (Content-Type: text/plain or text/markdown; charset UTF-8)
- Content-Length and GZIP/Brotli compression
- Content freshness and canonical URLs
- Licensing and AI usage terms, i.e. how AI models may use your content
- Correctness of Markdown syntax and structure (H1, H2, link syntax), for example with https://markdownlivepreview.com
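The status and header checks from this list can also be scripted. The sketch below is an assumption-laden example rather than an official validator: `check_response` is a hypothetical helper that inspects a status code and a header mapping, and the commented usage shows how it might be fed from a live request to a placeholder URL.

```python
def check_response(status, headers):
    """Check status code and headers of an llms.txt response; return problems."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    content_type = normalized.get("content-type", "").lower()
    if not content_type.startswith(("text/plain", "text/markdown")):
        problems.append(f"unexpected Content-Type: {content_type or 'missing'}")
    if "charset=utf-8" not in content_type.replace(" ", ""):
        problems.append("charset is not declared as UTF-8")
    return problems

print(check_response(200, {"Content-Type": "text/plain; charset=utf-8"}))

# Usage against a live site (requires network access; the URL is a placeholder):
# from urllib.request import urlopen
# with urlopen("https://yourwebsite.com/llms.txt") as resp:
#     print(check_response(resp.status, dict(resp.headers.items())))
```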
Manual accessibility testing:
curl -I https://yourwebsite.com/llms.txt
# Check: HTTP/1.1 200 OK
# Content-Type: text/plain; charset=utf-8
# Content-Length: [size]
curl https://yourwebsite.com/llms.txt
# Verify content and formatting
Tests with LLM:
- Provide the llms.txt URL to models such as ChatGPT, Claude and Gemini
- Ask questions about key information from your site
- Verify that AI correctly uses the indicated resources and links
Log monitoring and traffic analysis in GA4:
Watch for traffic from user-agents:
- GPTBot (OpenAI)
- Claude-Web (Anthropic)
- GoogleOther (Google AI)
- PerplexityBot (Perplexity)
- Other AI bots
An increase in visits from these bots after the implementation of llms.txt is an indicator of effectiveness. Research by Insightland showed a 600% increase in GPTBot visits after llms.txt implementation.
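If you have access to raw server access logs, the bot visits listed above can be counted with a short script. This is a sketch under assumptions: the function name `count_ai_bot_hits`, the matching by user-agent substring and the sample log lines are all illustrative, and real log formats vary.

```python
from collections import Counter

# User-agent substrings taken from the list above; extend as needed.
AI_BOTS = ["GPTBot", "Claude-Web", "GoogleOther", "PerplexityBot"]

def count_ai_bot_hits(log_lines, path="/llms.txt"):
    """Count requests per AI bot for one path in access-log lines."""
    hits = Counter()
    for line in log_lines:
        if path not in line:
            continue
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
                break
    return hits

# Hypothetical log lines in a combined-log-like format.
sample_log = [
    '1.2.3.4 - - [01/Oct/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '1.2.3.5 - - [01/Oct/2025:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 900 "-" "GPTBot/1.0"',
    '1.2.3.6 - - [01/Oct/2025:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "PerplexityBot/1.0"',
]
print(count_ai_bot_hits(sample_log))
```

Comparing these counts for the weeks before and after deployment gives a simple before/after view of bot interest.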
10 most common mistakes
Error 1: Incorrect file location
The problem: File placed in a subdirectory instead of the root directory
Solution: Always place the file exactly at https://yourwebsite.com/llms.txt, not in /seo/llms.txt or /ai/llms.txt.
Error 2: The required H1 header is missing
The problem: Starting a file without a level 1 header
Solution: The first line must include # Project Name.
Error 3: Incorrect encoding
The problem: A file saved in an encoding other than UTF-8.
Solution: Save the file with explicit UTF-8 encoding. In most editors: File → Save with Encoding → UTF-8.
Error 4: Size limit exceeded
The problem: File larger than 100 KB
Solution: Limit content to the most important resources. Use llms-full.txt for complete documentation.
Error 5: Incorrect Markdown link syntax
The problem: [Title] (https://url) instead of [Title](https://url) (a space before the parenthesis)
Solution: Make sure there is no space between ] and (.
Error 6: No blockquote with description
The problem: Omitting the contextual description of the project.
Solution: Add > Short description after the H1 header for better understanding by LLMs.
Error 7: Dead links and 404 errors
The problem: Links leading to non-existent resources.
Solution: Regularly test all URLs with tools like broken link checker.
Error 8: An excess of irrelevant content
The problem: A list of all subpages without prioritization.
Solution: Select the 5-15 most important resources. Quality > quantity.
Error 9: Not using the "Optional" section
The problem: All resources on an equal level of importance
Solution: Place secondary resources in the ## Optional section, so that they can be skipped when context is limited.
Error 10: No post-deployment verification and no updates
The problem: Assumption that the file works without testing, no content updates.
Solution: Run validation and accessibility tests after each change. Add update information to the file, for example a Last update note.
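Several of the structural mistakes above (Errors 2, 5, 6 and 9) can be caught automatically. The following is a minimal linting sketch; the function name `lint_structure` and the exact checks are assumptions, and it does not cover location, size, dead links or content quality.

```python
import re

def lint_structure(text):
    """Return warnings for a few of the structural mistakes listed above."""
    warnings = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        warnings.append("Error 2: first line is not an H1 header")
    if re.search(r"\][ \t]+\(", text):
        warnings.append("Error 5: space between ] and ( in a link")
    if not any(line.startswith(">") for line in lines):
        warnings.append("Error 6: no blockquote with description")
    if not any(line.strip().lower() == "## optional" for line in lines):
        warnings.append("Error 9: no Optional section")
    return warnings

print(lint_structure("Demo\n- [A] (https://example.com/a)\n"))
```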
Does it work?
According to BuiltWith data from October 2025, 844,473 sites have implemented the llms.txt standard. SE Ranking's analysis of 300,000 domains showed an adoption rate of 10.13%, with the majority of deployments coming from the developer tools sector, technical documentation platforms and technology companies, where AI coding assistants are critical to the business.
The llms.txt standard has been adopted by leading technology companies, including Anthropic (Claude documentation), Cloudflare, Stripe, Perplexity, Cursor, Solana, ElevenLabs, Hugging Face, Raycast, Yoast, DataForSEO, Zapier and Mintlify.
Case Study 1: Insightland
Results:
- Increase in GPTBot visits by 600% (from a few hundred to almost 2,000 visits)
- Perplexity-User appeared 7 times in 3 days
- TikTok, Moz, Amazon, Petalbot, Bytedance and Bing bots visited the llms.txt file
- Increase in overall site traffic
- No negative impact on traditional SEO rankings
Case Study 2: WordLift
Results:
- Increase in organic traffic by ~25% after llms.txt implementation
- Better indexing by AI, richer visibility in knowledge panels and snippets
Case Study 3: Mintlify
Results:
- Reduction in time for LLM to process documentation by 40%
- Improving the accuracy of AI responses by 30%
- Thousands of technical documentation sites automatically received llms.txt files
Case Study 4: Cloudsential
Results:
- Significant increase in visibility in AI
- Cloudsential emerges as top source for ChatGPT SEO-related queries
Evidence of GEO's effectiveness
The Generative Engine Optimization (GEO) study by Aggarwal et al. (KDD '24) showed that content optimization strategies for generative engines can increase source visibility by up to 40% in AI-generated responses.
The most effective GEO methods:
- Adding quotes - increase in visibility by more than 40%
- Adding statistics - increase in visibility by more than 30%
- Optimizing content fluency - significant increase
- Citing sources - significant improvement
- Technical terminology - moderate improvement
The study carried out a systematic evaluation on GEO-BENCH, a benchmark consisting of 10,000 diverse queries from multiple domains.
Integration with GEO/AEO ecosystem
The llms.txt standard is a fundamental component of a broader Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO) strategy. Here are the highlights of the full, holistic approach:
1. SEO alone is no longer enough: bet on GEO/AEO
Traditional SEO remains crucial, but by itself does not guarantee visibility in LLMs. A growing body of analysis shows that even brands with well-developed SEO do not always appear in the models' responses. At the same time, there are brands with virtually no SEO efforts that are nevertheless cited by LLMs, indicating that visibility in AI depends on factors other than standard search engine authority.
Available research on brand visibility in the responses of large language models shows that even global brands can remain invisible despite ranking well in traditional SEO. Our own visibility tests in Google and in LLM indicate that some brands virtually do not appear in the results for key category phrases in Google, while LLM models continue to cite them. This suggests that these brands do not have consistent SEO efforts, which limits their visibility in search engines, but does not affect their presence in LLM responses to the same extent.
Tomasz Cincio - CEO of Semly.ai
2. Structured data (Schema.org)
Implementing Schema markup for FAQ, Article, Product and other content types increases the likelihood of citation. Pages with complete structured data are significantly more likely to be cited by AI.
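As an illustration of what such markup looks like, the sketch below generates Schema.org FAQPage JSON-LD from question and answer pairs. The helper name `faq_jsonld` and the sample question are assumptions; the property names (`mainEntity`, `acceptedAnswer`) follow the Schema.org vocabulary.

```python
import json

def faq_jsonld(pairs):
    """Build Schema.org FAQPage markup from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)

markup = faq_jsonld(
    [("Is llms.txt an official standard?", "No, it is a proposed standard.")]
)
# The JSON-LD is embedded in the page inside a script tag of this type.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```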
3. AI-friendly content architecture
- Front-loading: Key information at the beginning of the content
- Hierarchical structure: Clear H1-H6 headings
- Lists and bullet points: Increase extractability by AI - that is, the ability of a language model to extract, recall or reproduce data
- Short paragraphs: <25 words per sentence, <100 words per paragraph
To see how AI bots see your site, open the following address, replacing https://semly.ai with your own URL: https://r.jina.ai/https://semly.ai
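The sentence and paragraph limits from the list above (fewer than 25 words per sentence, fewer than 100 per paragraph) are easy to check mechanically. The sketch below is a rough heuristic, not a linguistic tool: the function name `readability_issues` is hypothetical, paragraphs are assumed to be separated by blank lines, and sentences are split naively on ., ! or ?.

```python
import re

def readability_issues(text, max_sentence_words=25, max_paragraph_words=100):
    """Flag sentences and paragraphs that reach or exceed the word limits."""
    issues = []
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        paragraph_words = len(paragraph.split())
        if paragraph_words >= max_paragraph_words:
            issues.append(f"paragraph {number}: {paragraph_words} words")
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            sentence_words = len(sentence.split())
            if sentence_words >= max_sentence_words:
                issues.append(
                    f"paragraph {number}: sentence with {sentence_words} words"
                )
    return issues

long_sentence = " ".join(["word"] * 30) + "."
print(readability_issues("Short sentence. Another one.\n\n" + long_sentence))
```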
4. Authority and content
- External citations: Mentions on authoritative third-party sites
- Domain authority: Overall industry visibility
- Freshness of content: Pages updated in the last 12 months are 2x more likely to get citations
5. Brand Visibility Score metrics
Formula: (Responses mentioning your brand ÷ Total number of responses) × 100
Supporting metrics:
- Citation Rate: % of LLM responses mentioning or linking to your brand
- Sentiment Score: (Positive + 0.5 × Neutral mentions) ÷ All mentions
- Share of Voice: % of total citations compared to competitors
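The formulas above translate directly into code. This is a minimal sketch with hypothetical function names and made-up sample numbers; it assumes that "all mentions" means positive, neutral and negative mentions together, and that Share of Voice is measured against the sum of your own and competitors' citations.

```python
def brand_visibility_score(responses_with_brand, total_responses):
    """(Responses mentioning your brand / total number of responses) x 100."""
    if total_responses == 0:
        return 0.0
    return 100 * responses_with_brand / total_responses

def sentiment_score(positive, neutral, negative):
    """(Positive + 0.5 x neutral mentions) / all mentions."""
    total = positive + neutral + negative  # assumption: all three kinds count
    if total == 0:
        return 0.0
    return (positive + 0.5 * neutral) / total

def share_of_voice(brand_citations, competitor_citations):
    """Brand citations as a percentage of brand plus competitor citations."""
    total = brand_citations + sum(competitor_citations)
    if total == 0:
        return 0.0
    return 100 * brand_citations / total

# Made-up sample numbers for illustration only.
print(brand_visibility_score(30, 120))
print(sentiment_score(6, 2, 2))
print(share_of_voice(20, [50, 30]))
```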
AI visibility monitoring tools
The market for AI visibility monitoring tools is growing rapidly, and companies are looking for ways to understand how ChatGPT, Gemini, Perplexity or other models present their brand or products. The following overview compares Semly, Profound and Searchable. Unlike its competitors, Semly not only measures visibility in AI, but is also the only tool in this comparison that actively creates correct product data for LLMs and data aggregators, which realistically increases the chance of brands appearing in AI recommendations.
| Criterion | Semly (semly.ai) | Profound (tryprofound.com) | Searchable (searchable.com) |
|---|---|---|---|
| Overarching purpose of the tool | GEO for e-commerce, services and brands - increasing visibility in LLM responses and opening a new sales channel in AI search. | Enterprise AI visibility: monitoring how brands appear in the responses of generative engines and answer engines, with reports for large teams. | Advanced toolkit for AI search: visibility analytics, content, technical and AEO audits, combined with data from GA4 and GSC. |
| Role vis-à-vis LLMs and data | Actively creates and standardizes data for LLMs: builds structured product feeds for stores, prepared for indexing by the data aggregators that LLMs (ChatGPT, Gemini and others) use. Semly doesn't just measure visibility, but provides the very data the models are supposed to read. | Mainly monitoring and visibility analytics: Profound analyzes how existing brand content is cited by AI, where the models get their data from and how share of voice is changing. It does not create new product feeds for LLMs and works only on existing data. | Mainly tracking and visibility optimization: Searchable links AI visibility data with traffic analytics, content audits and on-page audits. It does not act as a feed manager for LLMs, but rather as an analysis and optimization tool. |
| Focus on e-commerce | Yes, e-commerce first: a product designed for stores, services, brands and manufacturers who want to sell through AI. | Rather a horizontal enterprise tool for brands across multiple industries (SaaS, retail, finance, etc.). | Horizontal AEO toolkit: supports e-commerce but is not exclusively for stores; it targets the broad marketing and SEO market. |
| AI visibility feature type | Visibility and sales: checks whether the store's products and offerings can be recommended by LLMs and how to improve the data to increase the chance of appearing in purchase-related responses. | Answer engine insights: tracking brand citations, the sources where AI finds information, and share of AI search results for selected prompts. | AI search dashboard: visibility in ChatGPT, Claude, Perplexity, etc., combined with traffic analysis from GA4 and GSC, plus AEO and on-page SEO audits. |
| Data input | Product feed (e.g. Google Shopping XML) and data scraping for brands. Semly maps and processes the data into a form that data aggregators and LLMs can use effectively. | Prompt sets, keywords, domain, markets and competitors. The input is mainly queries to AI and site addresses. | Domains, keywords, campaigns, integrations with GA4, GSC and CMS (e.g., Webflow, Shopify, WordPress) to combine visibility with traffic. |
| Supported AI engines (high level) | ChatGPT, Gemini and other popular LLMs and AI surfaces used to search for services and products (AI shopping, recommendations). | ChatGPT, Perplexity, Google AI Overviews / AI Mode, Grok, Meta AI and other answer engines, especially on a large enterprise scale. | ChatGPT, Claude, Perplexity, Google AI, Copilot and classic search engines, bundled into a single visibility view. |
| Entry price | From about €24 per month for the Mini plan for small brands and stores (a simple subscription service for brands). | Custom enterprise pricing: no specific rates on the site; pricing is provided after contacting sales. External reviews indicate typical plans from about $399 per month, with a limited starter plan at about $99 per month. | Paid plans with no rates shown openly on the site: starts with a 7-day free Pro trial; further prices are visible only after going to "See all plans" or contacting the sales department. Positioned as a premium solution for marketing teams. |
| Cost level vs Semly | Entry-level for brands: cost comparable to one simple SaaS subscription or one trip to the movies per month. | Significantly higher: typically a multiple of Semly's monthly cost, designed for enterprise budgets (marketing, PR, SEO). | Between Semly and Profound, closer to the segment of premium marketing and analytics tools aimed at teams and agencies rather than individual brands. |
| Best use case | An online store or brand that wants its products or services to be realistically available to and recommended by ChatGPT, Gemini and other LLMs, with its data properly shared through data aggregators. | A global enterprise brand that wants to measure how AI represents its brand, where AI gets its data from, and what its share of voice and reputation look like in AI. | A marketing team or agency that wants to combine visibility in AI search with traffic analytics, content audits and the content creation process in one tool. |
Both Profound and Searchable are advanced analytical tools, but they focus on monitoring brand visibility and reputation. Semly works differently: it combines visibility monitoring with the creation of data for LLMs, so it influences what models can see and use. At the same time, the cost of entry for Semly is many times lower than for foreign enterprise platforms. As a result, Semly acts as the first real GEO tool designed for e-commerce and brands that not only reports, but actively increases the chance of sales in new AI channels.
The future of the standard
The llms.txt standard, while experimental, is evolving toward wider adoption. Google has included llms.txt in its Agent2Agent (A2A) protocol, signaling at least experimental interest. In November 2024, Mintlify enabled automatic generation of llms.txt for every documentation site it hosts, instantly adding thousands of technical documentation sites to the ecosystem.
In March 2025, Jeremy Howard said the vision extends beyond the current reality: an AI-first web standard where language models no longer waste tokens on redundant HTML, but can focus on relevant knowledge.
Summary
The llms.txt file represents a fundamental change in the way web content is made available to AI systems. The standard, although experimental, has achieved mass adoption (over 844,000 sites) and is producing measurable results - a 20% to 40% increase in visibility in AI responses, a 600% increase in AI bot visits and a 30% improvement in response accuracy.
Key findings:
Implementation is simple, but requires precision: Markdown structure, UTF-8 encoding, location in the root directory and correct section hierarchy are key to effectiveness.
Validation is mandatory: Use ChatGPT, for example, to validate the map before publishing.
Avoid the 10 most common mistakes: Incorrect placement, missing H1, wrong encoding, exceeding the size limit, incorrect link syntax, missing blockquote, dead links, excess content, ignoring the Optional section and lack of testing.
Integrate with GEO/AEO: llms.txt is part of a broader strategy that includes SEO, structured data, AI-friendly content architecture and brand authority building.
Monitor effectiveness: Use tools like Semly.ai to track Brand Visibility Score, Citation Rate and Share of Voice.
Update regularly: Freshness of content is critical - pages updated in the last 12 months are 2x more likely to be cited.
In an era where AI is evolving into the dominant information discovery interface, controlling how language models interpret and present your brand becomes a strategic imperative. The llms.txt standard, supported by empirical evidence and growing adoption, represents a fundamental step toward an AI-first web.
FAQ - Frequently Asked Questions
Is llms.txt an official standard?
No, llms.txt is a proposed standard created by Jeremy Howard. No major LLM provider has officially confirmed that it reads these files, but empirical evidence (increase in AI bot visits, case studies) suggests that the standard is being used in practice.
Does the implementation of llms.txt guarantee citation by AI?
No, llms.txt does not guarantee citations. However, it does increase the likelihood and relevance of citations by making it easier for AI to access key content. Studies show a 20-40% increase in visibility after implementation.
Does llms.txt replace robots.txt or sitemap.xml?
No. Each of these files has a different purpose:
- robots.txt - controlling access of indexing bots
- sitemap.xml - list of all indexable pages for search engines
- llms.txt - a curated map of key resources for AI
How often should I update llms.txt?
At least quarterly or after any significant change in site structure, addition of key content or rebranding. Content that has not been updated for more than 12 months is 2x less likely to be cited by AI.
Can I have multiple llms.txt files for different sections of the site?
Yes, the specification allows files in subpaths, such as https://docs.example.com/llms.txt for the documentation section. Always keep the main file in the domain root directory.
What is the optimal size of llms.txt file?
The recommended limit is ~100 KB. Larger files may overload LLM context windows. For extensive documentation, use llms-full.txt as a supplement.
Does llms.txt affect traditional SEO?
Studies have shown no negative impact on SEO rankings. The file is neutral to traditional search engines and can indirectly support SEO by improving brand visibility in AI, which generates traffic to the site.
How to measure the effectiveness of llms.txt?
Monitor:
- Logs and bot traffic in GA4 (increase in AI bot visits)
- Tools like Semly.ai will show you your brand's visibility in AI
- Brand Visibility Score and Share of Voice
- Traffic from AI search engines in Google Analytics
Should small businesses implement llms.txt?
Yes, if you care about visibility in the AI ecosystem. Implementation is simple (1-4 hours), inexpensive, and can yield significant benefits with minimal risk.
What if I don't have the resources to create .md versions for all pages?
Focus on the most important 5-10 resources. Quality and prioritization are more important than completeness. You can link directly to HTML, although Markdown is preferred.
Glossary
LLM (Large Language Model) - a large AI language model capable of understanding and generating text from massive training data sets
Markdown - a lightweight markup language for formatting text, characterized by simplicity and readability
Context Window - limit of tokens (units of text) that LLM can process in a single query
GEO (Generative Engine Optimization) - the process of optimizing content to increase the chances of appearing in AI-generated responses
AEO (Answer Engine Optimization) - a near-synonym of GEO; optimization for AI answer engines
Parsing - the process of analyzing the structure of data by a computer program
User-Agent - bot or browser ID in HTTP headers
Schema.org - common structured data dictionary for websites
Brand Visibility Score - a metric that measures the frequency of brand mentions in AI responses
Sources
llmstxt.org - official specification of the standard
Answer.AI (Jeremy Howard) - proposal and justification of the standard
llmstxt.site - index of websites that have already implemented an llms.txt or llms-full.txt file
r.jina.ai/https://semly.ai - check how AI bots see your site
Research: GEO - academic research (40% increase in visibility)
Aggarwal P. et al, "GEO: Generative Engine Optimization," KDD '24, 2024 - novel research and framework for optimizing content visibility under AI generative systems.
