The llms.txt map: how it works and why it affects visibility in AI

Genesis

Large language models face a critical architectural limitation: their context windows are too small to process entire websites. Converting complex websites containing navigation, ads, JavaScript and CSS into LLM-friendly clean text is both difficult and imprecise.

Jeremy Howard, co-founder of Fast.ai and lecturer at the Universities of Queensland and Stanford, observed that most HTML on websites consists of menus, tracking scripts, repetitive sections and ads - elements that eat up valuable tokens in the LLM context window without contributing substantive value. Inspired by the simplicity of robots.txt, Howard created a standard that allows site owners to provide LLMs with structured, expert knowledge in a single, accessible location.

Technical specifications

The llms.txt standard defines a precise Markdown-based structure that combines human readability with programmatic parsing. The file must be located at the root path of the site (/llms.txt) and include the following sections in a specific order:

Required components:

H1 header - name of the project or site (the only mandatory section)

Optional but recommended items:

Blockquote - a concise summary of the project containing the key information necessary to understand the rest of the file
Descriptive sections - zero or more markdown sections (paragraphs, lists) without headers, containing detailed information about the project and how to interpret the supplied files
H2 sections with file lists - zero or more sections separated by H2 headers, containing lists of URLs with additional details
"Optional" section - a section with special meaning. The URLs it contains may be omitted when a shorter context is required

Link Specifications:

Each item in a file list must contain a required Markdown hyperlink [name](url), optionally followed by : and notes about the file.
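The structure described above can be read programmatically with a few string checks. The following is a minimal sketch, not a reference parser: the function name `parse_llms_txt`, the returned dictionary layout and the regular expression are illustrative assumptions, and only single-line blockquotes and `-` list items are handled.

```python
import re

# One list item: "- [name](url)" optionally followed by ": notes"
LINK_RE = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


def parse_llms_txt(text):
    """Split an llms.txt document into title, summary, free text and link sections."""
    result = {"title": None, "summary": None, "details": [], "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()  # H1: project or site name
        elif line.startswith("## "):
            current = line[3:].strip()  # H2: start of a file list
            result["sections"][current] = []
        elif line.startswith(">") and current is None and result["summary"] is None:
            result["summary"] = line[1:].strip()  # blockquote summary
        elif current is None:
            result["details"].append(line)  # free text before the first H2
        else:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(match.groupdict())
    return result
```

Each parsed link is a dictionary with the keys `name`, `url` and `notes`; `notes` is `None` when the item has no text after the link.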

The .md extension 

The proposal also suggests that pages containing information useful to LLMs should provide a clean Markdown version at the same URL with .md appended (or index.html.md for URLs without file names).
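As an illustration of this naming rule, the sketch below derives the expected Markdown URL from a page URL. The helper name `markdown_url` is hypothetical, and treating a path that ends in `/` as a "URL without a file name" is an assumption made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url):
    """Return the URL where a clean Markdown copy of page_url would be expected."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # No file name in the URL: the proposal suggests index.html.md
        path += "index.html.md"
    else:
        # Otherwise append .md to the existing file name
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, `markdown_url("https://example.com/docs/intro.html")` yields `https://example.com/docs/intro.html.md`.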

Example to implement

Below is a professional llms.txt template ready for customization and implementation:

# Project Name

> A concise description of your business, one or two sentences explaining the specialization, offering, or purpose of the project. This section helps LLMs understand the context of the remaining resources.

Key contextual information:
- First important note about the nature of the business or technology
- Second note specifying scope or limitations
- Third note clarifying the target audience
## Main Resources
- [Home Page](https://example.com): Introduction and latest announcements
- [API Documentation](https://example.com/api): Complete technical documentation with examples
- [Quick Start Guide](https://example.com/quickstart): Step by step onboarding for new users
- [Best Practices](https://example.com/best-practices): Proven patterns and recommendations
## FAQ and Support
- [Frequently Asked Questions](https://example.com/faq): Answers to the most common user questions
- [Troubleshooting](https://example.com/troubleshooting): Diagnostic guide for common issues
- [Contact](https://example.com/contact): Contact form and company details
## Developer Resources
- [API Reference](https://example.com/api-reference): Full endpoint documentation
- [Code Examples](https://example.com/code-examples): Practical implementations and case studies
- [Changelog](https://example.com/changelog): History of changes and updates
## Optional
- [Company History](https://example.com/history): Project evolution and milestones
- [Blog Archive](https://example.com/blog-archive): Older blog posts
- [Privacy Policy](https://example.com/privacy): Detailed information on data protection

Comprehensive guidelines can be found at: https://llmstxt.org

Important guidelines:

File size should be limited to ~100 KB for optimal performance
Encoding: UTF-8
Format: pure Markdown without HTML
All URLs should be absolute (e.g. https://example.com/url), not relative
Links must lead to active resources (avoid 404 errors)
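Most of these guidelines can be checked automatically before publishing. The sketch below covers size, encoding, HTML tags and relative URLs; it does not test whether links are alive. The function name `check_guidelines`, the exact byte limit and both regular expressions are assumptions chosen for illustration.

```python
import re

MAX_BYTES = 100 * 1024  # the ~100 KB guideline from this article
LINK_URL_RE = re.compile(r"\]\(([^)\s]+)\)")  # URL part of [name](url)
# Tag name followed by whitespace, "/" or ">", so <https://...> autolinks are ignored
HTML_TAG_RE = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(?:\s[^>]*)?/?>")


def check_guidelines(raw_bytes):
    """Return a list of human-readable problems found in an llms.txt payload."""
    problems = []
    if len(raw_bytes) > MAX_BYTES:
        problems.append(f"file is {len(raw_bytes)} bytes, above {MAX_BYTES}")
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    if HTML_TAG_RE.search(text):
        problems.append("file contains HTML tags")
    for url in LINK_URL_RE.findall(text):
        if not url.startswith(("http://", "https://")):
            problems.append(f"relative URL: {url}")
    return problems
```

An empty list means none of the checked guidelines was violated.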

Optional llms-full.txt

The standard also provides for an optional file, llms-full.txt, which contains the full documentation in a single file. While llms.txt acts as a table of contents, llms-full.txt provides the complete content of all linked documents, allowing AI systems to access the entire knowledge base in a single request.

For a directory of sites that publish llms.txt and llms-full.txt files, visit: https://llmstxt.site

Validation and testing

Check:

Location and reachability (HTTP 200, no redirect loops)
HTTP headers (Content-Type: text/plain or text/markdown; charset UTF-8)
Content-Length and GZIP/Brotli compression
Content freshness and canonical URLs
Licensing and AI Relations attributes, that is, how AI models may use your content
Correctness of Markdown syntax and structure (H1, H2, link syntax), for example with https://markdownlivepreview.com

Manual accessibility testing:

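# Request only the response headers; expect something like:
#   HTTP/1.1 200 OK
#   Content-Type: text/plain; charset=utf-8
#   Content-Length: [size]
curl -I https://yourwebsite.com/llms.txt
# Fetch the body to verify content and formatting
curl https://yourwebsite.com/llms.txt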
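The same header checks can be scripted. The sketch below only evaluates a status code and a header mapping that you have already obtained; the function name `check_response` and the wording of the messages are illustrative assumptions.

```python
def check_response(status, headers):
    """Compare an HTTP status and header mapping against the checklist above."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Header names are case-insensitive, so normalise them first
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "").lower()
    media_type = content_type.split(";")[0].strip()
    if media_type not in ("text/plain", "text/markdown"):
        problems.append(f"unexpected Content-Type: {media_type or 'missing'}")
    if "charset=utf-8" not in content_type.replace(" ", ""):
        problems.append("charset=utf-8 not declared")
    return problems
```

With the standard library, the inputs could come from `urllib.request.urlopen(url)`, whose response exposes `status` and `headers`; for instance `check_response(resp.status, dict(resp.headers.items()))`.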

Tests with LLM

Provide the URL of your llms.txt to models such as ChatGPT, Claude and Gemini
Ask questions about key information from your site
Verify that the AI correctly uses the indicated resources and links

Log monitoring and traffic analysis in GA4:

Watch for traffic from user-agents:

GPTBot (OpenAI)
Claude-Web (Anthropic)
GoogleOther (Google AI)
PerplexityBot (Perplexity)
Other AI bots

An increase in visits from these bots after implementing llms.txt is an indicator of effectiveness. Research by Insightland reported a 600% increase in GPTBot visits after llms.txt implementation.
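If you have access to raw server logs, bot visits to the file can be counted with a simple substring match. This is a rough sketch: the function name `count_ai_bot_hits` is hypothetical, the bot names are taken from the list above, and the log format is assumed to contain both the requested path and the User-Agent string on one line.

```python
from collections import Counter

# Substrings to look for in the User-Agent field (names from the list above)
AI_BOTS = ("GPTBot", "Claude-Web", "GoogleOther", "PerplexityBot")


def count_ai_bot_hits(log_lines, path="/llms.txt"):
    """Count requests for `path` per AI bot in access-log lines."""
    hits = Counter()
    for line in log_lines:
        if path not in line:
            continue  # not a request for the llms.txt file
        for bot in AI_BOTS:
            if bot in line:
                hits[bot] += 1
                break  # count each request once
    return hits
```

Comparing these counts for equal periods before and after deployment gives a first indication of whether bot traffic has changed.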

10 most common mistakes

Error 1: Incorrect file location

The problem: File placed in a subdirectory instead of the root directory.
Solution: Always place the file exactly at https://yourwebsite.com/llms.txt, not in /seo/llms.txt or /ai/llms.txt.

Error 2: The required H1 header is missing

The problem: Starting a file without a level 1 header 
Solution: The first line must include # Project Name.

Error 3: Incorrect encoding

The problem: A file saved in an encoding other than UTF-8.
Solution: Save the file with explicit UTF-8 encoding. In most editors: File → Save with Encoding → UTF-8.

Error 4: Size limit exceeded

The problem: File larger than 100 KB 
Solution: Limit content to the most important resources. Use llms-full.txt for complete documentation.

Error 5: Incorrect Markdown link syntax

The problem: [Title] (https://url) instead of [Title](https://url) (a space before the parenthesis).
Solution: Make sure there are no spaces between ] and (.
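This mistake is easy to detect mechanically. A small sketch, with a hypothetical helper name `find_broken_links`, that reports every line where whitespace separates the closing bracket from the opening parenthesis:

```python
import re

# A bracketed link text, one or more spaces or tabs, then an opening parenthesis
BROKEN_LINK_RE = re.compile(r"\[[^\]]+\][ \t]+\(")


def find_broken_links(text):
    """Return (line_number, line) pairs where a space separates ] and (."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if BROKEN_LINK_RE.search(line)
    ]
```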

Error 6: No blockquote with description

The problem: Omitting the contextual description of the project.
Solution: Add > Short description after the H1 header for better understanding by LLMs.

Error 7: Dead links and 404 errors

The problem: Links leading to non-existent resources.
 Solution: Regularly test all URLs with tools like broken link checker.

Error 8: An excess of irrelevant content

The problem: List of all subpages without prioritization.
Solution: Select the 5-15 most important resources. Quality > quantity.

Error 9: Not using the "Optional" section

The problem: All resources on an equal level of importance 
Solution: Place secondary resources in the ## Optional section, so that they can be skipped when context is limited.
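To see what a consumer with limited context might keep, the Optional section can be removed programmatically. This is an illustrative sketch of that behaviour, not how any particular AI system is known to process the file; the function name `drop_optional_section` is an assumption.

```python
def drop_optional_section(text):
    """Return llms.txt content without the '## Optional' section."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Skip only the section whose heading is exactly "Optional"
            skipping = line[3:].strip() == "Optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept)
```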

Error 10: No post-deployment verification and no updates

The problem: Assumption that the file works without testing, no content updates.
Solution: Conduct validation and accessibility testing after each change. Add update information to the file, for example a "Last update" note.

Does it work?

According to BuiltWith data from October 2025, 844,473 sites have implemented the llms.txt standard. SE Ranking's analysis of 300,000 domains showed an adoption rate of 10.13%, with the majority of deployments coming from developer tools, technical documentation platforms and technology companies, where AI coding assistants are critical to the business.

The llms.txt standard has been adopted by leading technology companies, including Anthropic (Claude documentation), Cloudflare, Stripe, Perplexity, Cursor, Solana, ElevenLabs, Hugging Face, Raycast, Yoast, DataForSEO, Zapier and Mintlify.

Case Study 1: Insightland

Results:

Increase in GPTBot visits by 600% (from a few hundred to almost 2,000 visits)
Perplexity-User appeared 7 times in 3 days
Bots from TikTok, Moz, Amazon, Petalbot, Bytedance and Bing visited the llms.txt file
Increase in overall site traffic
No negative impact on traditional SEO rankings

Case Study 2: WordLift

Results:

Increase in organic traffic by ~25% after llms.txt implementation
Better indexing by AI, richer visibility in knowledge panels and snippets

Case Study 3: Mintlify

Results:

Reduction in time for LLM to process documentation by 40%
Improving the accuracy of AI responses by 30%
Thousands of technical documentation sites automatically received llms.txt files

Case Study 4: Cloudsential

Results:

Significant increase in visibility in AI
Cloudsential emerged as a top source for SEO-related queries in ChatGPT

Evidence of GEO's effectiveness

The Generative Engine Optimization (GEO) study by Aggarwal et al. (KDD '24) showed that content optimization strategies for generative engines can increase source visibility by up to 40% in AI-generated responses.

The most effective GEO methods:

Adding quotes - increase in visibility by more than 40%
Adding statistics - increase in visibility by more than 30%
Optimizing content fluency - significant increase
Citing sources - significant improvement
Technical terminology - moderate improvement

The study ran a systematic evaluation on the GEO-BENCH benchmark, which consists of 10,000 diverse queries from multiple domains.

Integration with GEO/AEO ecosystem

The llms.txt standard is a fundamental component of a broader Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO) strategy. Here are the highlights of the full, holistic approach:

1. SEO alone is no longer enough, bet on GEO/AEO

Traditional SEO remains crucial, but by itself it does not guarantee visibility in LLMs. A growing body of analysis shows that even brands with well-developed SEO do not always appear in the models' responses. At the same time, there are brands with virtually no SEO efforts that are still cited by LLMs, which indicates that visibility in AI depends on factors other than standard search engine authority.

Available research on brand visibility in the responses of large language models shows that even global brands can remain invisible despite ranking well in traditional SEO. Our own visibility tests in Google and in LLMs indicate that some brands barely appear in Google results for key category phrases, while LLMs continue to cite them. This suggests that these brands lack consistent SEO efforts, which limits their visibility in search engines but does not affect their presence in LLM responses to the same extent.
Tomasz Cincio - CEO of Semly.ai

2. Structured data (Schema.org)

Implementing Schema markup for FAQ, Article, Product and other content types increases the likelihood of citation. Pages with complete structured data are significantly more likely to be cited by AI.
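As an example of such markup, the sketch below builds a Schema.org FAQPage object and prints it as JSON-LD. The helper name `faq_jsonld` and the sample question are illustrative; the property names (`mainEntity`, `acceptedAnswer`, `text`) follow the Schema.org vocabulary.

```python
import json


def faq_jsonld(pairs):
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


markup = faq_jsonld(
    [("Is llms.txt an official standard?", "No, it is a proposed standard.")]
)
print(json.dumps(markup, indent=2))
```

The printed JSON is typically embedded in the page inside a `<script type="application/ld+json">` element.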

3. AI-friendly content architecture

Front-loading: Key information at the beginning of the content
Hierarchical structure: Clear H1-H6 headings
Lists and bullet points: Increase extractability by AI, that is, the ability of a language model to extract, recall or reproduce data
Short sentences and paragraphs: <25 words per sentence, <100 words per paragraph
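The length limits above can be checked with a short script. This is a rough sketch under simplifying assumptions: paragraphs are separated by blank lines, sentences end with ., ! or ? followed by whitespace, and the function name `long_passages` is hypothetical.

```python
import re

MAX_WORDS_PER_SENTENCE = 25    # limits quoted in the list above
MAX_WORDS_PER_PARAGRAPH = 100


def long_passages(text):
    """Return (kind, word_count, excerpt) for sentences and paragraphs over the limits."""
    findings = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        words = paragraph.split()
        if len(words) >= MAX_WORDS_PER_PARAGRAPH:
            findings.append(("paragraph", len(words), " ".join(words[:5])))
        # Naive sentence split on ., ! or ? followed by whitespace
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            sentence_words = sentence.split()
            if len(sentence_words) >= MAX_WORDS_PER_SENTENCE:
                findings.append(("sentence", len(sentence_words), " ".join(sentence_words[:5])))
    return findings
```

Each finding carries the first five words of the passage so it can be located in the source text.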

To see how AI bots see your site, open the following URL, replacing https://semly.ai with your own address: https://r.jina.ai/https://semly.ai

4. Authority and content

External citations: Mentions on authoritative third-party sites
Domain authority: Overall industry visibility
Freshness of content: Pages updated in the last 12 months are 2x more likely to get citations

5. Brand Visibility Score metrics

Formula: (Responses mentioning your brand ÷ Total number of responses) × 100

Supporting metrics:

Citation Rate: % of LLM responses mentioning or linking to your brand
Sentiment Score: (Positive + 0.5 × Neutral mentions) ÷ All mentions
Share of Voice: % of total citations compared to competitors
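The formulas above translate directly into code. In this sketch the function names are illustrative, "all mentions" is assumed to be the sum of positive, neutral and negative mentions, Share of Voice is assumed to be the brand's citations divided by brand plus competitor citations, and returning 0.0 for an empty sample is a choice made here, not part of the definitions.

```python
def brand_visibility_score(responses_with_brand, total_responses):
    """(Responses mentioning your brand / total responses) x 100."""
    if total_responses == 0:
        return 0.0
    return responses_with_brand / total_responses * 100


def sentiment_score(positive, neutral, negative):
    """(Positive + 0.5 x neutral mentions) / all mentions."""
    all_mentions = positive + neutral + negative
    if all_mentions == 0:
        return 0.0
    return (positive + 0.5 * neutral) / all_mentions


def share_of_voice(brand_citations, competitor_citations):
    """Brand citations as a percentage of brand plus competitor citations."""
    total = brand_citations + sum(competitor_citations)
    if total == 0:
        return 0.0
    return brand_citations / total * 100
```

For instance, a brand mentioned in 30 of 120 sampled responses has a Brand Visibility Score of 25.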

AI visibility monitoring tools

The market for AI visibility monitoring tools is growing rapidly, and companies are looking for ways to understand how ChatGPT, Gemini, Perplexity or other models present their brand or products. The following overview compares Semly, Profound and Searchable. Unlike its competitors, Semly not only measures visibility in AI but is also the only tool in this comparison that actively creates correct product data for LLMs and data aggregators, which realistically increases the chance of brands appearing in AI recommendations.

Criterion	Semly (semly.ai)	Profound (tryprofound.com)	Searchable (searchable.com)
Overarching purpose of the tool	GEO for e-commerce, services and brands - increasing visibility in LLM responses and opening a new sales channel in AI search.	Enterprise AI visibility: monitoring how brands appear in responses from generative engines and answer engines, with reports for large teams.	Advanced toolkit for AI search: visibility analytics, content, technical and AEO audits, combined with data from GA4 and GSC.
Role vis-à-vis LLMs and data	Actively creates and standardizes data for LLMs: builds structured product feeds for stores, prepared for indexing by the data aggregators used by LLMs (ChatGPT, Gemini and others). Semly doesn't just measure visibility, but provides the very data the models are supposed to read.	Mainly monitoring and visibility analytics: Profound analyzes how existing brand content is cited by AI, where the models get their data from and how share of voice is changing. It does not create new product feeds for LLMs and only works on existing data.	Mainly tracking and visibility optimization: Searchable links AI visibility data with traffic analytics and with content and on-page audits. It does not act as a feed manager for LLMs, but rather as an analysis and optimization tool.
Focus on e-commerce	Yes, e-commerce first: a product designed for stores, services, brands and manufacturers who want to sell through AI.	A horizontal enterprise tool for brands across multiple industries (SaaS, retail, finance, etc.).	Horizontal AEO toolkit: supports e-commerce but is not exclusively for stores; targets the broad marketing and SEO market.
AI visibility feature type	Visibility and sales: checks whether the store's products and offerings can be recommended by LLMs and how to improve the data to increase the chance of appearing in purchase-related responses.	Answer engine insights: tracking brand citations, the sources where AI finds information, and share in AI search results for selected prompts.	AI search dashboard: visibility in ChatGPT, Claude, Perplexity, etc., combined with traffic analysis from GA4 and GSC, AEO and on-page SEO audits.
Data input	Product feed (e.g. Google Shopping XML) and data scraping for brands. Semly maps and processes data into a form that data aggregators and LLMs can use effectively.	Prompt sets, keywords, domain, markets and competitors. The input is mainly queries to AI and site addresses.	Domains, keywords, campaigns, integrations with GA4, GSC and CMS (e.g., Webflow, Shopify, WordPress) to combine visibility with traffic.
Supported AI engines (high level)	ChatGPT, Gemini and other popular LLMs and AI surfaces used to search for services and products (AI shopping, recommendations).	ChatGPT, Perplexity, Google AI Overviews / AI Mode, Grok, Meta AI and other answer engines, especially on a large enterprise scale.	ChatGPT, Claude, Perplexity, Google AI, Copilot and classic search engines, bundled into a single visibility view.
Entry price	From about €24 per month for the Mini plan for small brands and stores (a simple subscription service for brands).	Enterprise custom pricing: no specific rates on the site, pricing after commercial contact. External reviews indicate typical plans from about $399 per month upward, with a limited starting plan of about $99 per month.	Paid plans with no published rates on the site: starts with a 7-day free Pro trial; further prices are visible only after going to "See all plans" or contacting the sales department. Positioned as a premium solution for marketing teams.
Cost level vs Semly	Entry-level for brands: cost comparable to one simple SaaS subscription or a trip to the movies once a month.	Significantly higher: typically a multiple of Semly's monthly cost, designed for enterprise budgets (marketing, PR, SEO).	Between Semly and Profound, closer to the segment of premium marketing and analytics tools aimed at teams and agencies rather than individual brands.
Best use case	An online store or brand that wants its products or services to be realistically available to and recommended by ChatGPT, Gemini and other LLMs, with its data properly shared by data aggregators.	A global enterprise brand that wants to measure how AI represents its brand, where AI gets its data from, and what its share of voice and reputation look like in AI.	A marketing team or agency that wants to combine visibility in AI search with traffic analytics, content audits and the content creation process in one tool.

Both Profound and Searchable are advanced analytical tools, but they focus on monitoring brand visibility and reputation. Semly works differently: it combines visibility monitoring with creating data for LLMs, so it influences what models can see and use. At the same time, the cost of entry for Semly is many times lower than for foreign enterprise platforms. As a result, Semly acts as the first real GEO tool designed for e-commerce and brands that not only reports, but actively increases the chance of sales in new AI channels.

The future of the standard

The llms.txt standard, while experimental, is evolving toward wider adoption. Google has included llms.txt in its Agent2Agent (A2A) protocol, signaling at least experimental interest. In November 2024, Mintlify enabled automatic generation of llms.txt for every documentation site it hosts, instantly adding thousands of documentation sites to the ecosystem.

In March 2025, Jeremy Howard said the vision extends beyond the current reality: an AI-first web standard in which language models no longer waste tokens on redundant HTML and can focus on relevant knowledge.

Summary

The llms.txt file represents a fundamental change in the way web content is made available to AI systems. The standard, although experimental, has achieved mass adoption (over 844,000 sites) and is producing measurable results - a 20% to 40% increase in visibility in AI responses, a 600% increase in AI bot visits and a 30% improvement in response accuracy.

Key findings:

Implementation is simple, but requires precision: Markdown structure, UTF-8 encoding, location in the root directory and correct section hierarchy are key to effectiveness.

Validation is mandatory: Use ChatGPT, for example, to validate the map before publishing.

Avoid the 10 most common mistakes: incorrect placement, missing H1, wrong encoding, exceeding the size limit, incorrect link syntax, missing blockquote, dead links, excess content, ignoring the Optional section and lack of testing.

Integrate with GEO/AEO: llms.txt is part of a broader strategy that includes SEO, structured data, AI-friendly content architecture and brand authority building.

Monitor effectiveness: Use tools like Semly.ai to track Brand Visibility Score, Citation Rate and Share of Voice.

Update regularly: Freshness of content is critical - pages updated in the last 12 months are 2x more likely to be cited.

In an era where AI is evolving into the dominant information discovery interface, controlling how language models interpret and present your brand becomes a strategic imperative. The llms.txt standard, supported by empirical evidence and growing adoption, represents a fundamental step toward an AI-first web.

FAQ - Frequently Asked Questions

Is llms.txt an official standard?
No, llms.txt is a proposed standard created by Jeremy Howard. No major LLM provider has officially confirmed that it reads these files, but empirical evidence (increase in AI bot visits, case studies) suggests that the standard is being used in practice.

Does the implementation of llms.txt guarantee citation by AI?
No, llms.txt does not guarantee citations. However, it does increase the likelihood and relevance of citations by making it easier for AI to access key content. Studies show a 20-40% increase in visibility after implementation.

Does llms.txt replace robots.txt or sitemap.xml?
No. Each of these files has a different purpose:

robots.txt - controlling access of indexing bots
sitemap.xml - list of all indexable pages for search engines
llms.txt - a curated map of key resources for AI

How often should I update llms.txt?
At least quarterly or after any significant change in site structure, addition of key content or rebranding. Content that has not been updated for more than 12 months is 2x less likely to be cited by AI.

Can I have multiple llms.txt files for different sections of the site?
Yes, the specification allows files in subpaths, such as https://docs.example.com/llms.txt for the documentation section. Always keep the main file in the domain root directory.

What is the optimal size of llms.txt file?
The recommended limit is ~100 KB. Larger files may overload LLM context windows. For extensive documentation, use llms-full.txt as a supplement.

Does llms.txt affect traditional SEO?
Studies have shown no negative impact on SEO rankings. The file is neutral to traditional search engines and can indirectly support SEO by improving brand visibility in AI, which generates traffic to the site.

How to measure the effectiveness of llms.txt?
Monitor:

Logs and bot traffic in GA4 (increase in AI bot visits)
Tools like Semly.ai will show you your brand's visibility in AI
Brand Visibility Score and Share of Voice
Traffic from AI search engines in Google Analytics

Should small businesses implement llms.txt?
Yes, if you care about visibility in the AI ecosystem. Implementation is simple (1-4 hours), inexpensive, and can yield significant benefits with minimal risk.

What if I don't have the resources to create .md versions for all pages?
Focus on the most important 5-10 resources. Quality and prioritization are more important than completeness. You can link directly to HTML, although Markdown is preferred.

Glossary

LLM (Large Language Model) - a large AI language model, trained on massive data sets, capable of understanding and generating text

Markdown - a lightweight markup language for formatting text, characterized by simplicity and readability

Context Window - the maximum number of tokens (units of text) that an LLM can process in a single query

GEO (Generative Engine Optimization) - the process of optimizing content to increase the chances of appearing in AI-generated responses

AEO (Answer Engine Optimization) - a term used synonymously with GEO; optimization for AI answer engines

Parsing - the process of analyzing the structure of data by a computer program

User-Agent - bot or browser ID in HTTP headers

Schema.org - common structured data dictionary for websites

Brand Visibility Score - a metric that measures the frequency of brand mentions in AI responses

Sources

llmstxt.org - official specification of the standard

Answer.AI (Jeremy Howard) - proposal and justification of the standard

llmstxt.site - index of websites that have already implemented llms.txt or llms-full.txt

r.jina.ai/https://semly.ai - check how AI bots see your site

Research: GEO - academic research (40% increase in visibility)

Aggarwal P. et al., "GEO: Generative Engine Optimization," KDD '24, 2024 - research and framework for optimizing content visibility in generative AI systems.
