Why do LLMs need different data than humans?
Language models don't "scan" pages the way traditional search engines do - they interpret meaning. Before recommending a product, AI analyzes data structure, the verifiability of facts and semantic relationships. Store owners who do not adapt their product pages to this new paradigm will become invisible to the rapidly growing group of customers buying through AI assistants.
Structured data - a fundamental layer of understanding
Schema.org Product - minimum standard.
Each product page must contain Schema.org markup in JSON-LD format. This is no longer optional; it is a requirement for visibility in AI.
An example of a complete structure:
{
"@context": "https://schema.org/",
"@type": "Product",
"name": "Waterproof GoreTex Pro trekking boots",
"description": "Trekking boots designed for individuals tackling demanding mountain trails in variable weather conditions. The GoreTex construction keeps feet dry during stream crossings and sudden rainfall, and the aggressive tread provides traction on loose rocks and muddy paths. Ideal for multi day mountain expeditions, single day hikes in alpine terrain, and for anyone who refuses to let the weather dictate the terms of their adventure. Suitable for use in temperatures down to minus 20°C.",
"sku": "TREK-2025-GT",
"gtin": "5901234567890",
"mpn": "GT-PRO-45",
"brand": {
"@type": "Brand",
"name": "MountainTech"
},
"image": [
"https://example.com/buty-trek-1x1.jpg",
"https://example.com/buty-trek-4x3.jpg",
"https://example.com/buty-trek-16x9.jpg"
],
"offers": {
"@type": "Offer",
"url": "https://example.com/product/trekking-boots-goretex",
"priceCurrency": "EUR",
"price": "99.99",
"priceValidUntil": "2025-12-31",
"itemCondition": "https://schema.org/NewCondition",
"availability": "https://schema.org/InStock",
"shippingDetails": {
"@type": "OfferShippingDetails",
"shippingRate": {
"@type": "MonetaryAmount",
"value": "0",
"currency": "EUR"
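},
"deliveryTime": {
"@type": "ShippingDeliveryTime",
"handlingTime": {
"@type": "QuantitativeValue",
"minValue": 1,
"maxValue": 2,
"unitCode": "DAY"
},
"transitTime": {
"@type": "QuantitativeValue",
"minValue": 2,
"maxValue": 3,
"unitCode": "DAY"
}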
}
}
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.8",
"reviewCount": "347"
},
"additionalProperty": [
{
"@type": "PropertyValue",
"name": "Outer material",
"value": "GoreTex Pro"
},
{
"@type": "PropertyValue",
"name": "Type of terrain",
"value": "Mountains, high altitude trails"
},
{
"@type": "PropertyValue",
"name": "Upper height",
"value": "Mid (above the ankle)"
},
{
"@type": "PropertyValue",
"name": "Insulation",
"value": "Thinsulate 200g"
}
]
}
Key fields required by ChatGPT Shopping
OpenAI defines a precise product feed specification with more than 100 attributes. Most important for Polish stores:
Mandatory fields:
- id: a unique product identifier (stable over time, max 100 characters).
- title: 150 characters maximum, not written in all caps.
- description: up to 5,000 characters of plain text (without HTML).
- link: the product page URL (HTTPS preferred).
- price: the current price.
- availability: stock availability.
- enable_search: a flag that controls visibility in ChatGPT results.
- enable_checkout: a flag that enables purchase directly from ChatGPT.
Fields recommended for advantage:
- gtin or mpn: manufacturer identifiers (GTINs are 8-14 digits without dashes).
- image_link: a minimum of 3 image variations (1x1, 4x3, 16x9).
- product_type: a hierarchical category (e.g., "Clothing > Women's > Sports > Trekking Pants").
- popularity_score: a score reflecting the product's popularity.
- return_rate: the return rate (a low rate increases the chance of recommendation).
ChatGPT accepts feed updates as often as every 15 minutes, so there is no excuse for outdated price or stock data.
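The limits listed above are easy to check before a feed is submitted. The following is a minimal sketch, not official tooling: the function name `validate_feed_item`, the error messages and the sample values are assumptions made for illustration, and only the field names and limits come from the lists above.

```python
import re

# Limits quoted in the lists above; everything else here is illustrative.
MAX_ID_LEN = 100
MAX_TITLE_LEN = 150
MAX_DESCRIPTION_LEN = 5000
REQUIRED_FIELDS = (
    "id", "title", "description", "link",
    "price", "availability", "enable_search", "enable_checkout",
)


def validate_feed_item(item: dict) -> list[str]:
    """Return a list of human-readable problems found in one feed row."""
    problems = []

    # Required fields must be present and non-empty (False is a valid flag).
    for field in REQUIRED_FIELDS:
        if item.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")

    if len(str(item.get("id", ""))) > MAX_ID_LEN:
        problems.append("id is longer than 100 characters")

    title = str(item.get("title", ""))
    if len(title) > MAX_TITLE_LEN:
        problems.append("title is longer than 150 characters")
    if title and title.isupper():
        problems.append("title is written in all caps")

    description = str(item.get("description", ""))
    if len(description) > MAX_DESCRIPTION_LEN:
        problems.append("description is longer than 5,000 characters")
    if re.search(r"<[^>]+>", description):
        problems.append("description contains HTML tags")

    link = item.get("link")
    if link and not str(link).startswith("https://"):
        problems.append("link does not use HTTPS (preferred)")

    # gtin is optional, but if present it must be 8-14 digits without dashes.
    gtin = item.get("gtin")
    if gtin is not None and not re.fullmatch(r"\d{8,14}", str(gtin)):
        problems.append("gtin must be 8-14 digits without dashes")

    return problems
```

Running a check like this on every export means a broken row is caught in the store's own pipeline rather than silently dropped downstream.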
Product descriptions - from keywords to semantic context
Transforming a description: before and after.
Traditional description (ineffective for AI):
The best thermal bottle on the market. Made of high quality stainless steel. Available in different colors. Perfect gift!
Description optimized for LLM:
A 750ml 18/8 stainless steel thermal bottle designed for travelers in need of durable, insulated hydration. Double vacuum insulation keeps drinks cold for 24 hours or hot for 12 hours. Compact design fits a bike bottle cage and a backpack side pocket. Ideal for physically active people and outdoor enthusiasts. BPA-free certified, dishwasher safe (top shelf). Manufacturer's lifetime warranty.
Key differences:
- Concrete measurements instead of generalities.
- Defined target group ("travelers", "active people").
- Verifiable facts (24-hour insulation, certifications).
- Use cases (bike, backpack, work).
- No marketing superlatives without data.
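Two of the differences listed above, concrete measurements and the absence of unsupported superlatives, can be roughly screened for in bulk. The sketch below is a crude heuristic rather than a model of how any AI system scores text: the word list, the unit list and the function name `lint_description` are all assumptions to be tuned per catalog.

```python
import re

# Words treated as unsupported superlatives; this list is an assumption.
VAGUE_TERMS = ("best", "perfect", "high quality", "premium")

# A number followed by a unit, e.g. "750ml", "24 hours", "2.4 kg".
MEASUREMENT = re.compile(
    r"\d+(?:[.,]\d+)?\s?(?:ml|l|g|kg|cm|mm|hours?|°C|Hz|GB|TB)\b",
    re.IGNORECASE,
)


def lint_description(text: str) -> dict:
    """Report vague marketing terms and concrete measurements in a description."""
    lowered = text.lower()
    vague = [
        term for term in VAGUE_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]
    measurements = MEASUREMENT.findall(text)
    return {"vague_terms": vague, "measurements": measurements}
```

A description that returns several vague terms and no measurements is a good candidate for rewriting first.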
Context formula: who, why, when?
The best descriptions answer three questions that AI asks:
- Who is this product for? - "For busy parents who have little time day to day."
- What problem does it solve? - "Keeps the foot dry during creek crossings and sudden rainfall."
- Under what conditions does it work? - "Multi-day mountain expeditions, in temperatures as low as -20°C."
Adding one sentence starting with "Ideal for..." or "Designed for..." can noticeably improve the accuracy of AI recommendations.
Additional properties - attributes that determine the advantage
Why are optional fields not optional?
AI prefers products with the most complete data. While most vendors fill in only the required fields, the best-ranking products include every applicable additional attribute.
Example: Gaming laptop
"additionalProperty":
[
{
"@type": "PropertyValue",
"name": "Operating system",
"value": "Windows 11 Pro"
},
{
"@type": "PropertyValue",
"name": "Processor",
"value": "Intel Core i9-13900HX"
},
{
"@type": "PropertyValue",
"name": "RAM",
"value": "32GB DDR5"
},
{
"@type": "PropertyValue",
"name": "Storage capacity",
"value": "2TB NVMe SSD"
},
{
"@type": "PropertyValue",
"name": "Graphics card",
"value": "NVIDIA RTX 4080 12GB"
},
{
"@type": "PropertyValue",
"name": "Battery life",
"value": "8 hours (office work)"
},
{
"@type": "PropertyValue",
"name": "Refresh rate",
"value": "240Hz"
},
{
"@type": "PropertyValue",
"name": "Weight",
"value": "2.4 kg"
}
]
When a customer asks AI "4K video processing laptop with fast rendering," the system searches these properties to match the query to the product.
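Specifications usually live in the store database as plain key-value pairs, so the additionalProperty list can be generated rather than typed by hand. A minimal sketch, assuming a hypothetical `specs` dictionary exported from the store; the function name is illustrative.

```python
def to_additional_properties(specs: dict[str, str]) -> list[dict]:
    """Turn a {name: value} spec sheet into Schema.org PropertyValue entries."""
    return [
        {"@type": "PropertyValue", "name": name, "value": value}
        for name, value in specs.items()
        if value  # skip empty values rather than publish blank attributes
    ]
```

The returned list can be assigned directly to the "additionalProperty" key of the Product object before it is serialized to JSON-LD.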
Granular categories instead of general categories.
Bad: "Clothing > Pants"
Good: "Clothing > Women's > Sportswear > Trekking Pants > With Waterproof Membrane"
Granular categorization reduces ambiguity and allows AI to group a product with real counterparts, not loosely related items. This also improves recommendations in "similar products" sections on third-party platforms.
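Category depth can be enforced when the category string is built. The sketch below assumes the " > " separator used in the examples above and a minimum of four levels, which is this article's own recommendation rather than a platform rule; the function name is hypothetical.

```python
MIN_LEVELS = 4  # assumption: the minimum depth recommended in this article


def build_product_type(levels: list[str]) -> str:
    """Join category levels into a hierarchical path, rejecting shallow ones."""
    cleaned = [level.strip() for level in levels if level.strip()]
    if len(cleaned) < MIN_LEVELS:
        raise ValueError(
            f"category has {len(cleaned)} levels, expected at least {MIN_LEVELS}"
        )
    return " > ".join(cleaned)
```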
Reviews and ratings - trust signals for AI
AggregateRating + Review structure.
LLMs rely heavily on reviews to create purchase recommendations. It's not enough to display stars - you need to add structured markup:
{
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.7",
"reviewCount": "892",
"bestRating": "5",
"worstRating": "1"
},
"review": [
{
"@type": "Review",
"reviewRating": {
"@type": "Rating",
"ratingValue": "5",
"bestRating": "5"
},
"author": {
"@type": "Person",
"name": "Anna Kowalska"
},
"reviewBody": "Ideal for trekking in the Tatras, it kept the water cold all day, even in hot weather. The construction is solid, with no leaks.",
"datePublished": "2025-10-15"
}
]
}
Best practices for reviews:
- Encourage detailed feedback from customers mentioning use cases.
- Use "verified purchase" tags.
- Avoid duplicating review content between platforms (AI detects redundancy).
- Prefer reviews with emotional language and context: "Perfect for hiking - water cold for 8 hours."
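The aggregateRating values should be derived from the actual reviews so the markup never drifts from what the page shows. A minimal sketch, assuming the store can supply a plain list of star ratings; the function name and the rounding to one decimal place are illustrative choices.

```python
def build_aggregate_rating(ratings: list[int], best: int = 5, worst: int = 1) -> dict:
    """Compute an AggregateRating object from individual star ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    average = sum(ratings) / len(ratings)
    return {
        "@type": "AggregateRating",
        "ratingValue": f"{average:.1f}",   # e.g. "4.6"
        "reviewCount": str(len(ratings)),
        "bestRating": str(best),
        "worstRating": str(worst),
    }
```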
Semantic relationships between products
Building a product knowledge graph.
AI does not see your store as isolated pages - it sees it as a network of related entities. Use Schema.org properties to link products:
{
"@type": "Product",
"name": "Replacement filter for the EcoSmart bottle",
"isAccessoryOrSparePartFor": {
"@type": "Product",
"name": "EcoSmart thermal bottle 750ml",
"url": "https://example.com/butelka-ecosmart"
}
}
Other useful relationships:
- isRelatedTo: related products.
- isSimilarTo: alternatives.
- isConsumableFor: consumables.
Contextual internal links reinforce these relationships:
- "It fits..."
- "Compatible with..."
- "Customers also bought..."
This helps AI build a relational understanding of the items in the catalog, which makes it more likely that your products appear as "recommended alternatives" in summaries.
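When a store already knows which accessories fit which product, the relationship markup can be generated from that mapping. A rough sketch under that assumption; the function name and input shape are hypothetical.

```python
def link_accessories(main_product: dict, accessory_names: list[str]) -> list[dict]:
    """Return accessory Product nodes that point back at the main product."""
    reference = {
        "@type": "Product",
        "name": main_product["name"],
        "url": main_product["url"],
    }
    return [
        {
            "@type": "Product",
            "name": name,
            # Each node gets its own copy so later edits do not leak between them.
            "isAccessoryOrSparePartFor": dict(reference),
        }
        for name in accessory_names
    ]
```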
FAQ Schema - preparing for conversational queries
Structuring the most common questions.
LLMs often generate recommendations based on intent expressed in natural language. Add FAQPage schema for key questions:
{
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Is the bottle dishwasher safe?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Yes, the EcoSmart bottle is fully safe to wash in the dishwasher on the top rack. We recommend removing the seal before washing for better hygiene."
}
},
{
"@type": "Question",
"name": "How long does it maintain temperature?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The double vacuum insulation keeps drinks cold for 24 hours or hot for 12 hours, confirmed by tests at 21°C room temperature."
}
},
{
"@type": "Question",
"name": "What is the warranty?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The product is covered by a lifetime manufacturer warranty for material and production defects. Normal wear and mechanical damage are not covered."
}
}
]
}
Questions to include:
- Does it have certifications, is it organic, etc.?
- How long does it last?
- What are the terms of the warranty?
- Who is it intended for?
These answers make the content ready for LLM summaries, improving visibility in conversational and voice search.
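FAQ markup of the shape shown above is repetitive enough to generate from a list of question and answer pairs. A minimal sketch, assuming the pairs come from the store's own support content; the function name is illustrative.

```python
import json


def build_faq_page(pairs: list[tuple[str, str]]) -> dict:
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def to_json_ld(data: dict) -> str:
    """Serialize for embedding in a script tag of type application/ld+json."""
    return json.dumps(data, ensure_ascii=False, indent=2)
```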
Transaction and logistics data
Delivery time and return terms.
AI queries often include purchasing context: "fast shipping", "free returns", "available in stock".
{
"offers": {
"@type": "Offer",
"shippingDetails": {
"@type": "OfferShippingDetails",
"shippingRate": {
"@type": "MonetaryAmount",
"value": "0",
"currency": "EUR"
},
"deliveryTime": {
"@type": "ShippingDeliveryTime",
"handlingTime": {
"@type": "QuantitativeValue",
"minValue": 1,
"maxValue": 2,
"unitCode": "DAY"
},
"transitTime": {
"@type": "QuantitativeValue",
"minValue": 2,
"maxValue": 3,
"unitCode": "DAY"
}
},
"shippingDestination": {
"@type": "DefinedRegion",
"addressCountry": "GB"
}
},
"hasMerchantReturnPolicy": {
"@type": "MerchantReturnPolicy",
"applicableCountry": "GB",
"returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
"merchantReturnDays": 30,
"returnMethod": "https://schema.org/ReturnByMail",
"returnFees": "https://schema.org/FreeReturn"
}
}
}
Key fields:
- availability: stock status (InStock, OutOfStock, PreOrder).
- priceValidUntil: the date until which the price is valid.
- shippingDetails: lead time and delivery.
- hasMerchantReturnPolicy: return details.
Outdated inventory and availability data reduces AI confidence and the likelihood of being recommended.
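Stale or incomplete offers can be flagged automatically before they are published. The sketch below checks only the four key fields named above; the function name and messages are assumptions, and the current date is passed in explicitly so the check stays testable.

```python
from datetime import date


def offer_problems(offer: dict, today: date) -> list[str]:
    """List problems with an Offer object: expired price or missing key fields."""
    problems = []

    valid_until = offer.get("priceValidUntil")
    if valid_until and date.fromisoformat(valid_until) < today:
        problems.append("priceValidUntil is in the past")

    for field in ("availability", "shippingDetails", "hasMerchantReturnPolicy"):
        if field not in offer:
            problems.append(f"{field} is missing")

    return problems
```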
Verification and consistency of external data
Entity Consistency - the key to AI trust.
AI trust is built on consistent data about who or what you are. To help AI always recognize your brand, product or company as the same entity, add "sameAs" links to official profiles:
{
"@type": "Brand",
"name": "EcoSmart",
"sameAs": [
"https://www.facebook.com/ecosmart.polska",
"https://www.instagram.com/ecosmart_pl",
"https://pl.linkedin.com/company/ecosmart",
"https://www.wikidata.org/wiki/Q123456"
]
}
External confidence signals:
- Manufacturer's website.
- Social profiles.
- Press mentions.
- Partners and video content such as YouTube reviews.
Make sure brand names, SKU codes and product descriptions are consistent across all platforms. This helps AI understand your products as verified entities in the broader e-commerce ecosystem.
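Consistency across platforms can be spot-checked by comparing the same fields from each listing. A rough sketch, assuming the listings have already been collected into dictionaries keyed by platform; the platform names, the SKU value in the usage example and the function name are all hypothetical.

```python
def find_inconsistencies(
    listings: dict[str, dict],
    fields: tuple[str, ...] = ("brand", "sku"),
) -> dict[str, set]:
    """Map each field to the distinct values seen, only where platforms disagree."""
    conflicts = {}
    for field in fields:
        values = {str(listing.get(field, "")).strip() for listing in listings.values()}
        if len(values) > 1:
            conflicts[field] = values
    return conflicts
```

For example, a store listing with brand "EcoSmart" and a marketplace listing with brand "Eco Smart" would be reported as a conflict on the brand field, while a shared SKU such as "ECO-750" would not.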
AI understands the context of online conversations
Context automation at scale: Reddit, Quora, Facebook.
In the era of generative AI, brands no longer need to manually tailor their content for each channel or community. Automation of context means that artificial intelligence can recognize the topic of conversation, the tone of discussion and the users' intentions - and then automatically adjust the brand's message to fit naturally into the conversation.
It's not just a matter of automatically publishing content. The key lies in understanding the context - AI analyzes not only words, but also emotions and intentions, so that the brand's message sounds authentic and reaches the right audience.
On platforms such as Reddit, Quora or Facebook, where millions of threads are running daily, AI analyzes context in real time and helps brands appear where their presence makes sense. This ensures that content is not random - it becomes relevant, consistent and credible.
It is not only the automation of publications, but automation of understanding - a new phase of communication on the Internet, where artificial intelligence combines scale with authenticity.
Practical checklist
Structured data:
- JSON-LD Schema.org Product on each page.
- Completed fields: name, description, sku, brand, image, offers.
- GTIN or MPN for product identification.
- A minimum of 3 variants of images (different aspect ratios).
- AggregateRating and Review schema for reviews.
Product Descriptions:
- Description of 200-500 words with specific use cases.
- Defined target group ("for whom").
- Verifiable specifications (dimensions, materials, certifications).
- Usage scenarios ("when", "where").
- Avoid generalities without data ("best", "premium").
Additional attributes:
- All optional additionalProperty fields filled in.
- Granular categorization (min. 4 levels).
- Technical specifications as PropertyValue.
Relationships and FAQs:
- Related products linked via isRelatedTo, isAccessoryOrSparePartFor.
- FAQPage schema with 5-10 of the most common questions.
- Internal links to complementary products.
Transaction data:
- Current availability and price (updated at least once a day).
- shippingDetails with lead time and delivery.
- MerchantReturnPolicy for the return policy.
External consistency:
- sameAs links to the brand's official profiles.
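The structured data part of this checklist lends itself to an automated check. The sketch below covers only the fields named in the checklist (name, description, sku, brand, image, offers, GTIN or MPN, and at least three images); the function name and messages are illustrative, and it is no substitute for a full Schema.org validator.

```python
REQUIRED_KEYS = ("name", "description", "sku", "brand", "image", "offers")


def check_product(product: dict) -> list[str]:
    """Return checklist items that a Product JSON-LD object fails."""
    issues = [f"missing: {key}" for key in REQUIRED_KEYS if not product.get(key)]

    if not (product.get("gtin") or product.get("mpn")):
        issues.append("missing: gtin or mpn")

    # "image" may be a single URL or a list of URLs.
    images = product.get("image") or []
    if isinstance(images, str):
        images = [images]
    if len(images) < 3:
        issues.append("fewer than 3 images")

    return issues
```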
The future - multimodal AI and voice search
LLM optimization is preparation for multimodal search - text, voice, image. Products with rich descriptions, alternative image texts, and structured FAQs are ready for:
- Visual Search (Google Lens, Pinterest).
- Voice assistants (Alexa, Google Assistant).
- Shopping inside ChatGPT (Instant Checkout).
- AI-driven discovery in TikTok Shop, Instagram Shopping.
A catalog optimized for LLMs becomes a dataset that AI can trust and recommend in any purchasing context.
E-commerce in 2025 is not about chasing rankings - it's about teaching AI to understand your products. When ChatGPT, Perplexity or Google SGE get the query "best gift for a mountain lover," your product is either in the answer or it doesn't exist. Data structure, semantic context and verifiable facts determine whether AI will recommend your store - or a competitor's store.
Tomasz Cincio - CEO of Semly.ai
Glossary
JSON-LD - a format for recording structured data in a page's code that helps search engines and AI models understand what the content represents (e.g., product, price, reviews).
Schema.org - a common data markup standard for search engines (Google, Bing, Yahoo). Allows standardized descriptions of products, articles, events, etc.
LLM (Large Language Model) - a large language model, such as ChatGPT or Gemini, that analyzes and generates text by understanding the context of user queries.
Structured data - information written in a way that algorithms can understand, such as product title, price, reviews, availability.
Generative AI - an artificial intelligence system capable of creating new content: text, images, code or recommendations.
