Testing AI SEO and What Comes Next
In the first post, I explained why AI discoverability matters. In the second post, I walked through the implementation. Now the question is: how do you know it's working?
This final post covers verification, testing strategies, real-world observations, and where AI SEO is heading.
Verifying the Setup
1. Check Your Files Are Accessible
The most basic verification: can the files be reached?
# llms.txt
curl -s https://destbreso.com/llms.txt | head -5
# llms-full.txt
curl -s https://destbreso.com/llms-full.txt | wc -l
# AI context API
curl -s https://destbreso.com/api/ai-context | jq '.site.name'
# robots.txt (check AI crawler rules)
curl -s https://destbreso.com/robots.txt | grep -A2 "GPTBot"
# security.txt
curl -s https://destbreso.com/.well-known/security.txt
Each of these should return the expected content. A 404 on any of them means the file isn't being served correctly.
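If you'd rather run these checks from a script (in CI, for example), here is a minimal Python sketch of the same idea. The paths come from the curl commands above; the function names, the `ai-seo-check/0.1` User-Agent string, and the "0 means unreachable" convention are my own assumptions for illustration, not part of any standard.

```python
import urllib.error
import urllib.request
from typing import Callable

# Paths taken from the curl commands above; adjust to your own site.
PATHS = [
    "/llms.txt",
    "/llms-full.txt",
    "/api/ai-context",
    "/robots.txt",
    "/.well-known/security.txt",
]


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    # The User-Agent value is a made-up label for this sketch.
    request = urllib.request.Request(url, headers={"User-Agent": "ai-seo-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses are raised as exceptions
    except urllib.error.URLError:
        return 0  # could not connect at all


def check_endpoints(base_url: str, fetch: Callable[[str], int] = http_status) -> dict[str, int]:
    """Map each expected path to the status code it returned."""
    return {path: fetch(base_url.rstrip("/") + path) for path in PATHS}


def failing(results: dict[str, int]) -> list[str]:
    """Return the paths that did not answer with 200."""
    return [path for path, status in results.items() if status != 200]
```

Calling `failing(check_endpoints("https://destbreso.com"))` performs the real requests and returns the paths that need attention. The `fetch` parameter exists so the logic can be exercised without touching the network.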
2. Validate robots.txt
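Google Search Console includes a robots.txt report that shows which robots.txt files Google found and flags any parsing problems (it replaced the older standalone robots.txt tester). For AI-specific bots, you can test manually:
# Does OAI-SearchBot have access to the blog?
# Expect: User-agent: OAI-SearchBot → Allow: /
curl -s https://destbreso.com/robots.txt | grep -A2 "OAI-SearchBot"
# Is GPTBot blocked from training crawls?
# Expect: User-agent: GPTBot → Disallow: /
curl -s https://destbreso.com/robots.txt | grep -A2 "GPTBot"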
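The same question can be answered programmatically instead of by reading the file by eye. This is a small sketch using Python's standard `urllib.robotparser`; the `SAMPLE` rules are a hypothetical robots.txt mirroring the policy described above, and `crawler_access` is a helper name I made up. Python's parser is only a stand-in here: each real crawler applies its own matching rules, so treat the result as a sanity check rather than proof.

```python
from urllib.robotparser import RobotFileParser


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Report, per user agent, whether robots_txt allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical rules: allow the search crawler, block the training crawler.
SAMPLE = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""

access = crawler_access(SAMPLE, "https://destbreso.com/blog/", ["OAI-SearchBot", "GPTBot"])
print(access)  # {'OAI-SearchBot': True, 'GPTBot': False}
```

To test a live site, fetch the robots.txt body first and pass its text in place of `SAMPLE`.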
3. Validate Structured Data
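Use Google's Rich Results Test to verify your JSON-LD. It focuses on types that are eligible for Google rich results, so if a schema doesn't show up there, cross-check with the Schema.org Validator:
- Test your homepage → should find WebSite schema
- Test a blog post → should find BlogPosting schema
- Test the FAQ page → should find FAQPage schema
- Test the about page → should find Person schema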
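For a quick local check before reaching for the online validators, the declared types can be pulled straight out of a page's HTML. The sketch below uses only Python's standard library; `JsonLdCollector` and `schema_types` are hypothetical names, and it only reports which top-level `@type` values are declared. It does not validate the schemas themselves.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def schema_types(html: str) -> set[str]:
    """Return the top-level @type values declared in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found: set[str] = set()
    for block in collector.blocks:
        data = json.loads(block)
        # A block may hold one object, a list of objects, or an @graph.
        items = data if isinstance(data, list) else data.get("@graph", [data])
        for item in items:
            if not isinstance(item, dict):
                continue
            declared = item.get("@type", [])
            found.update([declared] if isinstance(declared, str) else declared)
    return found
```

Feeding it the saved source of a blog post should yield a set containing `BlogPosting`; an empty set means no JSON-LD was found in the markup.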
4. Check Meta Tags
Open your site's source (View Source, not DevTools) and search for:
<link rel="alternate" type="text/markdown" href="/llms.txt"
<meta name="ai-content-declaration"
<meta name="citation_author"
All three should be present in the <head>.
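The search can also be automated. Here is a minimal sketch, assuming you have the page's HTML as a string; `HeadTagCollector` and `missing_head_tags` are hypothetical helper names, and the check only confirms that the tags exist inside `<head>`, not that their values are sensible.

```python
from html.parser import HTMLParser


class HeadTagCollector(HTMLParser):
    """Record <link> and <meta> tags that appear inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self._in_head = False
        self.tags: list[tuple[str, dict]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif self._in_head and tag in ("link", "meta"):
            self.tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def missing_head_tags(html: str) -> list[str]:
    """Return labels for the expected tags that are absent from <head>."""
    collector = HeadTagCollector()
    collector.feed(html)
    collector.close()
    links = [a for tag, a in collector.tags if tag == "link"]
    metas = {a.get("name") for tag, a in collector.tags if tag == "meta"}
    checks = {
        'link rel="alternate" type="text/markdown"': any(
            a.get("rel") == "alternate" and a.get("type") == "text/markdown"
            for a in links
        ),
        'meta name="ai-content-declaration"': "ai-content-declaration" in metas,
        'meta name="citation_author"': "citation_author" in metas,
    }
    return [label for label, present in checks.items() if not present]
```

An empty list means all three tags were found. Anything else names the tags to go looking for.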
5. Test with Actual AI Tools
This is the real test. Ask AI tools about your content and see if they can find and cite your site:
ChatGPT: "What does destbreso.com cover?"
Perplexity: "David Estévez algorithm tutorials"
Claude: "Find information about DSA patterns on destbreso.com"
Important caveat: Results depend on indexing timing. It can take days or weeks for AI search systems to discover and index your new files. Don't expect instant results.
What I Observed
The llms.txt Effect
After deploying llms.txt, I noticed that AI agents that previously returned vague or incomplete information about my site started providing more structured answers. When asked "what is destbreso.com about?", the responses began matching the curated description in llms.txt almost verbatim.
This makes sense: if an AI agent finds a clean, concise markdown file that describes the entire site, it is likely to prefer that over parsing scattered HTML pages.
robots.txt Granularity Matters
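After blocking GPTBot while allowing OAI-SearchBot, my site continued appearing in ChatGPT search results while staying opted out of training crawls. The distinction is real: OpenAI documents the two as independent crawlers, each controlled by its own robots.txt rules. Keep in mind that robots.txt is a convention that crawlers choose to honor, not a technical barrier.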
JSON-LD is Already Powerful
Most of the "AI SEO" benefit actually comes from existing structured data. The JSON-LD schemas I already had (BlogPosting, Person, FAQPage, etc.) are consumed by AI search agents just as effectively as by Google. If you have good structured data, you're already halfway there.
Common Mistakes
1. Blocking Everything
Some developers react to AI crawlers by blocking all of them in robots.txt. This is understandable but counterproductive:
# DON'T do this unless you really mean it
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
Blocking OAI-SearchBot means your site won't appear in ChatGPT search results. That's traffic you're leaving on the table. Block training (GPTBot) if you want, but think carefully before blocking search.
2. Over-Engineering
You don't need a complex system. The minimum viable AI SEO is:
- An llms.txt file (30 minutes to write)
- A few lines in robots.txt (5 minutes)
- Basic structured data (you probably already have this)
Don't build an elaborate API endpoint until you have the basics working.
3. Neglecting Maintenance
An llms.txt file that describes a version of your site from six months ago is worse than no file at all. AI agents will trust the curated description and provide outdated information. Keep it in sync, and add it to your deployment checklist.
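One way to make "keep it in sync" concrete is a deployment-time check that flags links in llms.txt pointing at pages the site no longer publishes. The sketch below is an assumption-heavy illustration: it assumes llms.txt uses Markdown `[title](url)` links, that the site has a standard sitemap.xml, and that both files use the same absolute URLs. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Markdown link of the form [title](url)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def llms_links(llms_txt: str) -> set[str]:
    """Return every URL linked from the llms.txt text."""
    return set(LINK.findall(llms_txt))


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def stale_links(llms_txt: str, sitemap_xml: str) -> set[str]:
    """Links listed in llms.txt that the sitemap no longer contains."""
    return llms_links(llms_txt) - sitemap_urls(sitemap_xml)
```

A non-empty result is a hint that llms.txt is describing pages that have moved or been removed. Because llms.txt is curated, the reverse difference (pages in the sitemap but not in llms.txt) is expected and not checked here.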
4. Ignoring Existing SEO
AI SEO is additive, not a replacement. Your sitemap, meta tags, OpenGraph, canonical URLs, and page titles are still critical. AI search agents use all of these signals alongside llms.txt.
Tools and Resources
For Testing
| Tool | Purpose |
|---|---|
| Google Rich Results Test | Validate JSON-LD structured data |
| Schema.org Validator | Validate any schema.org markup |
| llmstxt.site | Directory of sites with llms.txt (submit yours) |
| Ahrefs Webmaster Tools | Free site audit including technical SEO |
| curl + jq | Manual verification of endpoints |
For Monitoring
- Google Search Console: Track how your site appears in Google, including AI Overviews
- Server logs: Look for OAI-SearchBot, GPTBot, PerplexityBot in your access logs
- Vercel Analytics: If you're on Vercel, you can see bot traffic patterns
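For the server-log option, a few lines of Python are enough to get a per-crawler tally. This is a rough sketch: it matches the crawler names anywhere in each log line, so it works with any log format but cannot tell a genuine crawler from a client spoofing its User-Agent. `count_bot_hits` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable

# Crawler names from the list above; extend as needed.
BOTS = ("OAI-SearchBot", "GPTBot", "PerplexityBot")


def count_bot_hits(lines: Iterable[str]) -> Counter:
    """Count access-log lines that mention each crawler name."""
    hits: Counter = Counter()
    for line in lines:
        lowered = line.lower()
        for bot in BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits


# Usage with a real file (the path is an assumption):
# with open("access.log", encoding="utf-8", errors="replace") as log:
#     print(count_bot_hits(log))
```

A crawler that never shows up after a few weeks is worth investigating: either it hasn't discovered the site yet or something upstream is blocking it.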
Where This Is Going
Near-Term (2026)
llms.txt will become standard. More sites are adopting it weekly. The llmstxt.site directory is growing rapidly. Once a critical mass of sites has it, AI agents will start looking for it by default.
AI search will overtake traditional search for informational queries. When someone wants to know "what's the best approach for pathfinding in React?", they'll increasingly ask an AI rather than searching Google. Your content needs to be discoverable by both.
Structured data becomes more important. As AI agents get better at understanding schema.org markup, having rich structured data gives you an advantage. FAQPage schema, HowTo schema, and BlogPosting schema are especially valuable.
Medium-Term (2027-2028)
AI-specific sitemaps. Just as sitemap.xml exists for search engines, we might see a standardized sitemap format optimized for AI consumption, perhaps a JSON-LD graph of the entire site's content.
Bidirectional attribution. Currently, AI agents cite sources inconsistently. Standards will emerge for how AI systems should attribute content, and citation_author / citation_url meta tags (or their successors) will become as standard as OpenGraph tags.
Semantic search APIs. Instead of AI agents reading your static files, they might query a standardized API: "What content do you have about dynamic programming?" Your site responds with a curated, contextual answer. This is essentially what /api/ai-context foreshadows.
Long-Term
MCP (Model Context Protocol) for websites. Anthropic's MCP standard, currently used for tool connections, could extend to web content. Imagine your website exposing an MCP server that AI agents can query directly, with structured tools for "get blog posts about X" or "find the author's expertise in Y."
AI-native content formats. Beyond HTML and Markdown, new content formats might emerge that are specifically designed for AI consumption: structured, semantic, and context-window-aware.
A Practical Checklist
Here's what I recommend for any developer who wants to start with AI SEO today:
□ Create /llms.txt with your site's curated overview
□ Add AI crawler rules to robots.txt (allow search, block training)
□ Ensure JSON-LD structured data on key pages
□ Add <link rel="alternate"> for llms.txt in your <head>
□ Add citation meta tags
□ Submit your site to llmstxt.site directory
□ Test with ChatGPT, Perplexity, and Claude
□ Set a calendar reminder to update llms.txt quarterly
Total time: 2-4 hours for a basic implementation. The ROI compounds over time as AI search traffic grows.
Final Thoughts
AI SEO isn't a fad or a niche optimization. It's the next evolution of how content gets discovered on the internet. The tools are simple, the standards are emerging, and the first-mover advantage is real.
The best part? Everything I've described is additive. Nothing breaks your existing SEO. Nothing requires a redesign. It's a few files and some meta tags, and it positions your site for how people will find information for the next decade.
Start with llms.txt. It takes 30 minutes, and most websites still don't have one.
Series Navigation
- Your Website Has a New Audience (And It's Not Human) -> Why AI SEO matters
- Implementing AI SEO on a Next.js Site -> The practical guide
- Testing AI SEO and What Comes Next -> You are here