Your Website Has a New Audience — And It's Not Human
There was a moment (probably sometime in late 2024, maybe early 2025) when the way people find information on the internet changed fundamentally. Not dramatically, not with a single event, but gradually and irreversibly. The shift wasn't about a new search engine. It was about the fact that search engines themselves started thinking.
Google introduced AI Overviews. ChatGPT got browsing capabilities. Perplexity became a serious research tool. Claude learned to search the web. And suddenly, the question wasn't just "how does Google rank my page?" but "can an AI agent understand what my site is about?"
This is the first post in a three-part series about what I did to prepare my personal website for this shift, not just as SEO hygiene, but as a genuine rethinking of who (or what) is consuming my content.
The Old Mental Model
For years, SEO meant one thing: make Google happy.
Write good titles. Add meta descriptions. Use semantic HTML. Build backlinks. Optimize images. Sprinkle in some structured data. Submit a sitemap. Pray to the Core Web Vitals gods.
This model worked, and still works, for a specific consumer: a search engine crawler that reads HTML, follows links, and ranks pages based on relevance signals. Googlebot is sophisticated, but its fundamental operation is predictable. It wants HTML. It wants links. It wants signals of authority.
But something happened. The consumer changed.
The New Consumers
Today, your website has at least three distinct types of automated visitors:
1. Traditional Search Crawlers
Googlebot, Bingbot, and their friends. These haven't gone away. They still read your HTML, process your structured data, and rank your pages. The traditional SEO playbook still applies.
2. AI Search Agents
These are the new players. When a user asks ChatGPT "who is David Estévez developer?", ChatGPT doesn't have to rely on its training data (which might be outdated). Instead, it can search the web in real time: OpenAI's OAI-SearchBot crawls pages so they can appear in ChatGPT search results, and pages fetched on a user's behalf are requested with the ChatGPT-User agent. Either way, the agent reads live pages and brings back information to compose an answer.
Perplexity does this natively. Google's AI Overviews do it. Bing Copilot does it. These agents need to:
- Find your site (discovery)
- Read it quickly (processing)
- Extract the right information (comprehension)
- Cite you correctly (attribution)
3. AI-Powered Developer Tools
Coding assistants, documentation tools, and research agents. When a developer asks their AI assistant about a specific technology or library, the assistant might browse to your site, read your documentation, and use it as context. These tools benefit enormously from structured, machine-readable content.
The Gap
Here's the problem: most websites are optimized exclusively for consumer #1.
They have sitemaps, meta tags, and structured data designed for traditional crawlers. But when an AI agent visits, it faces challenges:
Context window limits. An LLM can't process your entire website at once. A typical page with navigation, footer, sidebar, and JavaScript-rendered content can easily weigh 50KB or more, much of it markup rather than content. An agent working within a limited context budget needs the essential information in a few kilobytes.
HTML is noisy. AI agents can parse HTML, but they have to wade through nav menus, cookie banners, footer links, and React hydration artifacts to find the actual content. Markdown carries the same content with almost none of that noise, which makes it far easier to process.
No curated entry point. A sitemap lists every URL. But which ones matter? What's the recommended reading order? What's the relationship between pages? An AI agent arriving at your site has no guide.
No attribution guidance. When ChatGPT cites your site, how should it reference you? There's no standard way to tell an AI "cite me as X."
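To put rough numbers on the first two problems, the TypeScript sketch below strips typical page chrome with regular expressions and compares token estimates before and after. Both the list of "noisy" elements and the ~4-characters-per-token ratio are assumptions, and a regex pass is no substitute for a real HTML parser or the model's own tokenizer; extractMainText and estimateTokens are hypothetical helpers, not part of any library.

```typescript
// Elements that usually hold page chrome rather than the main content.
// The list is an assumption; adjust it to your own templates.
const NOISY_ELEMENTS = ["script", "style", "nav", "header", "footer", "aside"];

// Rule of thumb, not a tokenizer: roughly 4 characters per token for English.
const CHARS_PER_TOKEN = 4;

// Crude regex extraction for illustration only; use an HTML parser in practice.
function extractMainText(html: string): string {
  let text = html;
  for (const tag of NOISY_ELEMENTS) {
    // <tag ...> ... </tag>, non-greedy, case-insensitive, across newlines.
    const element = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    text = text.replace(element, " ");
  }
  return text
    .replace(/<[^>]+>/g, " ") // drop remaining tags but keep their text
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// Approximate token count from character length (heuristic only).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

const page = `<html><head><style>body { margin: 0 }</style></head><body>
<nav><a href="/">Home</a> <a href="/blog">Blog</a></nav>
<main><h1>Sliding window</h1><p>Keep two pointers.</p></main>
<footer>Links, legal, newsletter</footer><script>hydrate()</script>
</body></html>`;

const mainText = extractMainText(page);
console.log(mainText); // "Sliding window Keep two pointers."
console.log(estimateTokens(page), "->", estimateTokens(mainText));
```

On a real page the gap is much larger, because navigation, footers, and inline scripts tend to dwarf the article itself.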
What Changed My Thinking
I was working on my personal site (destbreso.com), which has a blog, interactive demos, npm packages, learning paths, and a portfolio. It's a fairly content-rich site. Traditional SEO was pretty solid: sitemap, meta tags, OpenGraph, JSON-LD structured data for every content type.
But I started noticing something. When I asked ChatGPT or Perplexity about topics I'd written about (algorithm patterns, interactive visualizations), my site rarely appeared. It wasn't a ranking problem. The content was good, and the pages were indexed. The problem was that these AI systems simply couldn't efficiently extract and organize the information.
That's when I discovered two things:
- The llms.txt proposal: a specification by Jeremy Howard (of fast.ai fame) for a /llms.txt file that provides LLM-friendly content in markdown.
- AI crawlers have specific user agents: OpenAI's OAI-SearchBot, GPTBot, and ChatGPT-User; Google's Google-Extended (a robots.txt product token rather than a separate crawler); Anthropic's ClaudeBot; Perplexity's PerplexityBot. Each can be individually controlled via robots.txt.
These weren't theoretical proposals. Major sites were already implementing them. The llms.txt directory lists hundreds of adopters. OpenAI's documentation explicitly describes how webmasters can control access per crawler.
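Because each crawler announces itself with a distinct token, a server can at least label its automated traffic. Here is a minimal TypeScript sketch that buckets a User-Agent header into the visitor types described earlier; the bucket names and the token-to-bucket mapping are my own simplification, and classifyUserAgent is a hypothetical helper.

```typescript
// Coarse buckets matching the visitor types described in this post.
type VisitorKind = "search-crawler" | "ai-search-agent" | "ai-training" | "unknown";

// Tokens as they appear inside User-Agent headers, with the bucket this
// sketch assigns to each. Google-Extended is deliberately absent: it is a
// robots.txt token only and never shows up in a request's User-Agent.
const CRAWLER_TOKENS: ReadonlyArray<readonly [string, VisitorKind]> = [
  ["OAI-SearchBot", "ai-search-agent"],
  ["ChatGPT-User", "ai-search-agent"],
  ["PerplexityBot", "ai-search-agent"],
  ["GPTBot", "ai-training"],
  ["ClaudeBot", "ai-training"],
  ["Googlebot", "search-crawler"],
  ["bingbot", "search-crawler"],
];

// Case-insensitive substring match against the known tokens.
// User-Agent headers can be spoofed, so use this for logs and analytics,
// not as an access-control decision.
function classifyUserAgent(userAgent: string): VisitorKind {
  const ua = userAgent.toLowerCase();
  for (const [token, kind] of CRAWLER_TOKENS) {
    if (ua.includes(token.toLowerCase())) return kind;
  }
  return "unknown";
}

console.log(classifyUserAgent("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"));
// "ai-training"
```

Verifying that a request really comes from a given crawler takes more than the header (the vendors publish IP ranges or reverse-DNS guidance for that), but for spotting trends in your logs, token matching is usually enough.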
The Three Layers
What I realized is that AI discoverability isn't a single thing; it's a layered strategy:
Layer 1: Control (robots.txt)
Who can access your site and for what purpose? This is about policy. You might want ChatGPT Search to cite your pages (good for traffic), but not want GPTBot to scrape your content for model training (no benefit to you).
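In robots.txt, that policy is just one group per user-agent token. This sketch generates the file from a small policy object; buildRobotsTxt and the policy itself are hypothetical examples, and example.com stands in for your domain.

```typescript
// One robots.txt group: a user-agent token plus its rules.
interface RobotsGroup {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

// Serializes groups into robots.txt text, one blank line between groups.
function buildRobotsTxt(groups: RobotsGroup[], sitemapUrl?: string): string {
  const lines: string[] = [];
  for (const group of groups) {
    lines.push(`User-agent: ${group.userAgent}`);
    for (const path of group.allow ?? []) lines.push(`Allow: ${path}`);
    for (const path of group.disallow ?? []) lines.push(`Disallow: ${path}`);
    lines.push("");
  }
  if (sitemapUrl) lines.push(`Sitemap: ${sitemapUrl}`);
  return lines.join("\n") + "\n";
}

// Hypothetical policy: AI search crawlers may fetch pages (citations,
// traffic); the training crawler is opted out; everyone else is allowed.
const robots = buildRobotsTxt(
  [
    { userAgent: "OAI-SearchBot", allow: ["/"] },
    { userAgent: "PerplexityBot", allow: ["/"] },
    { userAgent: "GPTBot", disallow: ["/"] },
    { userAgent: "*", allow: ["/"] },
  ],
  "https://example.com/sitemap.xml",
);

console.log(robots);
```

The explicit Allow groups look redundant next to the catch-all, but they document intent. More importantly, under RFC 9309 a crawler follows only the group that names it and ignores the * group, so any rule you want a named bot to obey has to be repeated in its own group.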
Layer 2: Discovery (llms.txt + meta tags)
Once an AI agent is allowed in, how does it find the important content? This is about providing a curated, concise entry point that fits within context windows.
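The llms.txt proposal fixes the shape of that entry point: an H1 with the site name, a blockquote summary, then H2 sections listing markdown links with optional notes, where a section titled "Optional" marks links an agent may skip. The following TypeScript sketch renders that structure; buildLlmsTxt is a hypothetical helper, and the example.com URLs (pointing at .md versions of pages, as the proposal suggests) are placeholders.

```typescript
// A link entry in an llms.txt list: "- [title](url): note".
interface LlmsLink {
  title: string;
  url: string;
  note?: string;
}

// An H2 section containing a list of links.
interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// Renders H1 name, blockquote summary, then H2 sections of link lists.
function buildLlmsTxt(siteName: string, summary: string, sections: LlmsSection[]): string {
  const out: string[] = [`# ${siteName}`, "", `> ${summary}`];
  for (const section of sections) {
    out.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      out.push(`- [${link.title}](${link.url})${note}`);
    }
  }
  return out.join("\n") + "\n";
}

// Placeholder content; example.com URLs stand in for real pages.
const llmsTxt = buildLlmsTxt(
  "Example Site",
  "Personal site of a developer: blog posts, interactive demos, and npm packages.",
  [
    {
      heading: "Blog",
      links: [
        {
          title: "Sliding window pattern",
          url: "https://example.com/blog/sliding-window.md",
          note: "Algorithm walkthrough with an interactive demo",
        },
      ],
    },
    {
      // Per the proposal, an "Optional" section holds links an agent may skip.
      heading: "Optional",
      links: [{ title: "Portfolio", url: "https://example.com/portfolio.md" }],
    },
  ],
);

console.log(llmsTxt);
```

Generating the file from the same data that drives the site keeps it from drifting out of date, which matters more than any particular formatting choice.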
Layer 3: Comprehension (structured data + API)
How does the AI agent understand your content's structure, relationships, and metadata? This is about making your content machine-parseable at a semantic level.
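For the structured-data part of this layer, the usual vehicle is schema.org vocabulary embedded as JSON-LD. Here is a minimal sketch for a blog post, assuming hypothetical PostMeta fields and placeholder values; blogPostingJsonLd is not from any library.

```typescript
interface PostMeta {
  title: string;
  description: string;
  url: string;
  datePublished: string; // ISO 8601 date, e.g. "2025-01-15"
  authorName: string;
  authorUrl: string;
}

// Builds a schema.org BlogPosting object and wraps it in a JSON-LD script tag.
function blogPostingJsonLd(post: PostMeta): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    description: post.description,
    url: post.url,
    datePublished: post.datePublished,
    author: { "@type": "Person", name: post.authorName, url: post.authorUrl },
  };
  // Escape "<" as a JSON unicode escape so a string such as "</script>"
  // in the data cannot terminate the surrounding script element early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

console.log(
  blogPostingJsonLd({
    title: "Sliding window pattern",
    description: "An interactive walkthrough of the sliding window technique.",
    url: "https://example.com/blog/sliding-window",
    datePublished: "2025-01-15",
    authorName: "Example Author",
    authorUrl: "https://example.com",
  }),
);
```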
Each layer builds on the previous one. Without Layer 1, you have no control. Without Layer 2, AI agents are fumbling through your HTML. Without Layer 3, they can't deeply understand what you offer.
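To see how the layers stack, here is a simplified robots.txt evaluator in TypeScript that answers "may this crawler fetch /llms.txt?" It follows the RFC 9309 matching rules for the parts it implements (case-insensitive product-token groups, fallback to the * group, longest matching path wins, Allow wins a tie) but deliberately ignores * and $ wildcards in paths, so treat it as an illustration rather than a compliant parser. The function names and the sample file are hypothetical.

```typescript
interface Rule {
  allow: boolean;
  path: string;
}

// Parses robots.txt into rule lists keyed by lowercase user-agent token.
// Consecutive User-agent lines share the rules that follow them.
function parseRobots(robotsTxt: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let readingAgents = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon < 0) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      if (!readingAgents) agents = []; // a new group starts
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
      readingAgents = true;
    } else if (key === "allow" || key === "disallow") {
      readingAgents = false;
      if (value === "") continue; // an empty rule matches nothing
      for (const agent of agents) {
        groups.get(agent)!.push({ allow: key === "allow", path: value });
      }
    }
  }
  return groups;
}

// Uses the group naming the token, else "*"; longest prefix match wins,
// Allow wins a tie, and no matching rule means the path is allowed.
function isAllowed(robotsTxt: string, productToken: string, path: string): boolean {
  const groups = parseRobots(robotsTxt);
  const rules = groups.get(productToken.toLowerCase()) ?? groups.get("*") ?? [];
  let best: Rule | null = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = best === null || rule.path.length > best.path.length;
    const tieAllow = best !== null && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best === null ? true : best.allow;
}

const sample = [
  "User-agent: OAI-SearchBot",
  "Allow: /",
  "",
  "User-agent: GPTBot",
  "User-agent: ClaudeBot",
  "Disallow: /",
  "",
  "User-agent: *",
  "Disallow: /drafts/",
].join("\n");

// Layer 1 gates Layer 2: can each agent even reach /llms.txt?
for (const agent of ["OAI-SearchBot", "GPTBot", "PerplexityBot"]) {
  console.log(agent, isAllowed(sample, agent, "/llms.txt"));
}
// OAI-SearchBot true, GPTBot false, PerplexityBot true
```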
What's Next
In the next post, I'll walk through the actual implementation: every file, every decision, every trade-off. We'll build the three layers on a real Next.js site and test them against actual AI crawlers.
In the third post, we'll verify the implementation, discuss what worked and what didn't, and look at where this space is heading.
References
- llms.txt Specification: Jeremy Howard, September 2024
- OpenAI Crawlers Documentation: GPTBot, OAI-SearchBot, ChatGPT-User
- Google Crawlers Overview
- llmstxt.site: directory of llms.txt adopters