David Estévez

Your Website Has a New Audience. And It's Not Human


March 19, 2026 · 8 min read

AI · SEO · LLM · ChatGPT · Web Development · llms.txt · Discoverability


There was a moment (probably sometime in late 2024, maybe early 2025) when the way people find information on the internet changed fundamentally. Not dramatically, not with a single event, but gradually and irreversibly. The shift wasn't about a new search engine. It was about the fact that search engines themselves started thinking.

Google introduced AI Overviews. ChatGPT got browsing capabilities. Perplexity became a serious research tool. Claude learned to search the web. And suddenly, the question wasn't just "how does Google rank my page?" but "can an AI agent understand what my site is about?"

This is the first post in a three-part series about what I did to prepare my personal website for this shift, not just as SEO hygiene, but as a genuine rethinking of who (or what) is consuming my content.

The Old Mental Model

For years, SEO meant one thing: make Google happy.

Write good titles. Add meta descriptions. Use semantic HTML. Build backlinks. Optimize images. Sprinkle in some structured data. Submit a sitemap. Pray to the Core Web Vitals gods.

This model worked, and still works, for a specific consumer: a search engine crawler that reads HTML, follows links, and ranks pages based on relevance signals. Googlebot is sophisticated, but its fundamental operation is predictable. It wants HTML. It wants links. It wants signals of authority.

But something happened. The consumer changed.

The New Consumers

Today, your website has at least three distinct types of automated visitors:

1. Traditional Search Crawlers

Googlebot, Bingbot, and their friends. These haven't gone away. They still read your HTML, process your structured data, and rank your pages. The traditional SEO playbook still applies.

2. AI Search Agents

These are the new players. When a user asks ChatGPT "who is David Estévez developer?" with search enabled, ChatGPT doesn't just lean on its training data (which might be outdated). Instead, it searches the web: OpenAI's OAI-SearchBot crawls pages so they can surface in ChatGPT search results, and the ChatGPT-User agent can fetch a page on demand while answering. ChatGPT then reads those pages and composes an answer from them. (Without search, it falls back to whatever it memorized at training time, which is exactly the stale answer you want to avoid.)

Perplexity does this natively. Google's AI Overviews do it. Bing Copilot does it. These agents need to:

Find your site (discovery)
Read it quickly (processing)
Extract the right information (comprehension)
Cite you correctly (attribution)

3. AI-Powered Developer Tools

Coding assistants, documentation tools, and research agents. When a developer asks their AI assistant about a specific technology or library, the assistant might browse to your site, read your documentation, and use it as context. These tools benefit enormously from structured, machine-readable content.

The Gap

Here's the problem: most websites are optimized exclusively for consumer #1.

They have sitemaps, meta tags, and structured data designed for traditional crawlers. But when an AI agent visits, it faces challenges:

Context window limits. An LLM can't process your entire website. A typical page with navigation, footer, sidebar, and JavaScript-rendered content might be 50KB+ of text. An LLM needs the essential information in a few kilobytes.

HTML is noisy. AI agents can parse HTML, but they have to wade through nav menus, cookie banners, footer links, and React hydration artifacts to find the actual content. Markdown carries the same content with far less markup, so more of the context window goes to what actually matters.

No curated entry point. A sitemap lists every URL. But which ones matter? What's the recommended reading order? What's the relationship between pages? An AI agent arriving at your site has no guide.

No attribution guidance. When ChatGPT cites your site, how should it reference you? There's no standard way to tell an AI "cite me as X."
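To get a feel for how much of a page is noise, here is a rough TypeScript sketch that strips common page chrome (nav, header, footer, scripts) and estimates token counts before and after. It assumes the common rule of thumb of about four characters per token for English text; real tokenizers vary, and regex-based stripping is only good enough for a ballpark figure. The function names are illustrative, not from any library.

```typescript
// Rough heuristic (assumption): ~4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Elements that usually hold page chrome or code rather than article text.
// Note: this also drops <header> elements inside articles; fine for a rough measure.
const NOISE_TAGS = ["script", "style", "nav", "header", "footer", "aside", "noscript"];

// Naive extraction: remove noisy elements with their contents, then strip all
// remaining tags while keeping their text. Not a substitute for a real HTML parser.
function extractMainText(html: string): string {
  let text = html;
  for (const tag of NOISE_TAGS) {
    const block = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    text = text.replace(block, " ");
  }
  return text
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/\s+/g, " ")     // collapse whitespace
    .trim();
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface PageReport {
  rawTokens: number;     // estimated tokens for the full HTML
  contentTokens: number; // estimated tokens for the extracted main text
  noiseRatio: number;    // fraction of the raw page that is not main text
}

function analyzePage(html: string): PageReport {
  const rawTokens = estimateTokens(html);
  const contentTokens = estimateTokens(extractMainText(html));
  const noiseRatio = rawTokens === 0 ? 0 : 1 - contentTokens / rawTokens;
  return { rawTokens, contentTokens, noiseRatio };
}
```

Run analyzePage on a page's raw HTML: a high noiseRatio means an agent reading that HTML spends most of its budget on markup, which is the gap a curated markdown entry point is meant to close.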

What Changed My Thinking

I was working on my personal site (destbreso.com), which has a blog, interactive demos, npm packages, learning paths, and a portfolio. It's a fairly content-rich site. Traditional SEO was pretty solid: sitemap, meta tags, OpenGraph, JSON-LD structured data for every content type.

But I started noticing something. When I asked ChatGPT or Perplexity about topics I'd written about (algorithm patterns, interactive visualizations), my site rarely appeared. It wasn't a ranking problem: the content was good, and the pages were indexed. The problem was that these AI systems simply couldn't efficiently extract and organize the information.

That's when I discovered two things:

The llms.txt proposal: a specification by Jeremy Howard (of fast.ai fame) for a /llms.txt file that provides LLM-friendly content in markdown.
AI crawlers identify themselves with specific user-agent tokens: OpenAI's OAI-SearchBot, GPTBot, and ChatGPT-User; Google's Google-Extended (a robots.txt control token rather than a separate crawler); Anthropic's ClaudeBot; Perplexity's PerplexityBot. Each can be controlled individually via robots.txt.
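Because each of these crawlers announces itself with a distinct token in its User-Agent header, a server can tell them apart, for example to log which AI agents actually visit. The sketch below is a hypothetical classifier; the purpose labels summarize how the vendors describe each bot in their documentation at the time of writing and should be re-checked against those docs.

```typescript
// Purpose categories, summarized from each vendor's crawler documentation.
type CrawlerPurpose = "training" | "search-index" | "user-fetch" | "traditional-search";

// Hypothetical lookup table of User-Agent tokens. Google-Extended is absent on
// purpose: it is a robots.txt token only and never appears in request headers.
const KNOWN_AGENTS: Array<[string, CrawlerPurpose]> = [
  ["GPTBot", "training"],
  ["ClaudeBot", "training"],
  ["OAI-SearchBot", "search-index"],
  ["PerplexityBot", "search-index"],
  ["ChatGPT-User", "user-fetch"],
  ["Googlebot", "traditional-search"],
  ["bingbot", "traditional-search"],
];

// Case-insensitive substring match of known tokens against the raw header.
function classifyUserAgent(userAgent: string): CrawlerPurpose | "unknown" {
  const ua = userAgent.toLowerCase();
  for (const [token, purpose] of KNOWN_AGENTS) {
    if (ua.indexOf(token.toLowerCase()) !== -1) {
      return purpose;
    }
  }
  return "unknown";
}
```

User-Agent strings are trivially spoofed, so for anything security-relevant, verify requests against the IP ranges vendors publish (where available) rather than trusting the header alone.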

These weren't theoretical proposals. Major sites were already implementing them. The llms.txt directory lists hundreds of adopters. OpenAI's documentation explicitly describes how webmasters can control access per crawler.

The Three Layers

What I realized is that AI discoverability isn't a single thing; it's a layered strategy:

Layer 1: Control (robots.txt)

Who can access your site and for what purpose? This is about policy. You might want ChatGPT Search to cite your pages (good for traffic), but not want GPTBot to scrape your content for model training (no benefit to you).
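In robots.txt terms, that policy is one group per user-agent token. The sketch below generates such a file from a small rule list; the rule list is a hypothetical example of the "cite me, don't train on me" trade-off, and example.com stands in for a real sitemap URL.

```typescript
// One robots.txt group: a user-agent token and whether it may crawl the site.
interface AgentRule {
  userAgent: string;
  allow: boolean;
}

// Hypothetical policy: let AI search agents read and cite pages,
// opt out of crawlers whose output feeds model training.
const POLICY: AgentRule[] = [
  { userAgent: "OAI-SearchBot", allow: true },
  { userAgent: "ChatGPT-User", allow: true },
  { userAgent: "PerplexityBot", allow: true },
  { userAgent: "GPTBot", allow: false },
  { userAgent: "ClaudeBot", allow: false },
  { userAgent: "Google-Extended", allow: false }, // Google states this does not affect Search
  { userAgent: "*", allow: true },                 // everyone else, including Googlebot
];

// Render the rules as robots.txt text: one group per agent, blank line between groups.
function buildRobotsTxt(rules: AgentRule[], sitemapUrl?: string): string {
  const groups = rules.map(
    (rule) => `User-agent: ${rule.userAgent}\n${rule.allow ? "Allow: /" : "Disallow: /"}`
  );
  const body = groups.join("\n\n");
  return sitemapUrl ? `${body}\n\nSitemap: ${sitemapUrl}\n` : `${body}\n`;
}
```

Under the Robots Exclusion Protocol (RFC 9309), a crawler follows the group that most specifically matches its own token and only falls back to the catch-all * group when none does, so the final group does not override the explicit Disallow lines. Keep in mind that robots.txt is a request that well-behaved crawlers honor, not an enforcement mechanism.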

Layer 2: Discovery (llms.txt + meta tags)

Once an AI agent is allowed in, how does it find the important content? This is about providing a curated, concise entry point that fits within context windows.
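The llms.txt proposal defines a simple markdown layout: an H1 with the site or project name (the only required part), a blockquote summary, optional free-form details, then H2 sections containing link lists of the form "- [title](url): notes". A section titled "Optional" marks links an agent can skip when context is tight. Here is a minimal renderer for that layout; the interfaces and function name are hypothetical, and the output is meant to be served as plain text at /llms.txt.

```typescript
// Hypothetical data model mirroring the llms.txt layout.
interface LlmsLink {
  title: string;
  url: string;
  note?: string; // optional description after the link
}

interface LlmsSection {
  heading: string; // "Optional" has special meaning: skippable when context is short
  links: LlmsLink[];
}

interface LlmsTxt {
  name: string;     // rendered as the H1 (required by the proposal)
  summary: string;  // rendered as a blockquote
  details?: string; // optional free-form markdown
  sections: LlmsSection[];
}

// Render the structure as llms.txt markdown.
function renderLlmsTxt(doc: LlmsTxt): string {
  const out: string[] = [`# ${doc.name}`, "", `> ${doc.summary}`, ""];
  if (doc.details) {
    out.push(doc.details, "");
  }
  for (const section of doc.sections) {
    out.push(`## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      out.push(`- [${link.title}](${link.url})${note}`);
    }
    out.push(""); // blank line after each section
  }
  return out.join("\n");
}
```

Generating the file from the same content source as the site keeps the curated entry point in sync as posts are added, rather than hand-editing it.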

Layer 3: Comprehension (structured data + API)

How does the AI agent understand your content's structure, relationships, and metadata? This is about making your content machine-parseable at a semantic level.
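For structured data, the usual vehicle is schema.org vocabulary embedded as JSON-LD. Below is a minimal sketch that builds a BlogPosting object and wraps it in a script tag; the PostMeta shape and function names are assumptions for illustration, and a real site would likely add more properties (dateModified, image, and so on).

```typescript
// Hypothetical post metadata shape.
interface PostMeta {
  title: string;
  url: string;
  datePublished: string; // ISO 8601 date, e.g. "2026-03-19"
  authorName: string;
  authorUrl: string;
  keywords: string[];
}

// Build a schema.org BlogPosting object with a nested Person author.
function blogPostingJsonLd(post: PostMeta): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    url: post.url,
    datePublished: post.datePublished,
    keywords: post.keywords.join(", "),
    author: { "@type": "Person", name: post.authorName, url: post.authorUrl },
  };
}

// Serialize for embedding in HTML. Escaping "<" as \u003c keeps a stray
// "</script>" inside a string value from closing the tag early.
function toScriptTag(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

The nested author object is also where attribution lives: it gives an agent an explicit name and URL to associate with the content.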

Each layer builds on the previous one. Without Layer 1, you have no control. Without Layer 2, AI agents are fumbling through your HTML. Without Layer 3, they can't deeply understand what you offer.

What's Next

In the next post, I'll walk through the actual implementation: every file, every decision, every trade-off. We'll build the three layers on a real Next.js site and test them against actual AI crawlers.

In the third post, we'll verify the implementation, discuss what worked and what didn't, and look at where this space is heading.

References

llms.txt Specification (Jeremy Howard, September 2024)
OpenAI Crawlers Documentation (GPTBot, OAI-SearchBot, ChatGPT-User)
Google Crawlers Overview
llmstxt.site (directory of llms.txt adopters)
