Implementing AI SEO on a Next.js Site — A Practical Guide
In the previous post, I argued that websites now have a new audience (AI agents) and that most sites aren't prepared for them. In this post, I'll walk through everything I actually built to address this on my personal site, a Next.js application at destbreso.com.
This isn't theoretical. Every code snippet here is running in production. I'll explain the decisions behind each piece and the trade-offs involved.
The Architecture
The implementation has five components, each addressing a different part of the AI discoverability problem. The diagram shows the four AI-facing ones; the fifth is a set of supporting files.
┌─────────────────────────────────────────────┐
│                destbreso.com                │
├──────────┬──────────┬──────────┬────────────┤
│ robots.ts│ llms.txt │  ai-ctx  │ meta tags  │
│ (control)│ (curated)│ (struct) │  (hints)   │
└──────────┴──────────┴──────────┴────────────┘
Let's build each one.
1. robots.txt: Granular AI Crawler Control
The first layer is policy. Next.js lets you generate robots.txt dynamically via app/robots.ts:
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Standard search engines: business as usual
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/", "/dashboard/", "/maintenance/"],
      },
      // OpenAI search: allow (appear in ChatGPT search results)
      {
        userAgent: "OAI-SearchBot",
        allow: ["/", "/llms.txt", "/llms-full.txt"],
        disallow: ["/api/", "/dashboard/"],
      },
      // OpenAI training: disallow (no benefit to site owner)
      {
        userAgent: "GPTBot",
        disallow: ["/"],
      },
      // ChatGPT user-initiated browsing
      {
        userAgent: "ChatGPT-User",
        allow: ["/"],
      },
      // Google Gemini / AI features
      {
        userAgent: "Google-Extended",
        allow: ["/"],
      },
      // Anthropic
      {
        userAgent: "anthropic-ai",
        allow: ["/"],
      },
      // Perplexity
      {
        userAgent: "PerplexityBot",
        allow: ["/"],
      },
      // Common Crawl (training datasets): disallow
      {
        userAgent: "CCBot",
        disallow: ["/"],
      },
    ],
    sitemap: `https://destbreso.com/sitemap.xml`,
  };
}
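To sanity-check a policy like this before deploying, a small matcher can help. The following is a minimal sketch, not how any particular crawler resolves rules: it assumes plain prefix rules (no `*` or `$` wildcards), picks the group naming the crawler or falls back to `*`, and lets the longest matching prefix win, with allow beating disallow on a tie. The `Rule` type and `isAllowed` helper are hypothetical names, and `rules` repeats only a subset of the policy above.

```typescript
// Simplified robots.txt matcher: prefix rules only, no wildcard support.
type Rule = { userAgent: string; allow?: string[]; disallow?: string[] };

// A subset of the policy above, written with array values throughout.
const rules: Rule[] = [
  { userAgent: "*", allow: ["/"], disallow: ["/api/", "/dashboard/", "/maintenance/"] },
  {
    userAgent: "OAI-SearchBot",
    allow: ["/", "/llms.txt", "/llms-full.txt"],
    disallow: ["/api/", "/dashboard/"],
  },
  { userAgent: "GPTBot", disallow: ["/"] },
  { userAgent: "CCBot", disallow: ["/"] },
];

function isAllowed(rules: Rule[], userAgent: string, path: string): boolean {
  // A crawler obeys the group that names it; otherwise it falls back to "*".
  const ua = userAgent.toLowerCase();
  const group =
    rules.find((r) => r.userAgent.toLowerCase() === ua) ??
    rules.find((r) => r.userAgent === "*");
  if (!group) return true; // no applicable group: nothing is restricted

  // Longest matching prefix wins; on a tie, allow beats disallow.
  let best = { length: -1, allowed: true };
  for (const prefix of group.allow ?? []) {
    if (path.startsWith(prefix) && prefix.length >= best.length) {
      best = { length: prefix.length, allowed: true };
    }
  }
  for (const prefix of group.disallow ?? []) {
    if (path.startsWith(prefix) && prefix.length > best.length) {
      best = { length: prefix.length, allowed: false };
    }
  }
  return best.allowed;
}
```

One thing a check like this surfaces: under the rules above, /api/ai-context falls under the /api/ disallow for both OAI-SearchBot and the * group, so crawlers in those groups that honor robots.txt would skip it unless an explicit allow is added.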
The Key Decision: Search vs. Training
This is the most important distinction in AI robots.txt policy:
- OAI-SearchBot → Surfaces your site in ChatGPT search results. Allow. This is essentially the AI equivalent of Googlebot. If you block it, your site won't appear when users search via ChatGPT.
- GPTBot → Crawls content for OpenAI's model training. Disallow. There's no direct benefit to you. Your content gets absorbed into the model's weights with no attribution.
- ChatGPT-User → Triggered by a user action (e.g., "browse this URL"). Since it's user-initiated, robots.txt is advisory, but it's still good practice to allow it explicitly.
OpenAI's documentation states that these settings are independent: you can allow search while blocking training, or vice versa.
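The search-versus-training split can also be expressed as data, so the policy is decided once and the rules follow from it. This is a sketch under assumptions: the `Purpose` labels, the `crawlerPurpose` table, and the `rulesFromPolicy` helper are hypothetical names, and the purpose assigned to each crawler follows this post's classification rather than any official registry.

```typescript
type Purpose = "search" | "user-browsing" | "training";
type Rule = { userAgent: string; allow?: string[]; disallow?: string[] };

// Crawler names are the ones used in robots.ts; purposes follow this post's classification.
const crawlerPurpose: Record<string, Purpose> = {
  "OAI-SearchBot": "search",
  "ChatGPT-User": "user-browsing",
  "PerplexityBot": "search",
  "GPTBot": "training",
  "CCBot": "training",
};

// Turn "which purposes do I block?" into one allow-all or block-all rule per crawler.
function rulesFromPolicy(purposes: Record<string, Purpose>, blocked: Purpose[]): Rule[] {
  const blockedSet = new Set(blocked);
  return Object.keys(purposes).map((userAgent): Rule =>
    blockedSet.has(purposes[userAgent])
      ? { userAgent, disallow: ["/"] }
      : { userAgent, allow: ["/"] }
  );
}

// Permissive for search and browsing, restrictive for training.
const aiRules = rulesFromPolicy(crawlerPurpose, ["training"]);
```

The resulting array has the same shape as the per-crawler entries in robots.ts, so switching to a block-everything or allow-everything stance becomes a one-argument change.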
2. llms.txt: The AI Entry Point
This is the heart of the implementation. The llms.txt specification proposes a markdown file at /llms.txt that provides LLM-friendly content.
Why Markdown?
LLMs read markdown natively, with no preprocessing. HTML requires parsing, stripping navigation, and handling JavaScript first. Markdown is clean, structured text that fits directly into a context window.
The Concise Version: /llms.txt
# David Estévez. destbreso.com
> Personal website of David Estévez, a mathematician and software developer.
> Specializing in mathematical modeling, algorithm design, and clean software.
## Site Structure
- [Blog](https://destbreso.com/blog): Technical articles on algorithms and DSA
- [Interactive Demos](https://destbreso.com/interactive): Browser-based projects
- [NPM Packages](https://destbreso.com/packages): Open-source packages
- [Portfolio](https://destbreso.com/portfolio): Featured projects
- [Skills](https://destbreso.com/skills): Technical skills breakdown
## Blog Series
- [DSA Patterns](https://destbreso.com/blog/series/dsa-patterns): 27-part series
## Contact
- GitHub: https://github.com/destbreso
- LinkedIn: https://linkedin.com/in/destbreso
~2.8KB. Fits in any context window. Gives an AI agent enough information to understand the site and navigate to the right section.
The Expanded Version: /llms-full.txt
This is the same structure but with every piece of content listed individually: all relevant blog posts, all interactive demos, all npm packages, career history, skills breakdown. ~10KB. Still fits easily in modern context windows (128K+) but provides comprehensive coverage.
Design Principles
- URLs must be absolute. An agent reading the file on its own has no base URL to resolve relative links against, the way a browser does.
- Descriptions should be factual. No marketing language. AI agents respond better to precise descriptions.
- Structure follows the spec. H1 title, blockquote summary, H2 sections, markdown link lists.
- Include an ## Optional section for low-priority content that can be skipped if context is limited.
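These principles are easy to check mechanically. The sketch below is one possible lint, not part of the llms.txt specification: `checkLlmsTxt` is a hypothetical helper that verifies the H1 title, the blockquote summary, the presence of H2 sections, and that every markdown link target is an absolute http(s) URL.

```typescript
// Checks an llms.txt document against the design principles listed above.
// Returns a list of problems; an empty list means nothing was flagged.
function checkLlmsTxt(markdown: string): string[] {
  const problems: string[] = [];
  const lines = markdown
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // Structure: first non-empty line is the H1 title, followed by a blockquote summary.
  if (lines.length === 0 || !lines[0].startsWith("# ")) {
    problems.push("missing H1 title on the first line");
  }
  if (lines.length < 2 || !lines[1].startsWith(">")) {
    problems.push("missing blockquote summary after the title");
  }
  if (!lines.some((line) => line.startsWith("## "))) {
    problems.push("no H2 sections");
  }

  // Links: every markdown link target must be an absolute http(s) URL.
  const link = /\[[^\]]*\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = link.exec(markdown)) !== null) {
    if (!/^https?:\/\//.test(match[1])) {
      problems.push(`relative link: ${match[1]}`);
    }
  }
  return problems;
}
```

Since the files are maintained by hand, a check like this could run in CI or a pre-commit hook to catch a relative link before it ships.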
3. /api/ai-context: The Structured JSON Endpoint
While llms.txt is designed for LLMs to read directly, the API endpoint is designed for programmatic consumption by AI agents and tools:
// app/api/ai-context/route.ts
import { NextResponse } from "next/server";
import { siteConfig } from "@/lib/seo/config";

export async function GET() {
  const aiContext = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    site: {
      name: "David Estévez. destbreso.com",
      url: siteConfig.url,
      description: siteConfig.description,
      author: {
        name: siteConfig.author.name,
        role: "Mathematician & Software Developer",
        profiles: {
          github: `https://github.com/${siteConfig.author.github}`,
          linkedin: `https://linkedin.com/in/${siteConfig.author.linkedin}`,
        },
      },
    },
    ai_resources: {
      llms_txt: `${siteConfig.url}/llms.txt`,
      llms_full_txt: `${siteConfig.url}/llms-full.txt`,
      sitemap: `${siteConfig.url}/sitemap.xml`,
    },
    content_map: {
      main_sections: [
        {
          name: "Blog",
          url: `${siteConfig.url}/blog`,
          description: "Technical articles on algorithms and DSA",
          schema_type: "BlogPosting",
        },
        // ... more sections
      ],
    },
    expertise: {
      primary: ["Algorithm Design", "Mathematical Modeling", "ML"],
      languages: ["Python", "TypeScript", "C++"],
      topics_covered: ["Graph algorithms", "Dynamic programming", "..."],
    },
    ai_instructions: {
      preferred_citation: "David Estévez, destbreso.com",
      recommended_reading_order: [
        `${siteConfig.url}/llms-full.txt`,
        `${siteConfig.url}/about`,
        `${siteConfig.url}/blog`,
      ],
    },
  };

  return NextResponse.json(aiContext, {
    headers: {
      "Cache-Control": "public, max-age=86400",
      "X-Robots-Tag": "noindex", // Don't index as a page
      "X-AI-Context": "true", // Signal to AI agents
      "Access-Control-Allow-Origin": "*",
    },
  });
}
Why Both llms.txt AND an API?
Different consumers, different needs:
| | llms.txt | /api/ai-context |
|---|---|---|
| Format | Markdown | JSON |
| Consumer | LLMs reading directly | Agents, tools, automations |
| Parsing | Context window input | Programmatic processing |
| Size | Minimal (~3KB) | Medium (~5KB) |
| Standard | llmstxt.org spec | Custom (schema.org aligned) |
An AI coding assistant might prefer the JSON endpoint. ChatGPT Search might prefer the markdown file. Having both covers more ground.
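To make the "programmatic consumption" side concrete, here is a sketch of how a tool might use the JSON payload. It is an illustration only: the `AiContext` interface types just the fields read below, and `readingPlan` is a hypothetical helper, not something any existing agent is known to implement.

```typescript
// Subset of the /api/ai-context payload shown earlier; only the fields used here are typed.
interface AiContext {
  ai_resources: { llms_txt: string; llms_full_txt: string; sitemap: string };
  ai_instructions?: { recommended_reading_order?: string[] };
}

// Hypothetical helper: build the ordered list of URLs an agent would fetch.
function readingPlan(ctx: AiContext): string[] {
  // Start with the concise overview, then follow the site's recommended order.
  const recommended = ctx.ai_instructions?.recommended_reading_order ?? [];
  const ordered = [ctx.ai_resources.llms_txt, ...recommended];
  // Drop duplicates while preserving first-seen order.
  return ordered.filter((url, index, all) => all.indexOf(url) === index);
}

// Usage, assuming a runtime with a global fetch:
//   const res = await fetch("https://destbreso.com/api/ai-context");
//   const plan = readingPlan((await res.json()) as AiContext);
```

The point of the JSON form is that a consumer like this needs no markdown parsing at all: the entry points and the reading order are already addressable fields.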
4. HTML Meta Tags: Discoverability Hints
In app/layout.tsx, inside the <head>:
{/* AI Discoverability */}
<link rel="alternate" type="text/markdown"
      href="/llms.txt"
      title="LLM-friendly site overview" />
<link rel="alternate" type="text/markdown"
      href="/llms-full.txt"
      title="LLM-friendly full site context" />
<link rel="alternate" type="application/json"
      href="/api/ai-context"
      title="AI Context API" />
<meta name="ai-content-declaration"
      content="This site provides AI-readable content at /llms.txt" />
<meta name="citation_author" content="David Estévez" />
<meta name="citation_url" content="https://destbreso.com" />
These <link rel="alternate"> tags serve the same purpose as <link rel="alternate" type="application/rss+xml"> does for RSS. They tell any agent visiting the page: "there's another version of this content available in a different format."
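From the agent's side, discovering those hints might look like the sketch below. This is an assumption about how a consumer could behave, not a description of any real crawler: `findAlternates` is a hypothetical helper, it uses regular expressions that are adequate only for well-formed, double-quoted `<link>` tags, and a production tool would use a real HTML parser instead.

```typescript
interface Alternate {
  type: string;
  href: string;
  title?: string;
}

// Minimal sketch: extract <link rel="alternate"> hints from a page's HTML.
function findAlternates(html: string, pageUrl: string): Alternate[] {
  const found: Alternate[] = [];
  const tags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    // Read one double-quoted attribute value from the current tag.
    const attr = (name: string): string | undefined => {
      const m = new RegExp(`\\b${name}\\s*=\\s*"([^"]*)"`, "i").exec(tag);
      return m ? m[1] : undefined;
    };
    const type = attr("type");
    const href = attr("href");
    if (attr("rel") !== "alternate" || !type || !href) continue;
    // Resolve relative hrefs against the page URL, as a browser would.
    found.push({ type, href: new URL(href, pageUrl).toString(), title: attr("title") });
  }
  return found;
}
```

Note that relative hrefs are fine here, unlike inside llms.txt, because the consumer knows which page it fetched and can resolve against it.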
5. Supporting Files
.well-known/security.txt
Per RFC 9116, a standard contact point for security researchers:
Contact: mailto:[email protected]
Preferred-Languages: en, es
Canonical: https://destbreso.com/.well-known/security.txt
Expires: 2027-02-09T00:00:00.000Z
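Because the Expires field silently goes stale, it is worth checking automatically. The sketch below covers only the two fields RFC 9116 requires (at least one Contact, exactly one Expires) plus whether the expiry is still in the future. `checkSecurityTxt` is a hypothetical helper, not a full validator.

```typescript
// Returns a list of problems; an empty list means the required fields look fine.
function checkSecurityTxt(text: string, now: Date): string[] {
  // Collect values per field name; field names are case-insensitive.
  const fields = new Map<string, string[]>();
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    fields.set(name, [...(fields.get(name) ?? []), value]);
  }

  const problems: string[] = [];
  if (!fields.has("contact")) problems.push("missing Contact field");

  const expires = fields.get("expires") ?? [];
  if (expires.length !== 1) {
    problems.push("Expires must appear exactly once");
  } else if (!(new Date(expires[0]).getTime() > now.getTime())) {
    // Also catches unparseable dates, since NaN compares as false.
    problems.push("Expires is not a valid future date");
  }
  return problems;
}
```

Passing `now` as a parameter keeps the check deterministic, which makes it straightforward to run in a scheduled job that warns some weeks before the file expires.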
humans.txt
Per humanstxt.org, credits for the humans behind the site:
/* TEAM */
Name: David Estévez
Role: Developer & Mathematician
Site: https://destbreso.com
/* SITE */
Framework: Next.js 15 (App Router)
UI: React 18, Tailwind CSS, Framer Motion
The Complete File Map
personal-page/
├── public/
│   ├── llms.txt                # Concise LLM overview (~3KB)
│   ├── llms-full.txt           # Full LLM context (~10KB)
│   ├── humans.txt              # Site credits
│   └── .well-known/
│       └── security.txt        # Security contact
├── app/
│   ├── robots.ts               # AI crawler rules
│   ├── sitemap.ts              # XML sitemap
│   ├── layout.tsx              # AI meta tags in <head>
│   └── api/
│       └── ai-context/
│           └── route.ts        # Structured JSON endpoint
└── lib/
    └── seo/
        ├── config.ts           # Site config
        └── structured-data.ts  # JSON-LD generators
Trade-offs and Decisions
Static vs. Dynamic llms.txt
I chose static files in /public/. The alternative is a Next.js route handler that dynamically generates the content from the blog registry. Pros of dynamic: always in sync. Cons: more complexity, potential for build-time errors, and the content changes infrequently enough that manual updates are fine.
For larger sites with hundreds of pages, dynamic generation would be the right call.
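For reference, the dynamic alternative could be as small as the sketch below. It is not what runs on the site, which uses static files. The `Section` shape and `renderLlmsTxt` function are hypothetical, and wiring it into a route handler (for example at app/llms.txt/route.ts) is an assumption about how one might serve it.

```typescript
interface Section {
  title: string;
  links: { label: string; path: string; description: string }[];
}

// Render an llms.txt document: H1 title, blockquote summary, H2 sections, link lists.
function renderLlmsTxt(siteUrl: string, title: string, summary: string, sections: Section[]): string {
  const out: string[] = [`# ${title}`, "", `> ${summary}`];
  for (const section of sections) {
    out.push("", `## ${section.title}`, "");
    for (const link of section.links) {
      // Always emit absolute URLs, per the design principles.
      const url = new URL(link.path, siteUrl).toString();
      out.push(`- [${link.label}](${url}): ${link.description}`);
    }
  }
  return out.join("\n") + "\n";
}

// In a hypothetical route handler, a GET function could return the result as:
//   new Response(body, { headers: { "Content-Type": "text/markdown; charset=utf-8" } })
```

Feeding `sections` from the same registry that drives the blog index is what would keep the file in sync without manual edits.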
Allow vs. Block AI Crawlers
I chose a permissive policy for search/browsing and restrictive for training. Some site owners block everything. Some allow everything. My reasoning:
- Search agents bring traffic. If ChatGPT cites my blog post, that's a visitor I wouldn't have gotten otherwise.
- Training crawlers take content. My blog posts get absorbed into model weights. I get nothing in return: no attribution, no traffic, no control.
This is a personal decision. A company selling API documentation might make different trade-offs.
JSON-LD Already Existed
My site already had comprehensive schema.org structured data (BlogPosting, Person, SoftwareApplication, Course, CreativeWork, FAQPage). This is the foundation that AI agents can use regardless of whether they read llms.txt. The new files complement, not replace, the existing structured data.
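For readers who don't have this layer yet, a generator for one of those types can be quite small. The sketch below is illustrative only and is not the contents of the site's structured-data.ts: the `PostMeta` shape, the `blogPostingJsonLd` function, the `/blog/<slug>` URL pattern, and the sample post are all assumptions.

```typescript
interface PostMeta {
  title: string;
  slug: string;
  description: string;
  publishedAt: string; // ISO 8601 date
}

// Build a schema.org BlogPosting object for one post.
function blogPostingJsonLd(siteUrl: string, authorName: string, post: PostMeta) {
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    description: post.description,
    datePublished: post.publishedAt,
    url: `${siteUrl}/blog/${post.slug}`, // assumed URL pattern
    author: { "@type": "Person", name: authorName },
  };
}

// Hypothetical post, for illustration only.
const jsonLd = JSON.stringify(
  blogPostingJsonLd("https://destbreso.com", "David Estévez", {
    title: "Example Post",
    slug: "example-post",
    description: "A placeholder description.",
    publishedAt: "2025-01-01",
  })
);
```

The serialized string is what would go inside a script tag of type application/ld+json on the post's page.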
What's Next
In the final post, I'll cover how to verify this setup is actually working, tools for testing AI discoverability, and where this space is heading in 2026 and beyond.
References
- llms.txt Specification, Jeremy Howard, 2024
- OpenAI Crawlers, Official documentation
- Next.js Metadata API
- Schema.org, Structured data vocabulary
- RFC 9116 security.txt