David Estévez

Implementing AI SEO on a Next.js Site: A Practical Guide

Want to Learn · Dev Insights

March 29, 2026 · 12 min read

Tags: AI, SEO, Next.js, llms.txt, robots.txt, JSON-LD, Schema.org, Tutorial


In the previous post, I argued that websites now have a new audience (AI agents) and that most sites aren't prepared for them. In this post, I'll walk through everything I actually built to address this on my personal site, a Next.js application at destbreso.com.

This isn't theoretical. Every code snippet here is running in production. I'll explain the decisions behind each piece and the trade-offs involved.

The Architecture

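The implementation has five components, each addressing a different part of the AI discoverability problem. The diagram shows the four core layers; the fifth is a set of supporting files (security.txt and humans.txt):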

┌─────────────────────────────────────────────┐
│               destbreso.com                  │
├──────────┬──────────┬──────────┬────────────┤
│ robots.ts│ llms.txt │ ai-ctx   │ meta tags  │
│ (control)│ (curated)│ (struct) │ (hints)    │
└──────────┴──────────┴──────────┴────────────┘

Let's build each one.

1. `robots.txt`. Granular AI Crawler Control

The first layer is policy. Next.js lets you generate robots.txt dynamically via app/robots.ts:

import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Standard search engines: business as usual
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/", "/dashboard/", "/maintenance/"],
      },
      // OpenAI search: allow (appear in ChatGPT search results)
      {
        userAgent: "OAI-SearchBot",
        allow: ["/", "/llms.txt", "/llms-full.txt"],
        disallow: ["/api/", "/dashboard/"],
      },
      // OpenAI training: disallow (no benefit to site owner)
      {
        userAgent: "GPTBot",
        disallow: ["/"],
      },
      // ChatGPT user-initiated browsing
      {
        userAgent: "ChatGPT-User",
        allow: ["/"],
      },
      // Google Gemini / AI features
      {
        userAgent: "Google-Extended",
        allow: ["/"],
      },
      // Anthropic
      {
        userAgent: "anthropic-ai",
        allow: ["/"],
      },
      // Perplexity
      {
        userAgent: "PerplexityBot",
        allow: ["/"],
      },
      // Common Crawl (training datasets): disallow
      {
        userAgent: "CCBot",
        disallow: ["/"],
      },
    ],
    sitemap: `https://destbreso.com/sitemap.xml`,
  };
}

The Key Decision: Search vs. Training

This is the most important distinction in AI robots.txt policy:

OAI-SearchBot → Surfaces your site in ChatGPT search results. Allow. This is essentially the AI equivalent of Googlebot: if you block it, your pages won't be shown in ChatGPT search answers.
GPTBot → Crawls content for OpenAI's model training. Disallow. There's no direct benefit to you. Your content gets absorbed into the model's weights with no attribution.
ChatGPT-User → Triggered by a user action (e.g., "browse this URL"). Because these requests are user-initiated, OpenAI notes that robots.txt rules may not apply to them, but allowing the agent explicitly still documents your intent.

OpenAI's documentation states that these settings are independent: you can allow search while blocking training, or vice versa.
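One robots.txt subtlety is worth checking in the file above. Under RFC 9309, a crawler obeys only the most specific group matching its user agent, so named groups such as ChatGPT-User or PerplexityBot do not inherit the `*` group's disallows for /api/ and /dashboard/. The reverse also bites: OAI-SearchBot's own /api/ disallow hides /api/ai-context from it. Below is a minimal sketch that shares one path list across groups. The helper and constant names are hypothetical, and exposing /api/ai-context is an assumption, not the site's current policy:

```typescript
// Sketch: shared path lists so every named bot group repeats the private-path
// disallows. The Rule shape matches entries in the `rules` array of app/robots.ts.
type Rule = { userAgent: string; allow?: string[]; disallow?: string[] };

const PRIVATE_PATHS = ["/api/", "/dashboard/", "/maintenance/"];
// Assumption: expose the AI context endpoint. Under RFC 9309 the longest
// matching rule wins, so this allow beats the shorter "/api/" disallow.
const PUBLIC_ALLOW = ["/", "/api/ai-context"];

// Agents the original file allows.
const ALLOWED_BOTS = ["OAI-SearchBot", "ChatGPT-User", "Google-Extended", "anthropic-ai", "PerplexityBot"];
// Training crawlers the original file blocks entirely.
const BLOCKED_BOTS = ["GPTBot", "CCBot"];

export function buildRules(): Rule[] {
  const allowed: Rule[] = ["*"].concat(ALLOWED_BOTS).map((ua) => ({
    userAgent: ua,
    allow: PUBLIC_ALLOW,
    disallow: PRIVATE_PATHS, // repeated per group: a named group does not inherit "*"
  }));
  const blocked: Rule[] = BLOCKED_BOTS.map((ua) => ({ userAgent: ua, disallow: ["/"] }));
  return allowed.concat(blocked);
}
```

In app/robots.ts you would then return `rules: buildRules()` alongside the sitemap URL.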

2. `llms.txt`. The AI Entry Point

This is the heart of the implementation. The llms.txt specification proposes a markdown file at /llms.txt that provides LLM-friendly content.

Why Markdown?

LLMs understand markdown natively, which makes it a natural input format. HTML has to be parsed, stripped of navigation, and sometimes rendered with JavaScript before the content is usable. Markdown is clean, structured text that fits directly into a context window.

The Concise Version: `/llms.txt`

# David Estévez. destbreso.com

> Personal website of David Estévez, a mathematician and software developer.
> Specializing in mathematical modeling, algorithm design, and clean software.

## Site Structure

- [Blog](https://destbreso.com/blog): Technical articles on algorithms and DSA
- [Interactive Demos](https://destbreso.com/interactive): Browser-based projects
- [NPM Packages](https://destbreso.com/packages): Open-source packages
- [Portfolio](https://destbreso.com/portfolio): Featured projects
- [Skills](https://destbreso.com/skills): Technical skills breakdown

## Blog Series

- [DSA Patterns](https://destbreso.com/blog/series/dsa-patterns): 27-part series

## Contact

- GitHub: https://github.com/destbreso
- LinkedIn: https://linkedin.com/in/destbreso

At ~2.8KB, it fits in any context window and gives an AI agent enough information to understand the site and navigate to the right section.

The Expanded Version: `/llms-full.txt`

It follows the same structure but lists every piece of content individually: all relevant blog posts, all interactive demos, all npm packages, career history, and a skills breakdown. At ~10KB, it still fits easily in modern context windows (128K+ tokens) while providing comprehensive coverage.

Design Principles

URLs must be absolute. An agent may receive the file's text without knowing where it came from, so it can't resolve relative links against a base URL the way a browser does.
Descriptions should be factual. No marketing language; precise descriptions help an agent pick the right link.
Structure follows the spec. H1 title, blockquote summary, H2 sections, markdown link lists.
Put low-priority content under an ## Optional section; the spec treats it as skippable when context is limited.
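The principles above are mechanical enough to check automatically before deploying. Here is a minimal sketch of such a check. `lintLlmsTxt` is a hypothetical helper, not an official llms.txt validator, and it covers only the rules listed here:

```typescript
// Hypothetical structural lint for llms.txt, encoding a few of the design principles.
export function lintLlmsTxt(text: string): string[] {
  const problems: string[] = [];
  const lines = text.split(/\r?\n/);
  const nonEmpty = lines.filter((l) => l.trim() !== "");

  // The file should open with an H1 title.
  if (nonEmpty.length === 0 || !/^# \S/.test(nonEmpty[0])) {
    problems.push("first non-empty line should be an H1 title");
  }
  // A blockquote summary ("> ...") should be present.
  if (!lines.some((l) => /^> /.test(l))) {
    problems.push("missing blockquote summary");
  }
  // Content should be grouped into H2 sections.
  if (!lines.some((l) => /^## \S/.test(l))) {
    problems.push("no H2 sections found");
  }
  // Every markdown link target should be an absolute http(s) URL.
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(text)) !== null) {
    if (!/^https?:\/\//.test(match[1])) {
      problems.push("relative link: " + match[1]);
    }
  }
  return problems;
}
```

Running it in CI against public/llms.txt and public/llms-full.txt would catch a relative link slipping in during a manual edit.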

3. `/api/ai-context`. The Structured JSON Endpoint

While llms.txt is designed for LLMs to read directly, the API endpoint is designed for programmatic consumption by AI agents and tools:

// app/api/ai-context/route.ts
import { NextResponse } from "next/server";
import { siteConfig } from "@/lib/seo/config";

export async function GET() {
  const aiContext = {
    "@context": "https://schema.org",
    "@type": "WebSite",

    site: {
      name: "David Estévez. destbreso.com",
      url: siteConfig.url,
      description: siteConfig.description,
      author: {
        name: siteConfig.author.name,
        role: "Mathematician & Software Developer",
        profiles: {
          github: `https://github.com/${siteConfig.author.github}`,
          linkedin: `https://linkedin.com/in/${siteConfig.author.linkedin}`,
        },
      },
    },

    ai_resources: {
      llms_txt: `${siteConfig.url}/llms.txt`,
      llms_full_txt: `${siteConfig.url}/llms-full.txt`,
      sitemap: `${siteConfig.url}/sitemap.xml`,
    },

    content_map: {
      main_sections: [
        {
          name: "Blog",
          url: `${siteConfig.url}/blog`,
          description: "Technical articles on algorithms and DSA",
          schema_type: "BlogPosting",
        },
        // ... more sections
      ],
    },

    expertise: {
      primary: ["Algorithm Design", "Mathematical Modeling", "ML"],
      languages: ["Python", "TypeScript", "C++"],
      topics_covered: ["Graph algorithms", "Dynamic programming", "..."],
    },

    ai_instructions: {
      preferred_citation: "David Estévez, destbreso.com",
      recommended_reading_order: [
        `${siteConfig.url}/llms-full.txt`,
        `${siteConfig.url}/about`,
        `${siteConfig.url}/blog`,
      ],
    },
  };

  return NextResponse.json(aiContext, {
    headers: {
      "Cache-Control": "public, max-age=86400",
      "X-Robots-Tag": "noindex",     // Don't index as a page
      "X-AI-Context": "true",         // Custom, non-standard hint for AI agents
      "Access-Control-Allow-Origin": "*",
    },
  });
}
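On the consuming side, an agent or tool only needs a few of these fields. The following sketch shows a hypothetical client that picks what to read first. The field names mirror the `aiContext` object above; the function name and the fallback rule are assumptions:

```typescript
// Only the fields this sketch needs are typed; everything else is ignored.
interface AiContext {
  ai_resources?: { llms_txt?: string; llms_full_txt?: string; sitemap?: string };
  ai_instructions?: { recommended_reading_order?: string[] };
}

// Return the site's recommended reading order, falling back to llms.txt
// when no explicit order is given.
export function pickReadingList(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return [];
  const ctx = raw as AiContext;
  const ordered = ctx.ai_instructions?.recommended_reading_order;
  if (Array.isArray(ordered) && ordered.length > 0) {
    return ordered.filter((u) => typeof u === "string");
  }
  const fallback = ctx.ai_resources?.llms_txt;
  return fallback ? [fallback] : [];
}

// Usage (network call shown for context, not executed here):
// const res = await fetch("https://destbreso.com/api/ai-context");
// const urls = pickReadingList(await res.json());
```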

Why Both llms.txt AND an API?

Different consumers, different needs:

| | llms.txt | /api/ai-context |
|---|---|---|
| Format | Markdown | JSON |
| Consumer | LLMs reading directly | Agents, tools, automations |
| Parsing | Context window input | Programmatic processing |
| Size | Minimal (~3KB) | Medium (~5KB) |
| Standard | llmstxt.org spec | Custom (schema.org aligned) |

An AI coding assistant might prefer the JSON endpoint. ChatGPT Search might prefer the markdown file. Having both covers more ground.

4. HTML Meta Tags. Discoverability Hints

In app/layout.tsx, inside the <head>:

{/* AI Discoverability */}
<link rel="alternate" type="text/markdown"
      href="/llms.txt"
      title="LLM-friendly site overview" />

<link rel="alternate" type="text/markdown"
      href="/llms-full.txt"
      title="LLM-friendly full site context" />

<link rel="alternate" type="application/json"
      href="/api/ai-context"
      title="AI Context API" />

<meta name="ai-content-declaration"
      content="This site provides AI-readable content at /llms.txt" />

<meta name="citation_author" content="David Estévez" />
<meta name="citation_url" content="https://destbreso.com" />

These <link rel="alternate"> tags serve the same purpose as <link rel="alternate" type="application/rss+xml"> does for RSS feeds: they tell any agent visiting the page that another version of this content is available in a different format. The ai-content-declaration meta tag, by contrast, is not part of any standard; it's a plain-language hint.
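If you'd rather not hand-write these tags, the same hints can be expressed through the Next.js Metadata API (listed in the references). This is a sketch of that alternative, not what this site uses. It assumes the App Router's `metadata` export in app/layout.tsx. The `Metadata` type annotation from "next" is omitted so the snippet stands alone:

```typescript
// Alternative to hand-written <head> tags, via the App Router metadata export.
export const metadata = {
  // Relative URLs below are resolved against metadataBase.
  metadataBase: new URL("https://destbreso.com"),
  alternates: {
    types: {
      // Each entry renders a <link rel="alternate" type="..." href="..." title="..."> tag.
      "text/markdown": [
        { url: "/llms.txt", title: "LLM-friendly site overview" },
        { url: "/llms-full.txt", title: "LLM-friendly full site context" },
      ],
      "application/json": [{ url: "/api/ai-context", title: "AI Context API" }],
    },
  },
  // `other` renders arbitrary <meta name="..." content="..."> tags.
  other: {
    "ai-content-declaration": "This site provides AI-readable content at /llms.txt",
    citation_author: "David Estévez",
    citation_url: "https://destbreso.com",
  },
};
```

Pick one approach; combining this with the hand-written tags would emit each link twice.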

5. Supporting Files

`.well-known/security.txt`

Per RFC 9116, a standard contact point for security researchers:

Contact: mailto:[email protected]
Preferred-Languages: en, es
Canonical: https://destbreso.com/.well-known/security.txt
Expires: 2027-02-09T00:00:00.000Z
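RFC 9116 requires at least one Contact field and exactly one Expires field, and a stale Expires date is easy to forget. Here is a minimal sanity check, written as a hypothetical helper. It is not a full RFC 9116 parser: it ignores signatures and most field semantics.

```typescript
// Minimal RFC 9116 sanity check: Contact at least once, Expires exactly once
// and still in the future.
export function checkSecurityTxt(text: string, now: Date = new Date()): string[] {
  const problems: string[] = [];
  const fields: { [name: string]: string[] } = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue; // skip blanks and comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const name = line.slice(0, colon).trim().toLowerCase(); // field names are case-insensitive
    const value = line.slice(colon + 1).trim(); // keeps "mailto:..." intact
    (fields[name] = fields[name] || []).push(value);
  }
  if (!fields["contact"]) problems.push("missing Contact");
  const expires = fields["expires"] || [];
  if (expires.length !== 1) {
    problems.push("Expires must appear exactly once");
  } else {
    const date = new Date(expires[0]);
    if (isNaN(date.getTime())) problems.push("Expires is not a valid date");
    else if (date.getTime() <= now.getTime()) problems.push("Expires is in the past");
  }
  return problems;
}
```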

`humans.txt`

Per humanstxt.org, credits for the humans behind the site:

/* TEAM */
Name: David Estévez
Role: Developer & Mathematician
Site: https://destbreso.com

/* SITE */
Framework: Next.js 15 (App Router)
UI: React 18, Tailwind CSS, Framer Motion

The Complete File Map

personal-page/
├── public/
│   ├── llms.txt              # Concise LLM overview (~3KB)
│   ├── llms-full.txt         # Full LLM context (~10KB)
│   ├── humans.txt            # Site credits
│   └── .well-known/
│       └── security.txt      # Security contact
├── app/
│   ├── robots.ts             # AI crawler rules
│   ├── sitemap.ts            # XML sitemap
│   ├── layout.tsx            # AI meta tags in <head>
│   └── api/
│       └── ai-context/
│           └── route.ts      # Structured JSON endpoint
└── lib/
    └── seo/
        ├── config.ts          # Site config
        └── structured-data.ts # JSON-LD generators

Trade-offs and Decisions

Static vs. Dynamic llms.txt

I chose static files in /public/. The alternative is a Next.js route handler that generates the content dynamically from the blog registry. Dynamic generation keeps the file in sync automatically, but it adds complexity and more room for build-time errors, and my content changes infrequently enough that manual updates are fine.

For larger sites with hundreds of pages, dynamic generation would be the right call.
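For reference, the rendering half of a dynamic llms.txt is small. This sketch assumes a hypothetical registry shape (`Section`, `Entry`). A real site would build that from its own post metadata.

```typescript
// Hypothetical registry types standing in for a real blog/content registry.
interface Entry { title: string; url: string; description: string }
interface Section { heading: string; entries: Entry[] }

// Render an llms.txt-style document: H1 title, blockquote summary,
// then one H2 section per group with a markdown link list.
export function renderLlmsTxt(title: string, summary: string, sections: Section[]): string {
  const out: string[] = [`# ${title}`, "", `> ${summary}`];
  for (const section of sections) {
    out.push("", `## ${section.heading}`, "");
    for (const e of section.entries) {
      // Entries should carry absolute URLs and short factual descriptions.
      out.push(`- [${e.title}](${e.url}): ${e.description}`);
    }
  }
  return out.join("\n") + "\n";
}
```

In the App Router, a route handler such as app/llms.txt/route.ts could return this string in a `Response` with a `text/markdown; charset=utf-8` Content-Type. You would then remove the static copy from public/, so only one source serves that path.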

Allow vs. Block AI Crawlers

I chose a permissive policy for search and browsing agents and a restrictive one for training crawlers. Some site owners block everything; some allow everything. My reasoning:

Search agents bring traffic. If ChatGPT cites my blog post, that's a visitor I wouldn't have gotten otherwise.
Training crawlers take content. My blog posts get absorbed into model weights, and I get nothing in return: no attribution, no traffic, no control.

This is a personal decision. A company selling API documentation might make different trade-offs.

JSON-LD Already Existed

My site already had comprehensive schema.org structured data (BlogPosting, Person, SoftwareApplication, Course, CreativeWork, FAQPage). This is the foundation that AI agents can use regardless of whether they read llms.txt. The new files complement, not replace, the existing structured data.
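The post doesn't show lib/seo/structured-data.ts. As a rough idea of what such a generator involves, here is a hypothetical one in the same spirit: a BlogPosting object plus a serializer for the `<script type="application/ld+json">` tag. The field selection is illustrative. Escaping `<` is a common precaution, so that content can't close the script tag early.

```typescript
// Hypothetical post metadata; a real generator would read this from the blog registry.
interface PostMeta { title: string; url: string; datePublished: string; description: string }

// Build a schema.org BlogPosting object for one post.
export function blogPostingJsonLd(post: PostMeta, authorName: string, authorUrl: string) {
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    description: post.description,
    url: post.url,
    datePublished: post.datePublished, // ISO 8601 date string
    author: { "@type": "Person", name: authorName, url: authorUrl },
  };
}

// Serialize for a <script type="application/ld+json"> body.
// Replacing "<" with its JSON escape keeps "</script>" in content from ending the tag.
export function toJsonLdScriptBody(data: object): string {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}
```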

What's Next

In the final post, I'll cover how to verify this setup is actually working, tools for testing AI discoverability, and where this space is heading in 2026 and beyond.

References

llms.txt Specification, Jeremy Howard, 2024
OpenAI Crawlers, OpenAI official documentation
Next.js Metadata API, Next.js documentation
Schema.org, structured data vocabulary
RFC 9116, A File Format to Aid in Security Vulnerability Disclosure (security.txt)
