Robots.txt
Robots.txt is a plain text file at the root of a website that instructs search engine crawlers which URLs they are allowed or disallowed from accessing.
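For illustration, a minimal robots.txt might look like this (the paths and sitemap URL are placeholders):

```txt
User-agent: *
Allow: /
Disallow: /dashboard/

Sitemap: https://example.com/sitemap.xml
```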
The file lives at `/robots.txt` and uses a simple directive syntax with `User-agent`, `Allow`, `Disallow`, and `Sitemap` fields. Well-behaved crawlers request it before fetching any other URL on your domain.
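When a crawler matches a URL path against these rules, the most specific (longest) matching path wins, and ties favor `Allow` (per RFC 9309). A minimal TypeScript sketch of that evaluation, assuming a single already-parsed user-agent group (`isAllowed` is a hypothetical helper, not a full robots.txt parser):

```typescript
// Hypothetical sketch of rule evaluation for one user-agent group.
// Real crawlers implement the full Robots Exclusion Protocol (RFC 9309),
// including wildcards; this only handles plain path prefixes.
type Rule = { type: "allow" | "disallow"; path: string };

function isAllowed(urlPath: string, rules: Rule[]): boolean {
  // The longest matching path wins; on a length tie, Allow wins.
  let best: Rule | null = null;
  for (const rule of rules) {
    if (!urlPath.startsWith(rule.path)) continue;
    if (
      !best ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.type === "allow")
    ) {
      best = rule;
    }
  }
  // No matching rule means the URL is allowed by default.
  return !best || best.type === "allow";
}

// Rules equivalent to: Allow: /  and  Disallow: /dashboard/
const rules: Rule[] = [
  { type: "allow", path: "/" },
  { type: "disallow", path: "/dashboard/" },
];
```

With these rules, `/dashboard/settings` is blocked while `/blog/post` is crawlable, because `/dashboard/` is a longer (more specific) match than `/`.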
Robots.txt controls crawling, not indexing. If other pages link to a disallowed URL, Google may still index it and show a "No information available" snippet. To keep a page out of search results entirely, use a `noindex` directive instead.
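A sketch of the two common ways to apply `noindex` (the header form is useful for non-HTML assets such as PDFs):

```html
<!-- Option 1: robots meta tag in the page's <head> -->
<meta name="robots" content="noindex" />

<!-- Option 2: equivalent HTTP response header, set by the server:
     X-Robots-Tag: noindex -->
```

Note that crawlers can only see a `noindex` directive if they are allowed to fetch the page, so a URL that needs `noindex` must not also be disallowed in robots.txt.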
In Next.js 15+, you can generate robots.txt programmatically via `app/robots.ts`. Indxel validates that your robots.txt exists, is syntactically valid, and does not accidentally block important pages.
Example

```typescript
// app/robots.ts (Next.js)
import { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: { userAgent: "*", allow: "/", disallow: "/dashboard/" },
    sitemap: "https://example.com/sitemap.xml",
  };
}
```

Related terms
Sitemap XML
An XML sitemap is a file that lists URLs on your website along with optional metadata (last modified date, change frequency, priority) to help search engines discover and crawl your pages.
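A minimal sitemap with a single URL entry, following the sitemaps.org protocol (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/post</loc>
    <lastmod>2025-01-01</lastmod>
  </url>
</urlset>
```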
Crawl Budget
Crawl budget is the number of URLs Googlebot will crawl on your site within a given period, determined by crawl rate limit (server capacity) and crawl demand (page importance).
Noindex
Noindex is a robots meta tag directive that instructs search engines to exclude a page from their search index, preventing it from appearing in search results.