How to Add robots.txt File to Your Next.js App

A robots.txt file defines rules for search engine crawlers to follow before they crawl your website. It's mainly used to avoid overloading your site with crawler requests, not to prevent crawlers from indexing your content. Each search engine uses its own crawlers, and in this file you can specify rules for them to follow on your website. In this blog post we will learn how to add a robots.txt file to a Next.js App Router app.

Jordan Wu

Why Do You Need robots.txt?

A robots.txt file is mainly used to tell search engine crawlers which URLs they can access on your website. In the file you define rules for crawlers to follow when they visit your site, allowing some URLs and disallowing others. Each URL can point to one of the following file types: a web page, a media file, or a resource file.

When the URL is a web page, you can block crawlers from crawling the page and indexing its content. If your webpage is blocked with a robots.txt file, its URL can still appear in search results, but without a description. Image files, video files, PDFs, and other non-HTML files embedded in the blocked page will be excluded from crawling too, unless they're referenced by other pages that are allowed for crawling.

When the URL is a media file, you can prevent image, video, and audio files from appearing in Google search results. This will not prevent other webpages from linking to your image, video, or audio file.

When the URL is a resource file, you may block resource files like scripts or stylesheets. However, if the absence of these resources makes the page harder for crawlers to understand, don't block them.
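
For example, a short snippet like the following (the /assets/images/ path is just a placeholder) asks Google's image crawler to stay out of an image directory while leaving everything else crawlable:

User-agent: Googlebot-Image
Disallow: /assets/images/

User-agent: *
Allow: /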

Limitations of a robots.txt File

Keep in mind that not all robots.txt rules are supported by all search engines; it's up to each crawler to obey them, and most reputable ones do. If you want to keep confidential or private content secure from web crawlers, the better blocking method is authentication: password protect the content to ensure only authorized users can access it.

A webpage that is disallowed in robots.txt can still be indexed if it's linked from other sites. To prevent a URL from appearing in Google search results, use authentication or a noindex meta tag.
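
In a Next.js App Router app, one way to set noindex is the Metadata API. Here's a minimal sketch; the app/private/page.tsx path and page component are just examples:

import type { Metadata } from 'next'

// Ask crawlers not to index or follow links on this page
export const metadata: Metadata = {
  robots: {
    index: false,
    follow: false,
  },
}

export default function PrivatePage() {
  return <h1>Private</h1>
}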

What Is a robots.txt File?

A robots.txt file is a simple text file that contains rules about which crawlers may access which parts of your website. Crawlers that follow the Robots Exclusion Protocol (REP) will download and parse the website's robots.txt file to understand the rules set for them before crawling. You must place robots.txt at the top-level directory of the website, and it has a maximum file size of 500 kibibytes (KiB).
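
For example (using example.com as a placeholder), the file is only valid at the root of the host it applies to:

https://www.example.com/robots.txt        valid, applies to all of www.example.com
https://www.example.com/blog/robots.txt   not a valid robots.txt location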

Valid robots.txt lines consist of a field, a colon, and a value.

User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

The user-agent line identifies which crawler the rules apply to and is case-insensitive. Using an asterisk (*) matches all crawlers.

The disallow rule specifies paths that must not be accessed by the crawlers identified by the user-agent and is case-sensitive.

The allow rule specifies paths that may be accessed by the crawlers identified by the user-agent and is case-sensitive.

The sitemap rule specifies the absolute URL to your website's sitemap and is case-sensitive.
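
Allow and disallow rules can be combined for the same user-agent; when rules conflict, the most specific (longest) matching path generally wins. For instance, with placeholder paths, this keeps a directory blocked while letting one PDF through:

User-agent: *
Disallow: /private/
Allow: /private/press-kit.pdf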

Create robots.txt File in Next.js App Router

Add a robots.(js|ts) file to your app directory, for example at app/robots.ts.

app/robots.ts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: 'Googlebot',
        allow: ['/'],
        disallow: '/private/',
      },
      {
        userAgent: ['Applebot', 'Bingbot'],
        disallow: ['/'],
      },
    ],
    sitemap: 'https://acme.com/sitemap.xml',
  }
}

This will output the following:

User-Agent: Googlebot
Allow: /
Disallow: /private/

User-Agent: Applebot
Disallow: /

User-Agent: Bingbot
Disallow: /

Sitemap: https://acme.com/sitemap.xml

Once you save your robots.ts file, you can check the output at localhost:3000/robots.txt. When it looks good, deploy your changes to production so the file is live on your website.
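
If you don't need to generate the rules programmatically, Next.js also lets you serve a static file instead: place a plain robots.txt directly in the app directory (app/robots.txt) with the same kind of content, for example:

User-Agent: *
Allow: /
Disallow: /private/

Sitemap: https://acme.com/sitemap.xml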

How to Test Your robots.txt File

To test whether your newly deployed robots.txt file is publicly accessible, open a private browsing window (or equivalent) in your browser and navigate to the location of the robots.txt file. If you see the contents of your robots.txt, you can then validate it using the robots.txt report in Google Search Console.
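
You can also check it from a script. Here's a minimal sketch (example.com is a placeholder for your domain, and it assumes Node 18+ for the global fetch) that confirms the file is reachable and prints its contents:

// Minimal check that robots.txt is publicly reachable
async function checkRobots(origin: string) {
  const res = await fetch(`${origin}/robots.txt`)
  console.log(`${origin}/robots.txt -> ${res.status}`)
  console.log(await res.text())
}

checkRobots('https://www.example.com')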

About the Author

Jordan is a full stack engineer with years of experience working at startups. He enjoys learning about software development and building something people want. What makes him happy is music. He is passionate about finding music and is an aspiring DJ. He wants to create his own music and is in the process of finding his own sound.