Question 1

What is a robots.txt file?

Accepted Answer

Robots.txt is a small text file at the root of your site that tells search-engine crawlers which URLs they're allowed to access. It's not a security tool — anyone can read it — and it's only a request, not a hard block. You'd use it to keep crawlers out of admin areas, faceted search URLs, or staging directories that waste crawl budget. Always test changes before deploying; one wrong 'Disallow: /' line can stop Google from crawling your entire site overnight.
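As a rough illustration of how a crawler reads these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The sample paths and the `is_allowed` helper are assumptions made up for this example, and the stdlib parser uses simple prefix matching, so its verdicts may differ from Google's parser on wildcard or overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the paths are assumptions.
SAMPLE_RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Scoped rules keep crawlers out of /admin/ but leave the rest of the site open.
print(is_allowed(SAMPLE_RULES, "https://example.com/admin/login"))
print(is_allowed(SAMPLE_RULES, "https://example.com/products/widget"))

# A single 'Disallow: /' blocks every path, including the homepage.
print(is_allowed("User-agent: *\nDisallow: /\n", "https://example.com/"))
```

The first and third checks report the URL as blocked; the second reports it as allowed.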

Question 2

Where should robots.txt be placed?

Accepted Answer

It must sit at the root of your domain — example.com/robots.txt. Subdirectories like example.com/blog/robots.txt are ignored by crawlers. Each subdomain needs its own file: shop.example.com requires a separate robots.txt at shop.example.com/robots.txt. To verify, just open the URL in a browser; if it returns the file, you're set. If it 404s, the crawler will assume everything is allowed, which is sometimes fine and sometimes a problem depending on what you're hosting.
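To make the "one file per host, always at the root" rule concrete, the sketch below derives the robots.txt location for any page URL. The function name `robots_txt_url` is a hypothetical helper for this example, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`.

    Keeps the scheme and host (including any subdomain) and replaces
    the path, query, and fragment with the root-level /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=1"))
print(robots_txt_url("https://shop.example.com/cart"))
```

Pages on example.com and shop.example.com resolve to two different files, which is why each subdomain needs its own.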

Question 3

Does robots.txt prevent indexing?

Accepted Answer

No, and this is one of the biggest misunderstandings in SEO. Robots.txt blocks crawling, not indexing. If another site links to your blocked URL, Google can still index it without ever fetching the page. You'll see those listings in the SERP with 'No information is available for this page'. To actually prevent indexing, use a robots meta tag with noindex or the X-Robots-Tag HTTP header, and make sure the page is crawlable so Google can read those instructions in the first place.
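A simplified sketch of checking a fetched page for either noindex signal is shown below. The `has_noindex` helper and its inputs (a plain dict of response headers plus the HTML body) are assumptions for illustration. It only looks at `<meta name="robots">` and an unscoped X-Robots-Tag value, and ignores bot-specific variants such as `<meta name="googlebot">`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the header or the robots meta tag carries noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    meta = _RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex({}, page))
print(has_noindex({"X-Robots-Tag": "noindex"}, "<html></html>"))
print(has_noindex({}, "<html><head></head></html>"))
```

The first two calls detect noindex; the third does not. A check like this only means something if the page is not blocked in robots.txt, because a crawler that cannot fetch the page never sees either signal.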

Question 4

How do I test a robots.txt file?

Accepted Answer

Quickest path: open our Robots.txt Tester, paste your file or URL, and try test URLs to see whether each is allowed or blocked. The tool also flags syntax errors, which can cause crawlers to ignore the affected rules. Google Search Console's robots.txt report complements this by showing which robots.txt files Google has fetched for your site, along with any warnings or errors it found. For risky changes, test in a staging copy first, then push live. After deployment, recheck a few key URLs: your homepage, a sample product page, and the sitemap path. A 30-second test now prevents weeks of recovery later.
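The recheck of key URLs can also be scripted. The sketch below compares the current rules with a candidate file and lists URLs that would become blocked. The `newly_blocked` helper, the sample URLs, and the rule sets are all assumptions for illustration, and `urllib.robotparser` does not implement wildcards or Google's longest-match precedence, so treat the result as a first-pass check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def _parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def newly_blocked(current: str, candidate: str, urls: list[str],
                  agent: str = "Googlebot") -> list[str]:
    """Return URLs allowed under `current` but blocked under `candidate`."""
    before = _parser_for(current)
    after = _parser_for(candidate)
    return [u for u in urls
            if before.can_fetch(agent, u) and not after.can_fetch(agent, u)]


# Hypothetical key URLs: homepage, a sample product page, the sitemap path.
KEY_URLS = [
    "https://example.com/",
    "https://example.com/products/sample-item",
    "https://example.com/sitemap.xml",
]

CURRENT = "User-agent: *\nDisallow: /admin/\n"
RISKY = "User-agent: *\nDisallow: /\n"
SAFE = "User-agent: *\nDisallow: /admin/\nDisallow: /search/\n"

print(newly_blocked(CURRENT, RISKY, KEY_URLS))  # every key URL is listed
print(newly_blocked(CURRENT, SAFE, KEY_URLS))   # empty list
```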

Question 5

What is the Sitemap directive in robots.txt?

Accepted Answer

The Sitemap directive is a single line that points crawlers to your XML sitemap. The format is: 'Sitemap: https://example.com/sitemap.xml'. You can include multiple Sitemap lines if you have several sitemaps; if you use a sitemap index, one line pointing to the index is enough. It's not a replacement for submitting the sitemap in Search Console, but it helps any crawler that respects robots.txt find your sitemap automatically. Place it anywhere in the file (at the top is conventional), since it is independent of the User-agent groups. Use the absolute URL with the right protocol (https vs http).
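For reference, this is roughly how a crawler might pick up Sitemap lines. The sketch uses `RobotFileParser.site_maps()` from the standard library (available in Python 3.8 and later); the `sitemap_urls` wrapper and the sample file are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return every Sitemap URL declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical file with Sitemap lines at the top and at the bottom.
SAMPLE = """\
Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/news-sitemap.xml
"""

print(sitemap_urls(SAMPLE))
```

Both URLs are returned in file order, whichever User-agent group they sit next to.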

Question 6

How do I block Googlebot in robots.txt?

Accepted Answer

To block Googlebot from a specific folder, add the line 'User-agent: Googlebot', then 'Disallow: /folder/' on the next line. To block the entire site (rare and dangerous), use 'Disallow: /'. Use 'User-agent: *' if you want the rule to apply to every crawler. After saving, run your URL through the Robots.txt Tester to confirm the block. Be careful: blocking Googlebot from a money page can cost it its snippet and rankings within days, even if the bare URL lingers in the index. Always test the change against your top 10 URLs before deploying.
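The difference between a Googlebot-specific group and the catch-all group can be seen in a small sketch with Python's `urllib.robotparser`. The folder names and the "Bingbot" comparison agent are assumptions chosen for illustration. As with Google's own parser, a crawler that matches a named group follows that group and ignores the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for everyone else.
RULES = """\
User-agent: Googlebot
Disallow: /folder/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

checks = [
    ("Googlebot", "https://example.com/folder/page"),   # blocked by its own group
    ("Googlebot", "https://example.com/private/page"),  # allowed: '*' group is not used
    ("Bingbot", "https://example.com/folder/page"),     # allowed: falls back to '*'
    ("Bingbot", "https://example.com/private/page"),    # blocked by '*'
]

results = {(agent, url): parser.can_fetch(agent, url) for agent, url in checks}
for (agent, url), allowed in results.items():
    print(f"{agent:10} {url:40} {'allowed' if allowed else 'blocked'}")
```

Note the second row: once Googlebot has its own group, rules under `User-agent: *` no longer apply to it, so any shared restrictions need to be repeated in the Googlebot group.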

Question 7

Should I block AI bots in robots.txt?

Accepted Answer

It's a real decision now. If you don't want your content collected for AI products or model training, you can block bots like GPTBot, ClaudeBot, Google-Extended, CCBot, and PerplexityBot. Add a User-agent line for each, followed by 'Disallow: /'. The trade-off: each block also removes your content from whatever that bot powers. Google-Extended, for example, controls whether your content is used for Google's Gemini models; Google's documentation states that it does not affect inclusion or ranking in Google Search, so check the current docs before assuming any effect on AI Overviews. Worth re-checking your stance every few months as the bot list keeps growing. Verify the current spelling of each agent on the bot owner's official documentation page before publishing.
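A minimal sketch for generating those groups is below. The agent list is copied from the answer above and should be verified against each vendor's documentation; the `ai_block_rules` function is a hypothetical helper, not part of any tool.

```python
# Agent names as listed above; confirm current spellings before publishing.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot"]


def ai_block_rules(agents: list[str]) -> str:
    """Return robots.txt text with one 'Disallow: /' group per agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Blank lines separate groups; the file ends with a single newline.
    return "\n\n".join(groups) + "\n"


print(ai_block_rules(AI_BOTS))
```

The output can be appended to an existing robots.txt. Because each group names a specific agent, crawlers such as Googlebot that are not listed remain governed by your other rules.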

Robots.txt Generator & Tester

