Robots.txt Generator & Tester
Control how search engines crawl your site.
What This Tool Does
The Robots.txt Generator and Tester helps you create robots.txt files with proper crawl directives and test whether specific URLs would be blocked or allowed by your rules. It supports common user agents including Googlebot, Bingbot, GPTBot, ClaudeBot, and custom agents.
Inputs
- Generator mode: Select user agents, set Allow and Disallow paths, add Sitemap URLs, and generate a complete robots.txt file.
- Tester mode: Paste an existing robots.txt file and test specific URLs against the rules to see if they would be crawled or blocked.
How It Works
In generator mode, you configure rules through a form interface and the tool outputs valid robots.txt syntax. In tester mode, the tool parses your robots.txt using the standard Robots Exclusion Protocol matching rules: it picks the group that applies to the selected user agent, then evaluates each URL against the most specific (longest) matching rule in that group.
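The matching step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the names `evaluate` and `_pattern_to_regex` and the sample rules are hypothetical, and it assumes the longest-match behavior described in RFC 9309, including the `*` and `$` special characters.

```python
import re

def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, '$' end anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def evaluate(rules, path):
    """Return (verdict, matched_pattern) for a URL path.

    rules: list of ("allow" | "disallow", pattern) for one user-agent group.
    The longest matching pattern wins; on a tie, allow wins; no match means allow.
    """
    best = ("allow", None)
    best_len = -1
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if _pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best, best_len = (kind, pattern), length
    return best

# Hypothetical rules for a single group
rules = [
    ("disallow", "/admin/"),
    ("allow", "/admin/public/"),
    ("disallow", "/*.pdf$"),
]
print(evaluate(rules, "/admin/settings"))     # ('disallow', '/admin/')
print(evaluate(rules, "/admin/public/help"))  # ('allow', '/admin/public/')
print(evaluate(rules, "/blog/"))              # ('allow', None)
```

Returning the matched pattern alongside the verdict mirrors the "which rule matched" detail in the test results. Individual crawlers may resolve edge cases differently.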
Understanding the Results
- Generated output: A complete robots.txt file ready to copy and deploy to your server root.
- Test results: Clear Allow or Disallow status for each tested URL, showing which rule matched.
Step-by-Step Example
- In Generator mode, select the user agents you want to target (such as * for all bots).
- Add Disallow rules for paths you want to block, such as /admin/ or /private/.
- Add Allow rules for any exceptions within blocked directories.
- Add your Sitemap URL so crawlers can discover all your pages.
- Click Generate to produce the robots.txt output.
- Switch to Tester mode, paste the generated file, and test sample URLs to verify the rules work as expected.
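The generation half of these steps could look something like the sketch below. The function name `build_robots_txt`, the input shape, and the `/admin/public/` exception path are assumptions made for illustration; the tool's own form fields and output ordering may differ.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text.

    groups: list of dicts with "agents" plus optional "disallow" and "allow" lists.
    sitemaps: iterable of absolute sitemap URLs.
    """
    lines = []
    for group in groups:
        for agent in group["agents"]:
            lines.append(f"User-agent: {agent}")
        for path in group.get("disallow", []):
            lines.append(f"Disallow: {path}")
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        lines.append("")  # blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"

example = build_robots_txt(
    [{
        "agents": ["*"],
        "disallow": ["/admin/", "/private/"],
        "allow": ["/admin/public/"],  # hypothetical exception inside /admin/
    }],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(example)
```

The printed text is what you would paste into Tester mode in the final step.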
Use Cases
- Creating a robots.txt file for a new website before launch.
- Testing whether staging or admin URLs are properly blocked from crawlers.
- Adding AI bot directives to control content usage by AI training crawlers.
- Verifying robots.txt rules after a site restructure or URL migration.
- Debugging unexpected crawling or indexing issues.
Limitations and Notes
- Robots.txt controls crawling, not indexing. To prevent indexing, use a noindex meta tag or X-Robots-Tag header, and leave the page crawlable so crawlers can read it.
- Not all bots respect robots.txt. Malicious crawlers may ignore the file entirely.
- Overly broad Disallow rules can accidentally block important content from search engines.
- The tester uses client-side parsing and may have minor differences from how specific crawlers interpret edge cases.
Frequently Asked Questions
What is a robots.txt file?
Robots.txt is a small text file at the root of your site that tells search-engine crawlers which URLs they're allowed to access. It's not a security tool — anyone can read it — and it's only a request, not a hard block. You'd use it to keep crawlers out of admin areas, faceted search URLs, or staging directories that waste crawl budget. Always test changes before deploying; one wrong 'Disallow: /' line can stop Google from crawling your entire site overnight.
Where should robots.txt be placed?
It must sit at the root of your domain — example.com/robots.txt. Subdirectories like example.com/blog/robots.txt are ignored by crawlers. Each subdomain needs its own file: shop.example.com requires a separate robots.txt at shop.example.com/robots.txt. To verify, just open the URL in a browser; if it returns the file, you're set. If it 404s, the crawler will assume everything is allowed, which is sometimes fine and sometimes a problem depending on what you're hosting.
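As a small illustration of the placement rule, the sketch below derives the robots.txt location that applies to any page URL. The helper name `robots_txt_url` and the sample page URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain); replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://example.com/blog/post?id=1"))  # https://example.com/robots.txt
print(robots_txt_url("https://shop.example.com/cart"))       # https://shop.example.com/robots.txt
```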
Does robots.txt prevent indexing?
No, and this is one of the biggest misunderstandings in SEO. Robots.txt blocks crawling, not indexing. If another site links to your blocked URL, Google can still index it without ever fetching the page. You'll see those listings in the SERP with 'No information is available for this page'. To actually prevent indexing, use the noindex meta tag or the X-Robots-Tag HTTP header, and make sure the page is crawlable so Google can read those instructions in the first place.
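To check whether a page actually carries a noindex signal, something like the following sketch could be used. It is a simplified assumption-laden example: `has_noindex` and `_RobotsMetaParser` are hypothetical names, only the generic `robots` meta tag is inspected, and user-agent-prefixed header values are not handled.

```python
from html.parser import HTMLParser

class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives += [d.strip().lower() for d in attr.get("content", "").split(",")]

def has_noindex(headers, html):
    """Return True if the response headers or the HTML declare noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives

print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))           # True
print(has_noindex({}, '<meta name="robots" content="noindex">'))        # True
print(has_noindex({}, "<html><head></head><body>Hi</body></html>"))     # False
```

A crawler can only see either signal if robots.txt lets it fetch the page.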
How do I test a robots.txt file?
Quickest path: paste your file into our Robots.txt Tester and try test URLs to see whether each is allowed or blocked. Google Search Console's robots.txt report complements this by showing the robots.txt files Google fetched for your site and any warnings or errors it found. Syntax errors matter because crawlers typically ignore lines they cannot parse, so a malformed rule may silently stop applying. For risky changes, test in a staging copy first, then push live. After deployment, recheck a few key URLs: your homepage, a sample product page, and the sitemap path. A 30-second test now can prevent weeks of recovery later.
What is the Sitemap directive in robots.txt?
The Sitemap directive is a single line that points crawlers to your XML sitemap. The format is: 'Sitemap: https://example.com/sitemap.xml'. You can include multiple Sitemap lines if you have a sitemap index or several sitemaps. It's not a replacement for submitting the sitemap in Search Console, but it helps any crawler that respects robots.txt find your sitemap automatically. Place it anywhere in the file — at the top is conventional. Use the absolute URL with the right protocol (https vs http).
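For illustration, here is one way to pull the Sitemap lines out of a robots.txt file. The function name `extract_sitemaps` and the second sitemap URL are hypothetical; the sketch assumes case-insensitive field names and `#` comments.

```python
def extract_sitemaps(robots_txt):
    """Return the URLs from all Sitemap lines, in file order."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

sample = """\
Sitemap: https://example.com/sitemap.xml
User-agent: *
Disallow: /admin/
sitemap: https://example.com/sitemap-news.xml  # hypothetical second sitemap
"""
print(extract_sitemaps(sample))
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-news.xml']
```

Because the directive is independent of any user-agent group, the position of each line in the file does not affect the result.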
How do I block Googlebot in robots.txt?
To block Googlebot from a specific folder, add a 'User-agent: Googlebot' line followed by 'Disallow: /folder/'. To block the entire site (rare and dangerous), use 'Disallow: /'. Use 'User-agent: *' if you want the rule to apply to every crawler. After saving, run your URL through the Robots.txt Tester to confirm the block. Be careful: once Googlebot can no longer crawl a money page, that page can lose its snippet and its rankings. Always test the change against your top 10 URLs before deploying.
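A quick way to sanity-check a Googlebot-specific block is Python's standard-library `urllib.robotparser`, as in the sketch below. The sample file and URLs are hypothetical, and the standard-library parser may differ from real crawlers on edge cases such as overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /folder/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is blocked from the folder but not from the rest of the site.
print(parser.can_fetch("Googlebot", "https://example.com/folder/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/pricing"))           # True
# Other crawlers fall through to the '*' group.
print(parser.can_fetch("Bingbot", "https://example.com/folder/page.html"))    # True
```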
Should I block AI bots in robots.txt?
It's a real decision now. If you don't want your content scraped to train AI models, you can block bots like GPTBot, ClaudeBot, Google-Extended, CCBot, and PerplexityBot. Add a User-agent line for each, followed by Disallow: /. The trade-off: each token controls something different. Google-Extended, for example, governs whether your content is used for Google's Gemini models, and Google's documentation states that it does not affect inclusion or ranking in Google Search, so check the current docs before assuming it changes AI Overviews exposure. Worth re-checking your stance every few months as the bot list keeps growing. Verify the current spelling of each agent on the bot owner's official documentation page before publishing.
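As a sketch of what such a block might look like, the snippet below groups several User-agent lines over a single Disallow rule and checks the result with Python's standard-library parser. The helper name `block_agents` and the test URL are hypothetical, and the agent names are copied from the list above, so confirm each spelling before use.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot"]

def block_agents(agents):
    """Build one robots.txt group that disallows the whole site for every listed agent."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"

ai_block = block_agents(AI_AGENTS)
print(ai_block)

parser = RobotFileParser()
parser.parse(ai_block.splitlines())
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Listing the agents in one group keeps the file short; separate groups per agent work equally well if you expect the rules to diverge later.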