Robots.txt Tester & Validator
Test and validate your robots.txt file to ensure search engines and bots can properly crawl your website. Check URL blocking rules instantly.
- The most specific matching rule wins (the rule with the longer path takes precedence)
- If an Allow and a Disallow rule are equally specific, Allow wins (see the example below)
- If no rule matches, the default is ALLOW
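For example, under the following rules (the paths are purely illustrative):

User-agent: *
Disallow: /shop/
Allow: /shop/sale/

# /shop/cart      -> blocked (only Disallow: /shop/ matches)
# /shop/sale/item -> allowed (Allow: /shop/sale/ is the longer, more specific match)
# /about          -> allowed (no rule matches, so the default ALLOW applies)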
Frequently Asked Questions
Common questions about robots.txt and how to use this tool
Q. What is a robots.txt file?
A robots.txt file is a text file that tells search engine crawlers which pages or sections of your website they can or cannot access.
Q. Where should robots.txt be located?
The robots.txt file must be placed in your website’s root directory. For example: https://example.com/robots.txt
Q. Does robots.txt block pages from Google?
Robots.txt prevents crawling but doesn’t guarantee pages won’t appear in search results. To keep a page out of search results entirely, use a noindex meta tag instead (see The Golden Rule below).
Q. How do I block AI crawlers?
Add a User-agent rule for each AI bot you want to block, such as GPTBot or ClaudeBot, followed by “Disallow: /”. A complete example is included in the guide below.
Q. What does “Disallow” mean?
The Disallow directive tells search engines not to crawl specific URLs or directories. For example, “Disallow: /admin/” prevents crawling of everything under the /admin/ directory.
Q. How do I add a sitemap?
Add a Sitemap line such as “Sitemap: https://yoursite.com/sitemap.xml” to your robots.txt file to help search engines find your sitemap. The directive can appear anywhere in the file, but it is conventionally placed at the end.
Learn More About Robots.txt
Comprehensive guides to help you master robots.txt
What is Robots.txt?
Robots.txt is a text file webmasters create to tell search engine robots which parts of a website they may crawl and which they should stay out of.
Basic Syntax
# Rules for Google's main crawler
User-agent: Googlebot
Disallow: /private/
Allow: /public/

# Rules for all other crawlers
User-agent: *
Disallow: /admin/

# Location of the XML sitemap
Sitemap: https://example.com/sitemap.xml
Key Directives
- User-agent – Specifies which crawler the rules apply to
- Disallow – Tells crawlers not to access specific paths
- Allow – Explicitly permits crawling (useful for exceptions)
- Sitemap – Points to your XML sitemap location
Block All Major AI Crawlers
# Block OpenAI
User-agent: GPTBot
User-agent: ChatGPT-User
Disallow: /
# Block Anthropic (Claude)
User-agent: ClaudeBot
User-agent: anthropic-ai
Disallow: /
# Block Common Crawl
User-agent: CCBot
Disallow: /
DO:
- Always include a sitemap reference
- Block admin, login, and private areas
- Test your robots.txt before deploying
DON’T:
- Don’t use robots.txt as a security measure – it’s publicly visible
- Don’t block CSS/JS files needed for rendering (see the example after this list)
- Don’t use it to hide sensitive data
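If you must disallow an asset directory, a pattern like the following (the directory name is illustrative) keeps the CSS and JS files needed for rendering crawlable. Note that wildcard (*) matching is supported by major crawlers such as Googlebot and Bingbot:

User-agent: *
Disallow: /assets/
Allow: /assets/*.css
Allow: /assets/*.js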
Robots.txt (Disallow)
- Prevents crawling of specific URLs
- Does NOT guarantee exclusion from search results
- Good for: conserving crawl budget
Noindex Meta Tag
- Controls whether a page appears in search results
- Requires the page to be crawlable
- Good for: Removing pages from search results
The Golden Rule
If you want a page completely out of search results, use noindex – NOT robots.txt!
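A noindex directive lives in the page itself, not in robots.txt – either as a meta tag in the HTML head or as an X-Robots-Tag HTTP header. A minimal meta tag example:

<meta name="robots" content="noindex">

Remember that the page must remain crawlable (not disallowed in robots.txt), otherwise search engines will never see the noindex directive.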