Robots.txt Best Practices & Common Mistakes
Master robots.txt with proven best practices and learn how to avoid costly SEO mistakes. Complete guide with real-world examples, checklists, and fix strategies.
Robots.txt Best Practices
Follow these proven practices to create an effective robots.txt file that boosts your SEO performance instead of hurting it.
1. Keep It Simple
Start minimal and add rules only when needed. A complex file is harder to maintain and more error-prone.
Good Starting Point:
```
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
```
This allows all bots to crawl everything while providing your sitemap location.
2. Use Comments for Clarity
Comments (lines starting with #) help you and others understand your rules and track when changes were made.
```
# Block admin areas - Updated 2026-02-01
User-agent: *
Disallow: /admin/
Disallow: /login/

# Allow CSS and JavaScript
Allow: /css/
Allow: /js/

# Sitemap location
Sitemap: https://www.example.com/sitemap.xml
```
3. Test Before Deploying
Always test your robots.txt file before uploading it to your live site to avoid catastrophic SEO mistakes.
Testing Methods:
- Google Search Console's robots.txt report (the standalone robots.txt Tester has been retired)
- Online robots.txt validators
- Local testing with curl or wget
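You can also test rules locally before anything goes live. This sketch uses Python's built-in `urllib.robotparser` to parse a draft file's text and check a few URLs; the rules and URLs shown are placeholders for your own.

```python
import urllib.robotparser

# Draft rules you are about to deploy (placeholder example).
draft = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# Check the URLs you care about before the file goes live.
print(rp.can_fetch("*", "https://www.example.com/products/widget"))  # True
print(rp.can_fetch("*", "https://www.example.com/admin/settings"))   # False
```

Running this against every important URL takes seconds and catches the "accidentally blocked the whole site" class of mistake before search engines ever see it.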
4. Never Block Pages You Want Ranked
This seems obvious, but it's a surprisingly common mistake. Disallow only pages you truly don't want crawled.
❌ Don't Block:
- Product pages
- Blog posts
- Service pages
- Category pages
- Important landing pages
✅ Do Block:
- Admin areas
- Shopping cart/checkout
- Thank you pages
- Duplicate content
- Search result pages
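One way to make this rule concrete is to encode the "do block" list as a draft file and verify with Python's stdlib `urllib.robotparser` that the pages you want ranked stay crawlable. All URLs here are hypothetical stand-ins for your own site.

```python
import urllib.robotparser

# Only the pages from the "do block" list are disallowed.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /thank-you/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Pages you want ranked must remain fetchable.
for url in ("https://shop.example.com/products/blue-widget",
            "https://shop.example.com/blog/how-to-choose-a-widget",
            "https://shop.example.com/category/widgets"):
    assert rp.can_fetch("*", url)

# Pages you deliberately blocked must not be.
assert not rp.can_fetch("*", "https://shop.example.com/cart/")
print("no important pages are blocked")
```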
5. Review Regularly
Set a reminder to review your robots.txt file at least quarterly or whenever you make major site changes.
Review Triggers:
- Site redesign or migration
- Adding new sections or features
- Changing URL structure
- After SEO audits
- Quarterly scheduled reviews
Correct File Location
The robots.txt file must be in your root directory. There's no flexibility here: if the file is in the wrong location, search engines won't find it.
Correct Locations
- ✓ https://www.example.com/robots.txt
- ✓ https://example.com/robots.txt
- ✓ https://shop.example.com/robots.txt
Incorrect Locations
- ✗ .../folder/robots.txt
- ✗ .../wp-content/robots.txt
- ✗ .../public/robots.txt
Important Notes:
Subdomains Need Separate Files: Each subdomain requires its own robots.txt file.
- Main site: https://example.com/robots.txt
- Blog: https://blog.example.com/robots.txt
- Store: https://shop.example.com/robots.txt
File Name Must Be Exact: Must be lowercase "robots.txt" - no variations allowed.
- ✓ robots.txt
- ✗ Robots.txt, ROBOTS.TXT, robot.txt
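The location rule is mechanical: crawlers take the scheme and host of a URL and append /robots.txt, which is also why each subdomain needs its own file. A small sketch of that derivation using Python's `urllib.parse` (example hosts only):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the only robots.txt location that applies to this URL."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.example.com/folder/page.html"))
# https://www.example.com/robots.txt  (never /folder/robots.txt)
print(robots_url("https://blog.example.com/posts/1"))
# https://blog.example.com/robots.txt  (the subdomain's own file)
```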
Blocking CSS/JS Mistake
This is one of the most damaging mistakes you can make. Blocking CSS and JavaScript prevents Google from properly rendering your pages.
Why It's Harmful
Google uses your CSS and JavaScript to render and evaluate your pages. Blocking these files prevents Google from:
- Seeing your page as users see it
- Evaluating mobile-friendliness correctly
- Detecting hidden or cloaked content
- Understanding page layout and structure
- Measuring Core Web Vitals accurately
❌ The Wrong Way:
```
User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: *.css$
Disallow: *.js$
```
This was common advice years ago, but it's now harmful to SEO.
✅ The Right Way:
```
User-agent: *
Allow: /css/
Allow: /js/
Allow: /*.css$
Allow: /*.js$
```
Or simply don't mention CSS/JS at all (they'll be crawled by default).
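You can demonstrate the difference with Python's `urllib.robotparser`. Note one assumption: Python's parser ignores the `*`/`$` wildcard forms (unlike Google's matcher, which supports them), so this sketch uses the directory-level rules only, with placeholder URLs.

```python
import urllib.robotparser

def allows_css(rules: str) -> bool:
    """Can a crawler fetch a stylesheet under these rules?"""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch("Googlebot", "https://www.example.com/css/style.css")

# The "wrong way": directory-level blocks hide rendering resources.
wrong = "User-agent: *\nDisallow: /css/\nDisallow: /js/\n"
# The "right way": say nothing about them (crawling is allowed by default).
right = "User-agent: *\nDisallow: /admin/\n"

print(allows_css(wrong))  # False
print(allows_css(right))  # True
```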
Google's Official Stance:
"Don't use robots.txt to block CSS, JavaScript, or images."
Sitemap Inclusion
Including your sitemap in robots.txt is a best practice that helps search engines discover your content faster.
Proper Syntax:
Sitemap: https://www.example.com/sitemap.xml
✅ Best Practices:
- 1. Use absolute URLs
- 2. Include the protocol (https://)
- 3. Add all sitemaps, one Sitemap line each, e.g.:

Sitemap: https://example.com/sitemap-images.xml

❌ Common Mistakes:
- 1. Relative URLs (e.g. Sitemap: /sitemap.xml)
- 2. Missing protocol (e.g. Sitemap: www.example.com/sitemap.xml)
- 3. Wrong case in the sitemap URL
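The absolute-URL rule is easy to lint mechanically. A small sketch using Python's `urllib.parse` that flags the common mistakes above; `valid_sitemap_line` is a hypothetical helper, not a standard tool.

```python
from urllib.parse import urlparse

def valid_sitemap_line(line: str) -> bool:
    """Accept only 'Sitemap:' lines carrying an absolute http(s) URL."""
    if not line.lower().startswith("sitemap:"):
        return False
    url = line.split(":", 1)[1].strip()
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

print(valid_sitemap_line("Sitemap: https://www.example.com/sitemap.xml"))  # True
print(valid_sitemap_line("Sitemap: /sitemap.xml"))                         # False (relative)
print(valid_sitemap_line("Sitemap: www.example.com/sitemap.xml"))          # False (no protocol)
```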
Case Sensitivity
Understanding case sensitivity rules prevents confusing errors in your robots.txt file.
Directives Are Case-Insensitive
These are all equivalent: `User-agent:`, `user-agent:`, and `USER-AGENT:` are read the same way, as are `Disallow:` and `disallow:`.
However: Use standard capitalization (User-agent, Disallow, Allow, Sitemap) for readability.
URLs Are Case-Sensitive
These are DIFFERENT: `Disallow: /Admin/` and `Disallow: /admin/` block different paths, because the URL path is matched exactly as written.
Match your URL structure exactly. If your site uses lowercase URLs, use lowercase in robots.txt.
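You can see the URL-path rule in action with Python's `urllib.robotparser` (hypothetical paths):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /Admin/"])

# The rule blocks only the capitalized path; case must match your URLs.
print(rp.can_fetch("*", "https://www.example.com/Admin/"))  # False (blocked)
print(rp.can_fetch("*", "https://www.example.com/admin/"))  # True (not blocked!)
```

If your site actually serves lowercase /admin/, this rule blocks nothing, which is exactly the confusing failure mode the section warns about.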
Real-World Examples
Learn from these real-world robots.txt examples optimized for different types of websites.
Example 1: Blog/Content Site
```
# Blog robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Block search results
Disallow: /?s=
Disallow: /search/

# Block author archives if thin content
Disallow: /author/

# Allow important resources
Allow: /wp-content/uploads/

Sitemap: https://www.blog.com/sitemap.xml
```
Example 2: E-commerce Store
```
# E-commerce robots.txt
User-agent: *

# Block customer account areas
Disallow: /account/
Disallow: /checkout/
Disallow: /cart/
Disallow: /wishlist/

# Block duplicate content from filters
Disallow: /*?orderby=
Disallow: /*?filter_
Disallow: /*?currency=

# Block tracking parameters
Disallow: /*?utm_
Disallow: /*?ref=

# Allow product images
Allow: /wp-content/uploads/

Sitemap: https://www.store.com/product-sitemap.xml
Sitemap: https://www.store.com/category-sitemap.xml
```
Example 3: Corporate Website
```
# Corporate robots.txt
User-agent: *

# Block internal tools
Disallow: /intranet/
Disallow: /employee/
Disallow: /internal/

# Block unnecessary pages
Disallow: /thankyou/
Disallow: /confirmation/

# Block test and staging
Disallow: /test/
Disallow: /staging/

Sitemap: https://www.company.com/sitemap.xml
```
Implementation Checklist
Use this comprehensive checklist before deploying your robots.txt file to avoid SEO disasters.
Pre-Deployment Checklist
- File named exactly "robots.txt" (lowercase)
- Located in root directory
- No accidental blocking of important pages
- CSS and JavaScript are allowed
- Sitemap URL included (with https://)
- No syntax errors or typos
- File tested in Google Search Console
- Tested with important URLs
Post-Deployment Checklist
- File accessible at yourdomain.com/robots.txt
- Returns 200 status code (not 404)
- Content displays correctly
- Submitted to Google Search Console
- No crawl errors after 48 hours
- Important pages still indexed
- Calendar reminder set for quarterly review
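Several of the content checks in these lists can be automated. Below is a sketch of a pre-deployment linter using only Python's standard library; the draft rules and "important pages" are placeholders you would replace with your own, and it covers only the machine-checkable items (important pages allowed, Sitemap line present and absolute).

```python
import urllib.robotparser
from urllib.parse import urlparse

def predeploy_check(robots_text: str, important_urls: list[str]) -> list[str]:
    """Return a list of problems found in a draft robots.txt."""
    problems = []
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_text.splitlines())

    # No accidental blocking of important pages (include CSS/JS URLs here too).
    for url in important_urls:
        if not rp.can_fetch("*", url):
            problems.append(f"blocked: {url}")

    # Sitemap line present, with an absolute http(s) URL.
    sitemaps = [line.split(":", 1)[1].strip()
                for line in robots_text.splitlines()
                if line.lower().startswith("sitemap:")]
    if not sitemaps:
        problems.append("no Sitemap line")
    for sm in sitemaps:
        if urlparse(sm).scheme not in ("http", "https"):
            problems.append(f"sitemap URL not absolute: {sm}")
    return problems

draft = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""
print(predeploy_check(draft, ["https://www.example.com/products/"]))  # []
```

Run against a file containing `Disallow: /`, the same check reports every important URL as blocked, which is the disaster the checklist exists to prevent. File naming, location, and the post-deployment status-code checks still need to be verified by hand or against the live site.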
Ready to Create a Perfect robots.txt?
Use our free robots.txt Generator with built-in validation and best practices