Robots.txt Best Practices & Common Mistakes

7 min read · SEO Expert Team

Master robots.txt with proven best practices and learn how to avoid costly SEO mistakes. Complete guide with real-world examples, checklists, and fix strategies.

Robots.txt Best Practices

Follow these proven practices to create an effective robots.txt file that boosts your SEO performance instead of hurting it.

1. Keep It Simple

Start minimal and add rules only when needed. A complex file is harder to maintain and more error-prone.

Good Starting Point:

User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml

This allows all bots to crawl everything while providing your sitemap location.

2. Use Comments for Clarity

Comments (lines starting with #) help you and others understand your rules and track when changes were made.

# Block admin areas - Updated 2026-02-01
User-agent: *
Disallow: /admin/
Disallow: /login/

# Allow CSS and JavaScript
Allow: /css/
Allow: /js/

# Sitemap location
Sitemap: https://www.example.com/sitemap.xml

3. Test Before Deploying

Always test your robots.txt file before uploading it to your live site to avoid catastrophic SEO mistakes.

Testing Methods:

  • Google Search Console's robots.txt report (the standalone Tester tool has been retired)
  • Online robots.txt validators
  • Local testing with curl or wget
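
For the local-testing route, Python's standard-library `urllib.robotparser` can parse a draft file before it ever goes live. A minimal sketch with hypothetical rules and URLs (note this parser handles prefix rules only, not `*`/`$` wildcards):

```python
from urllib import robotparser

# Hypothetical draft rules, parsed from a string rather than a live URL
draft = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# Spot-check important URLs before deploying
print(rp.can_fetch("*", "https://www.example.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://www.example.com/blog/my-post"))    # True
```

Running a handful of `can_fetch` checks like this catches an accidental block before any search engine sees it.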

4. Never Block Pages You Want Ranked

This seems obvious, but it's a surprisingly common mistake. Disallow only pages you truly don't want crawled.

❌ Don't Block:

  • Product pages
  • Blog posts
  • Service pages
  • Category pages
  • Important landing pages

✅ Do Block:

  • Admin areas
  • Shopping cart/checkout
  • Thank you pages
  • Duplicate content
  • Search result pages

5. Review Regularly

Set a reminder to review your robots.txt file at least quarterly or whenever you make major site changes.

Review Triggers:

  • Site redesign or migration
  • Adding new sections or features
  • Changing URL structure
  • After SEO audits
  • Quarterly scheduled reviews

Correct File Location

The robots.txt file must be in your site's root directory. There's no flexibility here: if it lives anywhere else, search engines simply won't find it.

Correct Locations

  • ✓ https://www.example.com/robots.txt
  • ✓ https://example.com/robots.txt
  • ✓ https://shop.example.com/robots.txt

Incorrect Locations

  • ✗ .../folder/robots.txt
  • ✗ .../wp-content/robots.txt
  • ✗ .../public/robots.txt

Important Notes:

Subdomains Need Separate Files: Each subdomain requires its own robots.txt file.

  • Main site: https://example.com/robots.txt
  • Blog: https://blog.example.com/robots.txt
  • Store: https://shop.example.com/robots.txt

File Name Must Be Exact: Must be lowercase "robots.txt" - no variations allowed.

  • ✓ robots.txt
  • ✗ Robots.txt, ROBOTS.TXT, robot.txt

Blocking CSS/JS Mistake

This is one of the most damaging mistakes you can make. Blocking CSS and JavaScript prevents Google from properly rendering your pages.

Why It's Harmful

Google uses your CSS and JavaScript to render and evaluate your pages. Blocking these files prevents Google from:

  • Seeing your page as users see it
  • Evaluating mobile-friendliness correctly
  • Detecting hidden or cloaked content
  • Understanding page layout and structure
  • Measuring Core Web Vitals accurately

❌ The Wrong Way:

User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /*.css$
Disallow: /*.js$

This was common advice years ago, but it's now harmful to SEO.

✅ The Right Way:

User-agent: *
Allow: /css/
Allow: /js/
Allow: /*.css$
Allow: /*.js$

Or simply don't mention CSS/JS at all (they'll be crawled by default).

Google's Official Stance:

"Don't use robots.txt to block CSS, JavaScript, or images."
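
The "don't mention them at all" option is easy to sanity-check with Python's standard-library `urllib.robotparser` (prefix rules only, no `*`/`$` wildcard support). The rules and URLs below are hypothetical:

```python
from urllib import robotparser

# Rules that block an admin area but never mention CSS or JS
rules = """\
User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Unmentioned asset paths remain crawlable by default
print(rp.can_fetch("Googlebot", "https://www.example.com/css/site.css"))  # True
print(rp.can_fetch("Googlebot", "https://www.example.com/js/app.js"))     # True
```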

Sitemap Inclusion

Including your sitemap in robots.txt is a best practice that helps search engines discover your content faster.

Proper Syntax:

Sitemap: https://www.example.com/sitemap.xml

✅ Best Practices:

  1. Use absolute URLs
     ✓ Sitemap: https://www.example.com/sitemap.xml
  2. Include the protocol
     ✓ https://www.example.com/sitemap.xml
  3. List every sitemap
     Sitemap: https://example.com/sitemap.xml
     Sitemap: https://example.com/sitemap-images.xml

❌ Common Mistakes:

  1. Relative URLs
     ✗ Sitemap: /sitemap.xml
  2. Missing protocol
     ✗ Sitemap: www.example.com/sitemap.xml
  3. Nonstandard directive case
     ✗ sitemap: https://example.com/sitemap.xml (parsers accept it, since directives are case-insensitive, but use Sitemap: for consistency)
Case Sensitivity

Understanding case sensitivity rules prevents confusing errors in your robots.txt file.

Directives Are Case-Insensitive

These are all equivalent:

User-agent: *
user-agent: *
USER-AGENT: *

However: Use standard capitalization (User-agent, Disallow, Allow, Sitemap) for readability.

URLs Are Case-Sensitive

These are DIFFERENT:

Disallow: /Admin/
Disallow: /admin/

Match your URL structure exactly. If your site uses lowercase URLs, use lowercase in robots.txt.
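
The difference is easy to demonstrate with Python's standard-library parser (hypothetical rules and paths):

```python
from urllib import robotparser

# Only the capitalized path is disallowed
rules = """\
User-agent: *
Disallow: /Admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/Admin/users"))  # False (blocked)
print(rp.can_fetch("*", "https://example.com/admin/users"))  # True (NOT blocked)
```

If your server actually serves /admin/, the rule above blocks nothing.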

Real-World Examples

Learn from these real-world robots.txt examples optimized for different types of websites.

Example 1: Blog/Content Site

# Blog robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Block search results
Disallow: /?s=
Disallow: /search/

# Block author archives if thin content
Disallow: /author/

# Allow important resources
Allow: /wp-content/uploads/

Sitemap: https://www.blog.com/sitemap.xml

Example 2: E-commerce Store

# E-commerce robots.txt
User-agent: *

# Block customer account areas
Disallow: /account/
Disallow: /checkout/
Disallow: /cart/
Disallow: /wishlist/

# Block duplicate content from filters
Disallow: /*?orderby=
Disallow: /*?filter_
Disallow: /*?currency=

# Block tracking parameters
Disallow: /*?utm_
Disallow: /*?ref=

# Allow product images
Allow: /wp-content/uploads/

Sitemap: https://www.store.com/product-sitemap.xml
Sitemap: https://www.store.com/category-sitemap.xml

Example 3: Corporate Website

# Corporate robots.txt
User-agent: *

# Block internal tools
Disallow: /intranet/
Disallow: /employee/
Disallow: /internal/

# Block unnecessary pages
Disallow: /thankyou/
Disallow: /confirmation/

# Block test and staging
Disallow: /test/
Disallow: /staging/

Sitemap: https://www.company.com/sitemap.xml

Implementation Checklist

Use this comprehensive checklist before deploying your robots.txt file to avoid SEO disasters.

Pre-Deployment Checklist

  • File named exactly "robots.txt" (lowercase)
  • Located in root directory
  • No accidental blocking of important pages
  • CSS and JavaScript are allowed
  • Sitemap URL included (with https://)
  • No syntax errors or typos
  • File tested in Google Search Console
  • Tested with important URLs
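
Several of these pre-deployment items can be automated. A minimal sketch using Python's standard-library `urllib.robotparser`, with hypothetical rules and URL lists you would adapt to your own site (the parser handles prefix rules only, not `*`/`$` wildcards):

```python
from urllib import robotparser

ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml
"""

# URLs that must stay crawlable, and URLs that must be blocked (adjust to your site)
MUST_ALLOW = ["https://www.example.com/", "https://www.example.com/products/widget"]
MUST_BLOCK = ["https://www.example.com/admin/", "https://www.example.com/checkout/"]

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

for url in MUST_ALLOW:
    assert rp.can_fetch("*", url), f"important URL blocked: {url}"
for url in MUST_BLOCK:
    assert not rp.can_fetch("*", url), f"private URL crawlable: {url}"
assert rp.site_maps(), "no Sitemap line found"
print("robots.txt checks passed")
```

Run this against every draft before it ships; a failed assertion is far cheaper than a deindexed section.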

Post-Deployment Checklist

  • File accessible at yourdomain.com/robots.txt
  • Returns 200 status code (not 404)
  • Content displays correctly
  • Submitted to Google Search Console
  • No crawl errors after 48 hours
  • Important pages still indexed
  • Calendar reminder set for quarterly review
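
The first two post-deployment items (file reachable, 200 status) can also be scripted. A sketch using only the Python standard library; the domain argument is a placeholder for your own:

```python
import urllib.request

def robots_txt_ok(domain: str) -> bool:
    """Return True if https://<domain>/robots.txt answers with HTTP 200."""
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # Covers DNS failures, timeouts, TLS errors, and HTTP errors like 404
        return False

# Example: robots_txt_ok("www.example.com")
```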

Ready to Create a Perfect robots.txt?

Use our free robots.txt Generator with built-in validation and best practices
