
Tutorial: Block Bad Bots with .htaccess

November 6, 2012 / 4 Comments / in SEO / by Michael David

In this tutorial, we’ll learn how to block bad bots and spiders from your website. This is a standard safety measure we implement with our WordPress SEO service: it saves bandwidth and server resources for customers, increases security, and prevents scrapers from spreading duplicate copies of your content around the web.

Quick Start Instructions/Roadmap

For those looking to get started right away (without a lot of chit-chat), here are the steps to blocking bad bots with .htaccess:

  • FTP to your website and find your .htaccess file in your root directory
  • Create a page in your root directory called 403.html; the content of the page doesn’t matter, and ours is a text file with just the characters “403”
  • Browse to this page on AskApache that has a sample .htaccess snippet complete with bad bots already coded in
  • You can add any bots to the sample .htaccess file as long as you follow the .htaccess syntax rules
  • Test your .htaccess file with a bot-spoofing site like wannabrowser.com
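The error page from the second step can be created in seconds if you have shell access to your server. This is just a sketch; the workflow above uses FTP, and any method of creating the file works:

```shell
# Create the bare-bones error page in the web root; the content is unimportant.
echo "403" > 403.html
cat 403.html    # the page simply contains the text "403"
```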

Check Your Server Logs for Bad Bots

[Server log excerpt: repeated visits from discoverybot, highlighted in red]

If you read your website server logs, you’ll see that bots and crawlers visit your site regularly; these visits can amount to hundreds of hits a day and plenty of bandwidth. The server log excerpt above is from TastyPlacement, and the bot identified in red is discoverybot. This bot was nice enough to identify its website for me: DiscoveryEngine.com touts itself as the next great search engine, but presently offers nothing except stolen bandwidth. It’s not a bot I want visiting my site. If you check your own server logs, you might see bad bots like sitesnagger, reaper, harvest, and others. Make a note of any suspicious bots you see in your logs.
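If you have shell access, you can pull the User-Agent strings out of an Apache combined-format access log and count them, which makes suspicious bots easy to spot. Here’s a sketch using a small inline sample log (the file name and log entries are invented for illustration; point awk at your real access log instead):

```shell
# Build a tiny sample log in Apache "combined" format (invented entries).
cat > sample_access.log <<'EOF'
1.2.3.4 - - [06/Nov/2012:10:00:00 -0600] "GET / HTTP/1.1" 200 512 "-" "discoverybot/2.0 (+http://discoveryengine.com/)"
5.6.7.8 - - [06/Nov/2012:10:01:00 -0600] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
1.2.3.4 - - [06/Nov/2012:10:02:00 -0600] "GET /blog HTTP/1.1" 200 2048 "-" "discoverybot/2.0 (+http://discoveryengine.com/)"
EOF

# In the combined log format, the User-Agent is the sixth quote-delimited field.
awk -F'"' '{print $6}' sample_access.log | sort | uniq -c | sort -rn
```

The most frequent unidentified user agents bubble to the top of the list, ready to be added to your .htaccess block.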

AskApache’s Bad Bot RewriteRules

AskApache maintains a very brief tutorial with a very comprehensive .htaccess code snippet here. What makes that page so great is that the .htaccess snippet already has dozens of bad bots blocked (like reaper, blackwidow, and sitesnagger), and you can simply add any new bots you identify.

If we want to block a bot not covered by AskApache’s default text, we just add it to the “RewriteCond” line, separating each bot with a “|” pipe character. We’ve put “discoverybot” in our file because that’s a visitor we know we don’t want:

# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(verybadbot|discoverybot) [NC,OR]

If you are on the WordPress platform, be careful not to disrupt the existing entries in your .htaccess file. As always, keep a backup of your .htaccess file; it’s quite easy to break your site with one coding error. Also, it’s best to put these rewrite rules at the beginning of your .htaccess file so the directives are read before any pages are served to the bots. Here’s a simplified version of the complete .htaccess file:

ErrorDocument 403 /403.html

RewriteEngine On
RewriteBase /

# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|discoverybot) [NC,OR]

# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule . - [F,L]

Here’s a translation of the .htaccess file above:

  • ErrorDocument sets a page named 403.html to serve as our error document when bad bots are encountered; create this page in your root directory; the content doesn’t matter, and ours is a text file with just the characters “403”
  • RewriteEngine and RewriteBase simply mean “turn on the rewrite engine, and set the base URL to the website root”
  • RewriteCond directs the server: “if you encounter any of these bot names, enforce the RewriteRule that follows”
  • RewriteRule directs all bad bots identified by the RewriteCond to our ErrorDocument, 403.html
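The pattern in the RewriteCond is just an anchored, case-insensitive regular expression, so you can sanity-check it with grep -Ei before uploading anything. The sample user-agent strings below are made up for illustration:

```shell
# Each candidate UA on its own line; grep keeps the ones the pattern would block.
printf '%s\n' 'BlackWidow/1.0' 'DiscoveryBot/2.0' 'Mozilla/5.0 (legit browser)' \
  | grep -Ei '^(black.?hole|blackwidow|discoverybot)'
# The two bad-bot strings match; the ordinary browser string passes through.
```

Because the pattern is anchored with ^, it only matches user agents that *start* with one of the listed names, which is exactly how the [NC] (no-case) RewriteCond behaves on the server.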

Testing Our .htaccess File

Once you upload your .htaccess file, you can test it by browsing to your site while pretending to be a bad bot. You do this by going to wannabrowser.com and spoofing a User Agent; in this case, we spoofed “SiteSnagger”.

If you installed the file properly, you should be directed to your 403 page, and you have successfully blocked most bad bots.
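If you prefer the command line to wannabrowser.com, curl can spoof a User-Agent with its -A flag. This is a sketch; replace example.com with your own domain, since the commands aren’t run against any real blocked site here:

```shell
# Request the homepage while claiming to be SiteSnagger; a working block
# should return a 403 status code.
curl -s -o /dev/null -w '%{http_code}\n' -A 'SiteSnagger' 'https://example.com/'

# Same request with an ordinary browser User-Agent; expect a normal 200.
curl -s -o /dev/null -w '%{http_code}\n' \
  -A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)' 'https://example.com/'
```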

Some Limitations

Now, why don’t we do this with robots.txt and simply tell bots not to index? Simple: because bad bots might ignore the directive, or they’ll crawl anyway and just not index the content; that’s not a fix. Even this .htaccess fix only blocks bots that identify themselves: if a bot spoofs a legitimate User Agent, this technique won’t work. We’ll post a tutorial soon about how to block traffic based on IP address. That said, you’ll block 90% of bad bot traffic with this technique.

Enjoy!

4 replies
  1. stuart says:
     February 20, 2014 at 1:51 am

     Hi Michael, can you block backlink bots like Majestic, Ahrefs, Moz, etc. from crawling your site by using the .htaccess?

     How can you test the .htaccess to see if this works correctly?
     • Michael David says:
       February 21, 2014 at 10:34 pm

       I was just looking into this tonight. I want to block “builtwith.com”, but can’t see it in my server logs. I added the text “builtwith” to my .htaccess file just in case. Majestic shows up in my server logs and I do believe I can block it. To test it, go to WannaBrowser.com and spoof the user agent. Just enter the name of the user agent and that will spoof it for you.
  2. Bill Minozzi says:
     February 1, 2020 at 6:52 am

     Hi,
     We developed this free PHP App to block bots:
     http://stopbadbots.com/
     Cheers,
     Bill
     Developer
  3. Prakash Gohel says:
     September 16, 2020 at 11:30 pm

     Which is the right way to block bad bots? Right now I am using the above method; some bloggers recommend the Cloudflare firewall rules and some use robots.txt.
     Also, should we block baidu and archive.org? They crawl my site every week.

     Thank You.
