Should You Disallow Old Link Structures With Robots.txt?
Questions from Readers…
We’re getting great questions from readers of our book, WordPress 3.0 Search Engine Optimization. Today, Michael tackles a question sent in by Jeff of Houston, TX. Remember, send in those questions and feedback! We’re always thrilled to help out our readers.
Hi Mr. David,
I’m sorry to contact you with such an insignificant matter, but I just got your book today and wanted to ask if you could clarify an issue that I have encountered. My site has been up for about 6 months and I had been using a permalink structure of /year/month/day/postname and I changed it to /category/postname. I also used Deans Permalink Migration plugin to add 301 redirects for published posts.
I want to add your Ultimate Robot.txt file to my site, but I’m wondering: if I add the “Disallow: /2011/” directive to eliminate duplicate content in my archives, will it disallow my previous posts that had /2011/ in the old permalink structure? Any help or clarification on this issue would be very appreciated. Thank you for your time.
Jeff
Houston, TX
Jeff,
We love hearing from readers.
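Yes. Adding the directive Disallow: /2011/ will keep crawlers out of your year archives, but it will also block any post whose URL begins with the year, as your old /year/month/day/postname permalinks do. Robots.txt rules match on the beginning of the URL path, so the directive cannot tell an archive page from a post that shares the same prefix. I tested it, and it appears to disallow the content. Keep in mind that blocking those old URLs would also keep crawlers from seeing the 301 redirects you set up, since a crawler that obeys robots.txt never requests a blocked URL.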
You can test your robots.txt file by using Google Webmasters’ Crawler Access testing tool. The tool lets you paste in the text of a robots.txt file and check it against a specific URL. It then tells you whether your robots.txt file allows or blocks that URL. You can find the tool by logging into Google.com/webmasters, then selecting “Site Configuration” and then “Crawler Access” from the left menu. We didn’t cover this specific tip in WordPress 3.0 Search Engine Optimization, but we will include it in a future edition of the book.
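If you would like a quick check on your own machine before logging in to Google, here is a minimal sketch using the robots.txt parser that ships with Python's standard library. The example.com URLs and the post and category names are made up for illustration, and the helper name is_allowed is our own. Python's parser also may not interpret every rule exactly the way Googlebot does, so treat the result as a sanity check rather than the final word.

```python
from urllib.robotparser import RobotFileParser

# The rule under discussion, applied to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /2011/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: a year archive, an old-style permalink,
    # and a new-style /category/postname permalink.
    urls = [
        "http://example.com/2011/",
        "http://example.com/2011/05/12/sample-post/",
        "http://example.com/seo/sample-post/",
    ]
    for url in urls:
        status = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(f"{status}: {url}")
```

In this sketch, the year archive and the old-style permalink are both reported as blocked, because both paths begin with /2011/, while the new-style permalink is allowed.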
That said, you mention that you’ve changed your permalink structure, and that should solve the problem. In cases where a robots.txt entry meant to block year archives would also block regular blog posts from being indexed, the solution is clear: don’t block either. Just make sure your year archive is set to display excerpts of the posts rather than their full text.
Michael