Tag Archive for: WordPress SEO

How to Fix WordPress Robots.txt after Googlebot Cannot Access JS and CSS Warning

We’ve been doing SEO for WordPress for a long time. A big part of that has always been controlling the amount, and quality, of indexed pages, since WordPress creates so many different flavors of content automatically. If you’ve read Michael David’s book on WordPress SEO, you’ve seen his ultimate robots.txt file (https://tastyplacement.com/book-excerpt-the-ultimate-wordpress-robots-txt-file), which goes something like this:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
Disallow: /category/*/*
Disallow: /tag/
Disallow: */trackback
Disallow: */comments

Unfortunately, we’re in a post-Mobilegeddon world. Google is expecting free access to render every page in its entirety so it can infer the sort of experience a user would have on various mobile devices. A few weeks ago, a significant portion of the WordPress installations in the world received the Google Search Console warning:

Googlebot cannot access CSS and JS files

Some of you may be wondering why we can’t just remove all robots.txt disallow rules, let Googlebot decide what it thinks is important, and stop being fussy about what’s allowed and disallowed. For security reasons, you don’t want deep indexing of your site to be publicly searchable. For instance, the following search term gives you a list of thousands of WordPress installations which have the highly hackable timthumb.php:

intitle:index timthumb.php

Just something to think about when you assume that Google has your site’s best interests at heart.

You could go through each resource and allow the precise file paths line by line, but that would be very time-consuming.

The solution that has been going around (advocated by the likes of SEOroundtable and Peter Mahoney) is to add a few additional lines which explicitly allow Google’s spiders access to the resources in question:

user-agent: googlebot
Allow: .js
Allow: .css
#THE ABOVE CODE IS WRONG!

Yes, this unblocks the JavaScript and CSS resources; you can see it working in the Search Console fetch and render tool. Unfortunately, it also allows Googlebot access to the entire site.

If you haven’t read the Google developers page on Robots.txt, I highly recommend doing so. It’s like 50 Shades of Grey for nerds. The section under “Order of Precedence for User-Agents” states “Only one group of group-member records is valid for a particular crawler . . . the most specific user-agent that still matches. All other groups of records are ignored by the crawler.” By creating a new group for Googlebot, you are effectively erasing all prior disallow commands.
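You can watch the group-selection rule in action with Python’s standard-library robots.txt parser, which also obeys one group per crawler. This is a quick sketch, not Google’s actual matcher; note that urllib.robotparser uses first-match rather than Google’s longest-match for individual rules, but the user-agent group selection behaves the same way:

```python
from urllib import robotparser

# A pared-down version of the problematic file: a restrictive * group
# followed by a googlebot group containing only Allow lines.
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-content/plugins

User-agent: googlebot
Allow: .js
Allow: .css
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# Googlebot matches the second, more specific group, which contains no
# Disallow lines -- so the * group's blocks no longer apply to it at all.
print(rp.can_fetch("googlebot", "/wp-content/plugins/timthumb.php"))  # True
# Every other crawler still falls back to the * group and stays blocked.
print(rp.can_fetch("bingbot", "/wp-content/plugins/timthumb.php"))    # False
```

In other words, the googlebot group doesn’t add to the `*` group; it replaces it.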
[Screenshot: Google Search Console robots.txt tester showing the resource allowed]
You can try putting the allow directives within the main group-member records, but that won’t work either, because of the order of precedence among group-member records. The longest (most specific) rule is going to win, so the following rules would leave the JavaScript resources blocked:
user-agent: googlebot
disallow: /wp-content/
allow: .js
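For illustration, the longest-rule-wins precedence can be modeled in a few lines of Python. This is a toy sketch of the documented behavior, not Google’s actual matcher, and it uses the explicit wildcard form /*.js rather than the bare .js above:

```python
import re

def rule_matches(rule_path, url_path):
    # robots.txt rules match from the start of the path; '*' is a wildcard
    # and a trailing '$' anchors the end of the URL.
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, url_path) is not None

def is_allowed(rules, url_path):
    # rules: list of ("allow" | "disallow", path) pairs. Per Google's
    # documentation, the matching rule with the longest path wins; ties go
    # to allow (the least restrictive rule). No match at all means allowed.
    hits = [(len(path), kind) for kind, path in rules if rule_matches(path, url_path)]
    if not hits:
        return True
    hits.sort(key=lambda h: (h[0], h[1] == "allow"), reverse=True)
    return hits[0][1] == "allow"

rules = [("disallow", "/wp-content/"), ("allow", "/*.js")]
# The 12-character disallow outranks the 6-character allow,
# so the script stays blocked:
print(is_allowed(rules, "/wp-content/themes/foo/app.js"))  # False
```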

[Screenshot: Google Search Console robots.txt tester showing the resource blocked]

And wildcard conflicts are undefined. So it’s a tossup result for:
user-agent: googlebot
disallow: /wp-content/themes/
allow: /wp-content/themes/*.js

The long and the short of it is there is no simple cut-and-paste solution to this issue. We’re approaching it on a case by case basis, doing what’s necessary for each WordPress installation.

As far as keeping the indexes clean, we’re going to lean heavily on robots meta tags, as managed by our (still) favorite SEO plugin. Expect the role of robots.txt to be greatly reduced going forward.
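For reference, the archive-page pattern such plugins emit is just a meta tag in the page head (a generic example, not any particular plugin’s exact markup):

```html
<head>
  <meta name="robots" content="noindex, follow">
</head>
```

Unlike a robots.txt disallow, this lets Googlebot fetch and render the page fully while still keeping it out of the index.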

From the WordPress SEO book

Should You Disallow Old Link Structures With Robots.txt?

Questions from Readers…

We’re getting great questions from readers of our book, WordPress 3.0 Search Engine Optimization. Today, Michael tackles a question sent in by Jeff of Houston, TX. Remember, send in those questions and feedback! We’re always thrilled to help out our readers.

Hi Mr. David,

I’m sorry to contact you with such an insignificant matter, but I just got your book today and wanted to ask if you could clarify an issue that I have encountered. My site has been up for about 6 months, and I had been using a permalink structure of /year/month/day/postname, which I changed to /category/postname. I also used Dean’s Permalink Migration plugin to add 301 redirects for published posts.

I want to use your Ultimate Robots.txt file on my site, but I’m wondering: if I add the “Disallow: /2011/” directive to eliminate duplicate content in my archives, will it disallow my previous posts that had /2011/ in the old permalink structure? Any help or clarification on this issue would be very appreciated. Thank you for your time.

Jeff

Houston, TX

Jeff,

We love hearing from readers.

Yes, I believe that if you add the directive Disallow: /2011/ you will remove year archives from indexing, but also any post that uses the year in that position as part of its permalink structure. I tested it, and it appears to disallow the content.
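The same check can be reproduced with Python’s standard-library robots.txt parser (a quick sketch; a simple prefix rule like this behaves identically in Googlebot’s matcher):

```python
from urllib import robotparser

# The single directive from the question, applied to all crawlers.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /2011/
""".splitlines())

print(rp.can_fetch("*", "/2011/"))                    # False: year archive blocked
print(rp.can_fetch("*", "/2011/06/15/my-old-post/"))  # False: old permalinks too
print(rp.can_fetch("*", "/category/my-old-post/"))    # True: new permalinks are fine
```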

You can test your robots.txt file by using Google Webmasters’ Crawler Access testing tool. The tool lets you test the text of a robots.txt file and compare it to a specific URL. The tool then tells you if your robots.txt file is allowing or blocking the URL. You can find the tool by logging into Google.com/webmasters and then selecting “Site Configuration” and then “Crawler Access” from the left menu. We didn’t cover this specific tip in WordPress 3.0 Search Engine Optimization, but we will implement it in a future edition of the book.

But you say you’ve changed your permalink structure, and that should solve the problem. In the case where a robots.txt entry blocking year archives would also block regular blog posts from being indexed, the solution is clear: don’t block either. Just make sure your year archive is set to display excerpts of the posts, rather than their full text.

Michael

Buy the Book Today at Amazon

Are Site-wide H1 Tags in WordPress Good or Bad?

Questions from Readers

The great thing about writing our book, WordPress 3.0 Search Engine Optimization, is we get to hear from all those readers who have taken our material and put it to work in the field. Today, we’ve got a fascinating question from Robert, who asks that question we confront every day in one way or another: Just how far should I trust Google’s sophistication?

Hi Michael,

I’m currently reading your Packt book on WordPress SEO, and I have a quick question about HTML5 and the way it uses header tags. Your book says to use only one H1 tag per page, which makes sense. However, HTML5 advocates multiple H1 tags per page, as long as each is contained in a separate section/header.

Worse yet, the first H1 tag on a page is usually a wrapper around the home-link logo and contains the same meaningless title text on every page. You can see a typical example at CSS3maker.com:

<header>
  <h1 id="logo"><a href="index.html" title="CSS 3.0 Maker">Css 3.0 Maker</a></h1>
</header>

Most SEO bloggers assume single H1 tags are a thing of the past. Based on your experience, has there been any evidence that Google/Yahoo interpret HTML5 content any differently than HTML/XHTML?

If not, should I remove the header and h1 tags around my logo anchor tag? My site looks like the CSS3maker code above. And like them, I don’t have anything else in my header, so if I remove the H1 tag, wouldn’t I also just scrap the header tag? I have a meaningful H2 tag in my content section, which could be elevated to an H1 tag.

Thanks,
Robert

BTW, I’m really enjoying your book.

 

Robert,

This may be a cop out…but does this help?

I think Google is tuned in enough to ignore site-wide H1 tags. One of my philosophies is “packaging”: make it so brain-dead easy for a search engine that it can’t POSSIBLY get confused. We are sort of on-page nerds when it comes to that stuff. Most of the pages we create are pretty perfect, at least on the page.

Do we, in our SEO business, remove site-wide H1 tags around logos and site names in the header? Absolutely we do, but I don’t think it’s the kiss of death if you don’t. Remember one thing: Google has to fit its algorithm so that it doesn’t punish sites for small mistakes—otherwise, it would punish 80% of the web or more.

I am very glad you are enjoying the book!

Michael

Buy the Book Today at Amazon

From the WordPress SEO book

SEO Master Class: Choosing a Keyword-Rich Domain Name

The following is an excerpt (with some recent modifications and editorial comments) from our book, WordPress Search Engine Optimization. You can buy the book at Amazon.

SEO Master Class: Choosing a Keyword-Rich Domain Name

Almost all websites will rely on primary keywords on core pages like the front page. If your keyword research teaches you that one phrase or a very small group of related phrases represents your high-volume, high-relevance primary keywords, then you’ll want to consider using those keyphrases in a keyword-rich domain name. For some, this won’t be possible or desirable: perhaps the domain name has already been chosen, or the business’s marketing strategy revolves principally around a customized brand name. But if you have the opportunity to choose a keyword-rich domain name, you’ll benefit from a little extra power in your ranking efforts down the road. You may have noticed that competitive search markets are often populated with websites that have keywords in their domain names. This is no accident: keywords in the domain name are a ranking factor, and experienced webmasters know it.

Whatever you do, choose wisely; if you ever need to change your domain name, it’ll take a lot of work and you’ll lose both incoming links and existing customers.

Tip:

SEO professionals know that you don’t always have—and won’t always need—every SEO element (domain age, keyword-rich domain name, expert title tags, thousands of inbound links, etc.) to rank well. When you consider all the elements together that make a site rank well, you want to make sure you have 80% of the elements present—but don’t fret if a few elements are out of your control.

Domain names are certainly an element that search engines consider as a ranking factor. Remember a search engine’s core purpose: to deliver relevant search results to a user entering a query. Certainly a domain name that includes a few of the searcher’s query terms would tend to be relevant for that query. The weight afforded by search engines to keywords in the domain names is moderate. In competitive markets, a keyword-rich domain name can provide some extra push to pass tough competitors. This can be frustrating in a market where every conceivable variant of a domain name has been snatched up.

Also keep in mind that keyword prominence applies to keywords in domain names. This means that the first words in a domain name are afforded greater weight by the search engines than the last words in a domain name. You will also want to mirror the word order of popular search phrases whenever possible and keep your important terms first in the domain name.

To craft a domain name, begin with your primary keywords. We’ll use some real keyword data and search volume surrounding the keyphrase “Denver homes” as an example.

Keyword                  Monthly Search Volume
Denver homes for sale    1000
Denver homes             1000
Denver homes for rent    280
new homes Denver         280

The preceding table demonstrates a few important points:

  • “Denver” is the first word in both of the highest volume key phrases.
  • “Denver” appears in all four of the keyword variations.
  • “Homes” appears in all four of the keyword variations.

In this example, the terms “new” and “for rent” aren’t the valuable terms—unless of course your website is concerned with rental homes and apartments in Denver, in which case the “Denver homes for rent” keyphrase is the only relevant one on which to base your domain name. With “Denver” in the first position for the majority of searches, you will want to maintain that word order.

You should also consider keyword overlap in crafting domain names. Keyword overlap exists when one key phrase or keyword is incorporated either partially or fully within another—and you can use it to your benefit. In our example, “Denver homes” has full overlap with “Denver homes for sale.” When you see overlap like that with robust search volume for both phrases, the longer key phrase becomes even more attractive as a primary keyword for your domain name. “New homes Denver” has only a partial overlap, and even that’s a stretch because the word order is reversed.

And so, in our example, the path is clear: “Denver homes for sale” is a highly desirable high-volume phrase to use as the basis for a domain name. But what to do if “denverhomesforsale.com” is already taken? You have three options: buy an existing or dropped domain, play with hyphens, or create a clever variation with extra words.

Buying/Acquiring Domain Names

You can always buy a domain name from its owner or wait for an existing domain to expire (so-called “dropped” domains). For dropped domains, there are a host of online services that, for a fee, will help you navigate the increasingly complex world of expired domains. This approach will yield some inevitable frustrations: the system is dominated by experts who have mastered its subtleties. As a newcomer, you’ll likely have to endure a learning curve. Also, the owner of an expired domain is entitled to a redemption period during which you’ll have to wait if you want to snatch up a choice domain. For most SEO pros, the extra time and risk isn’t worth it—especially when you can overcome a less-than-perfect domain name with sound on-page optimization and some extra linking power.

You can also buy a domain in the aftermarket from an existing domain owner. The dangers to watch out for with this approach are that some domain owners make themselves impossible to find, and when you do find them, they often have a completely deluded sense of the domain’s value. Services like sedo.com and domainbrokers.com maintain ostensibly active listings of domains for sale. Domain registrars like godaddy.com offer domain “buying services” where you select a desired domain name and they attempt to secure it for you.

In the domain resale market, asking prices for domains are typically astronomical. Overall, the domain resale market is riddled with complexities, dead ends, and punitive pricing. If you do undertake to purchase a domain, either by resale or following expiration, be prepared for a hunt. Smart SEO professionals don’t overpay for domains, and they certainly don’t endure unreasonable delays to launch their next project.

Hyphens and Extra Characters in Domain Names

It’s true: all the easy domain names are taken. But you still have an opportunity to fashion a keyword-rich domain name with a little creativity. All domain names must follow these technical rules:

  • Domains can include letters (x, y, z).
  • Domains can include numbers (1, 2, 3).
  • Domains can include dashes/hyphens, which can be repeated in sequence (e.g., --).
  • Domains cannot include spaces.
  • Capital letters are ignored.
  • Domains can’t begin or end with a dash.
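Those rules can be collapsed into a quick availability pre-check (a rough sketch for the label to the left of the TLD; real registries add length limits and internationalized-domain rules that this ignores):

```python
import re

# One domain label per the rules above: letters, digits, and hyphens;
# no spaces; no leading or trailing hyphen. Case is irrelevant, so we
# lowercase before matching.
LABEL_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")

def is_valid_label(label):
    return bool(LABEL_RE.match(label.lower()))

print(is_valid_label("denver-homes-for-sale"))  # True
print(is_valid_label("-denverhomes"))           # False (leads with a hyphen)
print(is_valid_label("denver homes"))           # False (contains a space)
```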

Hyphens present a good opportunity. In our example, we might consider checking for the availability of denver-homes-for-sale.com. This domain keeps the keywords in order, maintains keyword prominence, and the hyphens have two benefits: they make the domain easier for humans to read and can help search engines distinguish the words (e.g., “kitchens pot” vs. “kitchen spot”). The drawback of hyphens—and it is worth consideration—is that hyphenated domains are awkward and unmemorable and can appear trashy. Visitors are unlikely to remember your specific combination of words and hyphens. It can also be inconvenient to spell out your email address repeatedly as “Peter at Denver homes for sale, dot com, with hyphens between all four words.” That said, in a pure search environment, where you are going solely for keyword-based traffic, you can worry less about memorability: you’ll be getting your visitors from search, not from repeat visits.

Hyphenated domains have a fairly deserved reputation for being a bit trashy; many link farms and thin-content sites employ hyphens in their domain names.

A helpful variant of this technique is to simply apply a suffix to the domain, such as denverhomesforsalenow.com or denverhomesforsale303.com (303 is an area code in Denver). Get creative: think of a term that adds to your domain. The terms “express” and “pros” have positive connotations. “Express” suggests speedy, high-value service. “Pros” suggests someone licensed with experience. Find an appropriate suffix for your domain and you will have a keyword-rich domain without the hassle and expense of purchasing in the domain aftermarket.

As a final word on domains, make sure you use a reputable domain registrar. Some disreputable registrars may make it difficult for you to transfer your domain away later.

Tip:

Don’t park your domains, put up content! Domain registrars like GoDaddy offer a domain parking “service.” This isn’t a service at all—it’s a way for GoDaddy to squeeze a few pennies in pay-per-click ads out of your domain. The better approach is to put up even a few paragraphs of content, just to get the search engines indexing the page and building up some site age. Parked domains don’t earn site age.

Buy the Book Today at Amazon