How to Run Screaming Frog in Google Compute Engine

There have been a few blog posts about using Screaming Frog to crawl extremely large sites. By controlling the depth of the crawl and your hardware resources, you can do quite a lot. For a year we crawled extremely large sites on a dedicated laptop with its RAM slots expanded to the limit, and we successfully did millions of pages this way. The most recent release of Screaming Frog has a database option which saves its crawl data on your local hard drive, effectively removing size limitations.

Except there are a lot of limitations. We could never get past two million pages crawled, due to a number of considerations, particularly time constraints. A million pages took about a week to crawl using the database storage method. Saving a crawl project took all day. Producing a 404 inlinks report would take hours. And after a lightning storm crashed the computer and destroyed a week of data, we figured there had to be another way.

Clearly, our biggest limitation was the hardware. As an advertising agency, we don’t have a lot of computational resources onsite. Website hosting doesn’t happen in the office, and most of what we do with our computers is link up with much better computers elsewhere.

If you think about it, there’s one company we know for sure has the hardware capacity to crawl scads and scads of pages. And as luck would have it, they also rent that hardware out by the minute.

Setting Up Google Compute Engine

There’s a lot of fun stuff in the Google Cloud Console, like BigQuery and APIs for AI services, but what we want is about halfway down the list: Compute Engine. I’m going to assume that you’ve already set up billing and IAM permissions in a way that makes sense for your organization. It may take you a little while to figure out the byzantine levels and classifications of permissions.

WARNING: MAKE SURE YOU HAVE BUDGET LIMITS AND WARNINGS IN PLACE. Because if you’re not careful, you can rack up thousands of dollars of expenses in here, but if you are careful, it’ll only be pocket change.
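
If you prefer the command line to the billing console, a budget with alert thresholds can be sketched with the Cloud SDK. This is a minimal sketch, assuming a gcloud release that includes the `billing budgets` command group. The function name, display name, billing account ID and dollar amount are all hypothetical placeholders. As far as I know, a budget only sends alerts and does not cap spending by itself, so you still have to stop the VM yourself.

```shell
# Hypothetical helper: create a budget that alerts at 50% and 90% of the amount.
# $1 = billing account ID (placeholder, looks like 0X0X0X-0X0X0X-0X0X0X)
# $2 = budget amount in whole US dollars
create_crawl_budget() {
  gcloud billing budgets create \
    --billing-account="$1" \
    --display-name="screaming-frog-crawls" \
    --budget-amount="${2}USD" \
    --threshold-rule=percent=0.50 \
    --threshold-rule=percent=0.90
}

# Usage (placeholder values):
#   create_crawl_budget 0X0X0X-0X0X0X-0X0X0X 100
```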

Compute Engine lets you create a virtual machine of any of the popular flavors of Linux (or a virtual Windows Server, if you’re depraved enough to use Windows), and you can scale the power to your needs and budget. The default VM instance has about the same power as a cheap laptop, and on the upper end they’re virtually supercomputers. Consequently the cost will be anywhere from $24 a month to $1649 a month.
So after you click on Compute Engine, click on create instance.
Give the instance a funky name like “where-the-frog-lives” or “scream-town.” If this is the only VM you put together the naming scheme won’t matter, and if it’s the start of a profitable relationship with Google Cloud, it will ensure that future IT managers will hate your guts.

The “machine type” we’ve found that has the best balance between power and cost is the n1-highmem-8. It has enough memory to crawl essentially any site you can think of (52GB), and it’ll only cost about eleven bucks a day (if you remember to turn it off when you’re done with it). Don’t worry, you can scale up or down as necessary pretty much on the fly after you know what your needs are. Again, make sure you set budget limits and alerts in your account. In general, you can avoid most charges when you turn off your VM instance. Goodness help you if you forget about it and leave it running.

For “Boot Disk” change it to Ubuntu 14.04 LTS. This will give you an operating system much like the one you installed on that five-year-old laptop to squeeze a little more life out of it, except in this case the laptop will have eight cores and ten times the RAM.

You will probably want to set a static IP at this point, since you will need to have your IP whitelisted by the targeted server or your super-fast crawl will get you blocked pretty quick. Otherwise, every time you stop and turn on your VM, you will get a new external IP (which can also be useful if we’re honest).

Since you will want to stop this instance and come back to it from time to time, you will need a persistent disk. Start with 100GB, but you may need to upgrade that as necessary. A little farther down we’ll talk about pushing large data files to Google Storage and from there to BigQuery. This will be the only way to deal with giant Screaming Frog reports.
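
The same instance can also be created from a terminal with the Cloud SDK instead of the console form. This is a minimal sketch, not the exact setup used above. The function name, the zone and the `ubuntu-1404-lts` image family are assumptions, and older Ubuntu releases may no longer be offered (`gcloud compute images list` shows what is available). It does not reserve a static IP.

```shell
# Hypothetical helper: create the crawl VM described above.
# $1 = instance name, $2 = zone (for example us-central1-a, an assumption)
create_frog_vm() {
  gcloud compute instances create "$1" \
    --zone="$2" \
    --machine-type=n1-highmem-8 \
    --image-family=ubuntu-1404-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=100GB
}

# Usage (placeholder values):
#   create_frog_vm scream-town us-central1-a
```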

A less polite solution to the whitelist problem is to have Screaming Frog use a proxy, which you can set at Configuration » System » Proxy. Alternatively, you can stop your VM instance every time you get blacklisted, and once you turn it back on you’ll have a new IP, provided you didn’t set a static one earlier.

Installing Screaming Frog in Compute Engine

It’s possible to run Screaming Frog from the command line, but it’s hard to find documentation on that. A much easier and more robust solution: Give your Compute Engine instance a virtual desktop and use Screaming Frog like you would normally.

This article by Aditya Choudhary covers the step by step process for installing a GUI interface for your Compute Engine instance.

I won’t repeat these instructions (except for the firewall settings, which are out of date), but the general process is to install Gnome components and the VNC virtual desktop server. Then you need to open the ports which allow your VNC client to connect. I’ve found that TightVNC works pretty well as a VNC client.

The firewall for Compute Engine is a little tricky. By default everything is blocked. To get to the firewall settings, start from the cloud console list of virtual machine instances, then click on the name of the VM you’re using, which takes you to the “details” page. Then click on the “default” link from the Network Interfaces section.

Then click on “firewall rules” on the left hand navigation. Then click “create firewall rule” and make the rule which will open up the VNC port for your personal IP. If they tried, they might be able to make that a little harder to find.
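
The same rule can be sketched from the Cloud SDK if the console clicks get tiresome. This is a minimal sketch under assumptions: the function and rule names are made up, and port 5901 is only the conventional port for VNC display :1, so check which display your VNC server actually started on. The IP address in the usage line is a documentation placeholder.

```shell
# Hypothetical helper: allow VNC traffic from a single IP on the default network.
# $1 = your public IP address, $2 = VNC port (defaults to 5901, an assumption)
open_vnc_port() {
  gcloud compute firewall-rules create allow-vnc-from-my-ip \
    --network=default \
    --direction=INGRESS \
    --allow=tcp:"${2:-5901}" \
    --source-ranges="$1/32"
}

# Usage (placeholder IP):
#   open_vnc_port 203.0.113.10
```

Limiting the source range to a single /32 address keeps the desktop closed to everyone else.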

At this point you should have a TightVNC instance open, looking at a blank Ubuntu desktop. You should also have an SSH connection open, or if you’re a wiseguy, you can open a terminal window through VNC.

Compute Engine instances don’t have root passwords by default, so you need to create one with the command:
sudo passwd

In your fancy GUI interface, open up Firefox (pre-loaded probably) and go to the Screaming Frog website. Download the Ubuntu installation file which should save as a *.deb. It’s easiest to install this from the command line with:
sudo dpkg -i *.deb
sudo apt-get install -f

After that installs, Screaming Frog will appear under the “internet” tab of the start menu. Presumably you already have a pro license key to enter into Screaming Frog, otherwise you will hit your crawl limit pretty quick.

Screaming Frog Deep Crawl Settings

Most of the configuration settings for Screaming Frog are features which limit its power and capability. But limits are for plebes. You now have world-class server power at your fingertips. Your settings will unleash the full power of your spider. Crawl beyond limits my pretties!

Configuration » Spider » Basic: go ahead and check everything. Let’s be ambitious!
Configuration » Spider » Limits: uncheck everything

Configuration » Speed » Max Threads: I’ve done 40 at once, which translates to around 3000 pages a minute. This will get you blocked by nearly everyone. Start with 10 at once and see how far that gets you.

Configuration » System » Memory: Subtract 2GB from what’s available on your machine type; in this case enter 50GB.

Configuration » System » Storage: Set Mode to Memory Storage. Database mode is the fallback if you don’t have much RAM, but it runs much slower, and if you got the machine type I recommended, you should have more than enough memory.

Configuration » System » Proxy: use a disposable proxy if you think it’s likely you’ll get blocked. Getting blocked throws the whole crawl for a loop: all the URLs crash and you don’t get the same depth, so it’s better to avoid it by getting the server admins to whitelist you ahead of time.
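
The two numbers in these settings are simple arithmetic, and a couple of shell helpers can work them out on the VM itself. This is a rough sketch: both function names are hypothetical, and the memory helper reads the kernel’s reported total, which is usually a little below the advertised figure, so it may suggest 49 rather than 50 on a 52GB machine.

```shell
# Heap size to enter in Screaming Frog: total RAM in GB minus 2GB.
# Reads /proc/meminfo by default; pass another file to try canned values.
frog_heap_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 - 2 }' "${1:-/proc/meminfo}"
}

# Minutes needed to crawl $1 pages at $2 pages per minute, rounded up.
crawl_minutes() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: a million pages at the 3000 pages a minute quoted above.
crawl_minutes 1000000 3000   # prints 334
```

At that rate a million pages is roughly five and a half hours, provided the target server tolerates 40 threads. At gentler thread counts the crawl will take proportionally longer.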

Starting and Stopping the Instance

I’ve mentioned it a few times already, but Google is charging you by the minute, so you need to shut down the instance when you’re not actually crawling. The persistent disk and the static IP will also cost you money, but much less. If you click the stop button on the Cloud Dashboard, the VM will shut down just as if you gave the shutdown command from terminal. The persistent disk will keep everything as you left it until you turn it back on. Just like a normal Ubuntu machine though, any apps you had open will need to be restarted. So be sure to save your Screaming Frog crawls before shutting down (that’s why we have the large persistent disk).

When you restart the VM, you will need to restart the virtual desktop with the SSH command:
vncserver

When you restart Screaming Frog, you can also reload any crawls you haven’t finished. Loading a giant crawl takes forever on a normal computer, but it will only be a few minutes at most on Compute Engine.
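
Starting and stopping can also be done from your own machine with the Cloud SDK, which makes it harder to forget a running instance. This is a minimal sketch with hypothetical function names, and the instance name and zone in the usage lines are placeholders. SSH may not answer in the first few seconds after a start, so the second command might need a retry.

```shell
# $1 = instance name, $2 = zone
# Start the VM, then bring the virtual desktop back up over SSH.
start_frog_vm() {
  gcloud compute instances start "$1" --zone="$2" &&
    gcloud compute ssh "$1" --zone="$2" --command="vncserver"
}

# Stop the VM so the per-minute compute charges end.
stop_frog_vm() {
  gcloud compute instances stop "$1" --zone="$2"
}

# Usage (placeholder values):
#   start_frog_vm scream-town us-central1-a
#   stop_frog_vm scream-town us-central1-a
```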

Integrating with Cloud Storage and BigQuery

So you’ve crawled a mammoth enterprise-sized site, and now what are you going to do with all those reports? That’s where the other amazing Google Cloud products come in. You’re going to export the data in CSV format to Google Cloud Storage. From there you’re going to import into BigQuery, because there’s not a spreadsheet in the world that will open that much data.

Because Google has already thought of this, there’s an SDK that integrates neatly with other Cloud products. You initialize with the SSH command:
gcloud init

Follow the prompts to authorize your Compute Engine using the web browser which you’re already logged in through. Now your storage buckets are only a few clicks away. Export your hard-won spider file to the bucket with a command like:
gsutil cp /home/*.spider gs://screaming-frog-bucket
It’s pretty much like having an attached drive to your virtual machine which is arbitrarily large and ridiculously cheap.

You will also want to export your various reports, like your 404 inlinks report or the insecure content report, to the bucket. Sure, you can work with that data through Screaming Frog in your little graphic interface window, but if you have any SQL experience, BigQuery will be much easier. Once you copy over your CSV reports to your storage bucket, it’s just a matter of importing into BigQuery.
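
The copy and the import can be sketched together with the `gsutil` and `bq` tools that ship with the Cloud SDK. This is a minimal sketch: the function name, file name, bucket, dataset and table are all hypothetical. It assumes the dataset already exists (for example, created with `bq mk crawl_data`) and that schema auto-detection copes with the report’s columns.

```shell
# Hypothetical helper: push one CSV report to a bucket and load it into BigQuery.
# $1 = local CSV file, $2 = bucket name, $3 = destination as dataset.table
load_report_to_bigquery() {
  gsutil cp "$1" "gs://$2/" &&
    bq load --autodetect --source_format=CSV "$3" "gs://$2/$(basename "$1")"
}

# Usage (placeholder values):
#   load_report_to_bigquery ~/reports/404_inlinks.csv screaming-frog-bucket crawl_data.inlinks_404
```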

If you’re going to leave the Compute Instance idle for a few weeks, you will want to cut back on persistent disk costs. A persistent disk can be grown in place but not shrunk, so stepping down to something like 10GB means moving to a new, smaller disk. Transfer any large crawl files to Cloud Storage first and then hit the stop button. You’ll want to check in on billing from time to time to make sure you’re not getting charged for something you don’t need. In the meantime, your absurdly powerful site crawl platform is there ready and waiting.

Be careful with this tool, don’t overload and break anyone’s site, and let us know how it works out in the comments!

April 24, 2018/by Matthew Bey

How to Make a Sitemap w/ Screaming Frog SEO Spider (+Video)

Magento, SEO Power Tools

Introduction (Video Below)

The following is a transcription of the video that appears below

Hi. This is Michael David from Tasty Placement. I’m the author of WordPress Search Engine Optimization, now out on 2nd Edition on Packt Publishing. My agency is a Google Marketing Platform (analytics) certified agency and Google Certified Adwords Partners.

What You’ll Need

For today’s tutorial, we are going to use Screaming Frog SEO Spider to create a full sitemap of a website. First, what you’ll need. We’re using Screaming Frog version 5.1 which is current in October 2015 and a website. Let’s go. The video quality blows because I’m using CamStudio, but you’ll be able to get through it.

Let’s do the Screaming Frog tutorial on how to create a sitemap

Why do we want a sitemap? Well, we want to create an XML sitemap and I’m not talking about an HTML sitemap page that you put on your website with a link to all of your pages. I don’t think users really use those. Your site should be big enough, if you want to make money, that a page like that is going to be too huge anyways. We’re talking about an XML sitemap, a sitemap.xml file, that we can submit to Google Webmasters or Bing Webmasters to help search engines crawl our site and find what’s important. By the way, don’t use the free sitemap tools online. We have a huge client, 600 employees, and one of the IT guys there used one of the free sitemap programs online and posted the sitemap. Upon later investigation, it didn’t include all the pages on the site. Screaming Frog is a very pro way to do this. You’re going to get a deep crawl, a full crawl, and a very good sitemap out of this that is completely compliant with search engines. So this is the way to do it.

Create the XML Sitemap: Step 1

First step. We’ve opened Screaming Frog and I’m using version 5.1, which is current in October of 2015. Always set the mode, we want to make sure we’re in Spider mode. We’re going to spider a single site. We are going to use tastyplacement.com which is our website. I’m going to start it and then it takes about 3 1/2 minutes to crawl so I’ll start it and then pause the video and then just resume when it finishes. So we’ve started it and you can see it’s already chewing and moving pretty quickly. I’ll be back in a sec.

After Screaming Frog Completes the Website Crawl

Okay, Screaming Frog just completed its crawl of tastyplacement.com and you can tell when it’s finished because this bar will progress eventually to 100%. This is a good time to sorta look through what you’ve got here. How do we do that? We can go over here and you can see we’ve crawled 798 elements and that includes JavaScript, HTML, CSS files, images, PDFs. If you click on HTML here, you’re going to see only HTML files in this window and you’ll see a count here, 304 pages. That’s the number of pages we’ve got and that’s including some 301s and no index pages. You can also see your images here if you’re into that. But we’re really interested in creating an XML sitemap of our HTML pages. This is a good point: once you do just a basic spider crawl in Screaming Frog, from this window you’re able to see if maybe one of your pages is lacking a title tag. You’ll see blanks here if there’s no title tag. If you scroll to the right you can see whether there’s a meta description and what the lengths of those are. You can see we’ve got 155s, 124s, and 138s. So we’re really up to that 155 character standard there. This gets pretty deep though so we will do another video about that at another time.

This is a 2 phase process. First we crawl the site and the second step is we generate the sitemap. We’re 70% done. This crawled effectively, we’ve got the proper page count, we know Screaming Frog crawled it properly, so we can move on from here. Now if you’re at this point and you know you’ve got more than 300 pages and Screaming Frog didn’t find them all, it means you have a crawl problem. Bingo! Another benefit of Screaming Frog because if Screaming Frog can’t find the pages on your site because there aren’t links to all those pages, then a search engine spider may not find them either. That indicates a problem.

Generating the Sitemap

So from here, we’ve done our crawl and Screaming Frog makes this very easy. We’re going to go up to the title bar here to sitemaps and we’re going to create an XML sitemap. You could also create an image sitemap; we’ll do another video on that. We’re going to click this and we’re going to get a little menu here. You progress through these tabs before hitting next. We do not want to include no index pages. We’ve no indexed them for a reason. If you know what you’re doing, don’t include no index pages here. These we all leave blank except PDFs. We’ve got about two PDFs we want Google to find. Last modified we leave alone. This is very sound, and so is priority. These are good priority numbers, change frequency, it’s totally fine. And images, we do not want to include images in our main XML sitemap. We click next, it’s prenamed sitemap.xml. We’ll overwrite the file I created in the test, save, and that’s it. Let’s take a look at that file to review it. Go over to our temp directory and we’ll open this with Notepad++, which is a great text editor for all kinds of files. There it is and it looks good. We’ve got nice priorities here, higher priorities on the key pages up front. We are set.
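
For anyone reviewing the file from a Unix-style shell rather than a text editor, a quick sanity check is to count the URL entries and compare that number with the page count from the crawl. This is a small sketch with a hypothetical function name. It simply counts `<loc>` tags, so an image sitemap (which has extra location tags) would need a different pattern.

```shell
# Count the <loc> entries in a sitemap file.
# $1 = path to the sitemap, for example sitemap.xml
count_sitemap_urls() {
  grep -o '<loc>' "$1" | wc -l | tr -d ' '
}

# Usage:
#   count_sitemap_urls sitemap.xml
```

If the count is far below the number of HTML pages you expected, that points to the same crawl problem described earlier.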

Next step would be just to use our XML sitemap wherever we want it. We upload it to Google Webmasters, which is now called Search Console, or Bing Webmaster.

Just another one of the many things that Screaming Frog can do and at Tasty Placement we use this for a couple dozen individual tasks. This is definitely one of the easier ones because they’ve included the sitemap creation right in the menu here. It’s 99 a year for Screaming Frog, but when you learn about all the things it can do it’s definitely high level stuff that pros use. Anyway, if you have any questions you can email me at michael@tastyplacement.com. As always, visit our site or follow us on Facebook. We’re always doing super, super high level tips on everything from WordPress to high level SEO and a lot of really advanced Google Analytics and Adwords stuff. Thanks for watching. I hope this helps you.

Create an XML Sitemap With Screaming Frog SEO Spider from michael david on Vimeo.

November 4, 2015/by Michael David

Getting the Elusive “A” for ‘Cache Static Content’ on Webpagetest.org

SEO Power Tools

Getting a high score on Webpagetest.org for “Cache Static Content” for a WordPress site can be tough. W3 Total Cache does a great job with some of the other metrics, but it won’t get you all the way there for “Cache Static Content”. Welcome Pubcon 2013 visitors!

Here’s our screenshot, looking good…

Here’s the Code:

You need to FTP to your hosting account, and make a manual entry to your .htaccess file (as always, be careful with that file and keep a backup).

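<IfModule mod_expires.c>
# Only applies when mod_expires is loaded; turns on expiry headers.
ExpiresActive On
# "A31536000" means 31,536,000 seconds (one year) after the file is accessed.
ExpiresByType image/gif A31536000
ExpiresByType image/jpg A31536000
ExpiresByType image/jpeg A31536000
ExpiresByType image/png A31536000
ExpiresByType image/bmp A31536000
ExpiresByType text/css A31536000
# JavaScript is served under several MIME types, so cover all three.
ExpiresByType text/javascript A31536000
ExpiresByType application/javascript A31536000
ExpiresByType application/x-javascript A31536000
</IfModule>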
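
To confirm the rules took effect without waiting on a full Webpagetest run, you can look at the response headers for a single static file. This is a small sketch: the function name is made up and the URL in the usage line is a placeholder for one of your own CSS or image files.

```shell
# Keep only the caching headers from a dump of HTTP response headers on stdin.
cache_headers() {
  grep -i -E '^(expires|cache-control):'
}

# Usage (placeholder URL):
#   curl -sI https://www.example.com/wp-content/themes/example/style.css | cache_headers
```

With mod_expires active you would expect to see both an Expires date about a year out and a Cache-Control max-age line.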

So, why did we score only a “B”? Because we load Google Fonts from their repository/API, and we haven’t figured out how to cache that content.

January 10, 2014/by Michael David

Pubcon Roundup Austin 2013: 100 Top Tools, Tips, and Tricks

SEO Power Tools

We’ve assembled a non-exclusive list of 100 great tips and tricks from the event from all the Pubcon sessions we were able to attend. We aren’t going for depth here–we are expressing what we feel are the best ideas in a super-concise format. Each of the ideas here is pretty powerful and can easily warrant hours of additional research. If we haven’t attributed a tip to you and you’d like a mention/link–please email emilyhurn[at]tastyplacement.com.

Social Media Marketing Tips

Sponsored Facebook posts are rarely the best ad to employ on Facebook; you want to use the more advanced tool set. (Dennis Yu of BlitzMetrics)
If you make a claim in a Facebook ad, don’t say “We can help you save money on insurance.” Don’t even say “We can save you 15% on your insurance.” You need to be more specific — “Save 17.3% in just 3 minutes!”
You Get 60 Minutes of News Feed Fame: 40 to 80% of your traffic is generated in the News Feed. Keep your tidbit trending with shares & comments. (Dennis Yu)
Photo Fetish: 60% of consumption on Facebook is photo views. Start snappin’. (Dennis Yu)
Photo posts are 7x more engaging than status posts; comments have 4x the weight of ‘likes’. (Annelise Kaylor, Intrapromote)
Use Power Editor to run ads https://www.facebook.com/ads/manage/powereditor/. (Dennis Yu)

Set ads to display in the News Feed, NOT in the right sidebar. (Dennis Yu)
Ads must have social action. Use sponsored stories in combination with ads.
Number one thing to do on Facebook = sponsored stories (the story gets placed higher in the newsfeed or where ads are usually placed)
Amplify your message. Get your targeted audience to interact & push into conversions.
Every object in FB has an ID. Use graph.facebook.com/anybody to view user IDs. Grab the IDs and target individual users for ad campaigns. (Dennis Yu)
Where does the magic happen? In the News Feed! 65% of likes, 35% of comments, & 45% of likes on a mobile device are generated from the News Feed. (Annelise Kaylor, Intrapromote)
Once a fan opts out of seeing your posts, your posts will never organically appear to them again unless either 1. the fan decides to opt in again or 2. you have a promoted ad. (Annelise Kaylor, Intrapromote)
In edgerank, marking a post as ‘spam’ carries the highest negative weight. (Annelise Kaylor, Intrapromote)
The newsfeed placement of relevant stories is customized specifically for the user – engagement is key. (engagement > relevancy)
Make posts during weekends or outside business hours for higher response rates. (Annelise Kaylor, Intrapromote)
Fan count numbers mean nothing if there is no reaction/interaction involved. (Annelise Kaylor, Intrapromote)
“Make Insights your bitch!” Use Facebook Insights to check your organic results. (Annelise Kaylor, Intrapromote)
Social Media is an audience response channel. Interact with your clients. (Bill Rice, CEO, Kaleidico Digital Marketing)
Sometimes it’s just more cost effective to automate some of your social messages.
No one moves without influence – get your content into the stream of an influencer. (Bill Rice, CEO, Kaleidico Digital Marketing)
Fan exclusives boost your likes. (Annelise Kaylor, Intrapromote)
Strategic viral posts can boost your affinity and increase the reach of your marketing messages. (Joe Youngblood, Senior Account Executive, WrightIMC)
Unless you promote them, nobody will see your apps.
You’re not competing against your brand’s rivals, you’re competing against your fans’ friends and grandmothers for space in their news feed. (Bill Rice, CEO, Kaleidico Digital Marketing)
You can check the emotional impact of your headlines: http://www.aminstitute.com/headline/index.htm (Susan Young, CEO, Get In Front Communications, Inc.)
Listen to your audience – check out subreddit for idea generation. (Joe Youngblood, Senior Account Executive, WrightIMC)
An individual’s Facebook account is capped at 5,000 friends. By that measure, Dennis Yu has reached the limit of human popularity.
“Talking About You” is a bogus metric. (Dennis Yu)
Instead of typing out all those tedious letters in Facebook.com, like some animal, use fb.com instead.
Make your response time within 24 hours. Half an hour is excellent. (Casey Markee)
Content which stimulates conversation extends your reach and is a step closer to conversion. (Bryan Cheney)
Fill out your content production by planning a social media calendar. (Bryan Cheney)
Customer referral programs are the affiliate marketing of the social media age.
Pay attention to personal behavior: you are being led around by things curated by your friends. (Bill Rice, CEO, Kaleidico Digital Marketing)
Pinterest. Get on board! Why? Pinterest increased over 1,000% in 2012 & it drives referral traffic. (Vince Blackham, 97th Floor)
With Pinterest, “visual content is KING.” Make your Pinterest content relevant – even slightly. Find an idea that can go into multiple boards, push your content at specific times/days (6am/pm eastern time) & get on Community Boards to increase your reach into the millions. (Vince Blackham, 97th Floor)
Use http://tineye.com/ to check for attribution of images and posts on Pinterest.
Facebook is an image sharing network. 250 million photos are uploaded per day on Facebook & 70% of all activity is related to an image, photo, graphic, or video. (Kate Buck)
How to Repurpose Content Across Social Media (by Kate Buck): Create image, embed in blog post, pin from blog, tweet from pin, share on instagram with link to blog post, share to FB page (or upload directly), later…pin from instagram web, upload to Twitter directly (this works with YouTube videos as well)
Want to see how your site is doing across all social media platforms? Use http://socialsiteexplorer.com/ (Vince Blackham, 97th Floor)
Perfect Pinterest “Instructographic”: You need a large title, step by step instructions, and use up to 5,000 pixels. (Vince Blackham, 97th Floor)

WordPress Tips

Don’t use the default username “Admin” when setting up a WordPress site–that lets hackers get one step closer to getting into your WordPress installation (Michael David and Matthew Bey of TastyPlacement).
Never store a database backup on your server. Your WordPress database contains an encrypted version of your password that can easily be hacked
You can easily block bad bots with your .htaccess file (all WordPress installations have this file already), full tutorial here.

Use the Yoast SEO plugin to control your Open Graph and Twitter Card markup.
Keep your WordPress index clean by no-following your category and tag pages.
Yoast SEO’s title tag features are unreliable on some themes. Always double check the source code of the header to make sure it’s correct.
Don’t bang on the post publish button, since that repeats the ping.
You know what’s pimpin’? Blog post thumbnails. Do ‘em up right!

Link Building

When the link is epic, like from a college professor, don’t worry about whether the link is perfectly within your niche (Jim Boykin).
Internet Marketing Ninjas have a soon to be publicized tool which will automatically categorize your backlinks.
When using the link disavowal tool, start off by disavowing the nofollowed and dead links. (Bill Hartzer, Standing Dog Interactive)
Repetitive anchor text might not be a problem if it’s for a non-valuable keyword. (Jim Boykin)

Local Maps and Marketing Tips

If you are having trouble verifying a merger between Google+ and your Google local page, initiate the postcard for the 2nd time after 15 days elapse and it’ll be verified automatically…(Greg Gifford, AutoRevo)

Greg Gifford really took this to the next level

To rank better locally, try having more local content. Blog about local events and your community. (Brian Combs, CEO, ionadas local LLC)
To distinguish between a local campaign and your national rankings, try listing a 1-800 number sitewide, and your local phone numbers on individual location pages.
The merging of Google+ for Local and Google Places is inevitable, but it’s not yet helpful to merge those two accounts manually. (Brian Combs, CEO, ionadas local LLC)
Local keyword optimization for your website is vital for the rankings of your local listings.
It’s still a matter for debate whether it’s better to have your multiple location addresses on separate pages. (Brian Combs, CEO, ionadas local LLC)
Geo-tagging can help improve your PlaceRank. (Brian Combs, CEO, ionadas local LLC)
A Google My Map may help your local ranking.
Make your site a genuinely useful local resource directory.
Is your location too remote? Try a virtual office closer to downtown. (Brian Combs, CEO, ionadas local LLC)
You want an awesome local strategy? Emulate Obama’s. (Kevin Adams, smbSEO LLC)
Manage multiple listings in Google Places. Manage brands through G+ for local.
Your places description is seen by a lot of customers, so it needs to be copy that sells, not just keywords and location. (Brian Combs, CEO, ionadas local LLC)
Info not updating on Google? Poke your listing. Show that you are active. Don’t make any new changes, and click submit. (Greg Gifford, AutoRevo)
At least one listing category needs to be a default, the rest can be keywords. (Brian Combs, CEO, ionadas local LLC)

Miscellaneous Tips

With an enhanced Adwords campaign you can put phone numbers in an ad, as well as a Google supplied tracking phone number. (Kevin Adams, smbSEO LLC)
Rel = Publisher ties your Google Business+ page to your whole site, while Rel = Author ties your personal Google+ page to individual articles (Ann Smarty, Internet Marketing Ninjas)
Through rel=’author’, author headshots are connected with articles in SERPS. When searching for a brand, rel=’publisher’ pulls up your G+ summary page for that particular brand (in the knowledge graph area). (Ann Smarty, Internet Marketing Ninjas)
If you are an author churning out content that doesn’t get links, you are not an influencer. Consistent & valuable information is key. (Bonnie Stefanick, Internet Marketing Ninjas)
Power is moving away from pagerank/individual pages to the author. (Bill Rice, Kaleidico)
As of 2012, only 9% of tech blogs have implemented rel=’author’. Which means that Google has very little data for this right now but their plans have not been abandoned (in fact, it has been an ongoing project since 2007).
You might not know this, but you may be implementing rel=’publisher’. Google is actively implementing it through the G+ button, by default. (Ann Smarty, Internet Marketing Ninjas)
Host content on your website to grow your domain & presence, then link to your Google+ page. (Bonnie Stefanick, Internet Marketing Ninjas)
It takes time to build your authorship, but you can lose it very fast. When you are claiming your article Google associates it with you, therefore building your profile. Only claim what you are comfortable with to be safe. (Ann Smarty, Internet Marketing Ninjas)
Panda vs. Penguin: Panda focuses on sites providing bad user experience & low quality content while Penguin looks for unusual linking patterns, keyword stuffing & over optimization. (Bill Hartzer, Standing Dog Interactive)
Got an unnatural link warning in Webmaster Tools? The process for recovering is to: clean up links, disavow (Google Disavow Tool), & request reinclusion. (Bill Hartzer, Standing Dog Interactive)
You can fit 110+ characters in Title Tags. Measure Title Tag length in pixels, not characters. http://bit.ly/titletaglength (Bill Hartzer, Standing Dog Interactive)
Use all available Analytics & Insight tools. They are free!
Keep your agency engaged in the industry: hold a couple of lunches, several webinars and events a month, and attend a conference a quarter. (James Loomstein)
Google Pagespeed Insights: if your results are less than 70, you are not going to rank very well. (Aaron Shear, CEO, Boost Search Marketing)
Gain more real estate with your listing by adding Rich Snippets. It’s simple – you only add one line of code to your site. (Aaron Shear, CEO, Boost Search Marketing)
For phone support for local verification (with Google Places or Google+ Local), go to: bit.ly/localphonehelp. Go through the steps, click on “Call Us” and a Google employee will call you directly. (Greg Gifford, AutoRevo)
Author rank is a leg up opportunity for a small business. (Nathaniel Broughton)
Having problems with Google+ or Local listings updates, and are an AdWords user? Call the AdWords Rep, say you are pulling your Ads…the problem will be fixed fast. (Greg Gifford, AutoRevo)
The authority/reputation of reviewers is important. Having an active Google account is something Google is paying attention to when looking at reviews. (Greg Gifford, AutoRevo)
Keep up with the industry by spending time every day on Google Reader, catching up with the literature. (James Loomstein)
Keep tabs on your clients by setting up Google Alerts for their brand name, industry keywords, competitors, and their employee names. (James Loomstein)
No one to call at Google? Visit this forum: productforums.google.com/d/forum/business (Greg Gifford, AutoRevo)
If your Google+ business email doesn’t match your Google+ Local email, you can’t respond to reviews. (Greg Gifford, AutoRevo)
Never REPOST a good review off your google+ page, exact duplicates will be removed. (Greg Gifford, AutoRevo)
Location-based keywords in good reviews can bump you up in local searches for that particular location. Same goes for product/service keywords (ie. the ‘best widget’). This should all happen organically.
Make sure landing pages are mobile-optimized. Create Mobile Ads with Call To Action buttons. (Kevin Adams, smbSEO LLC)
Get your site and business ready for Google’s Knowledge Graph: list your website/business in Freebase.com, since some of its data is believed to feed into the Knowledge Graph. (Bill Hartzman of Tandem Interactive)
Think of social ranking factors as the Three Amigos: 1. The number of plus ones (velocity of and authority of +1 user) 2. The number of shares on Google+ 3. The CTR from search results (Greg Gifford, AutoRevo)

Conversion Science Tips

Your goal in ecommerce is to increase your conversion rate without decreasing average order size/value (AOV). Solution? Optimize for revenue per click (RPC), not conversion rate (Brian Massey, the conversion scientist).
When representing small businesses, it’s important for a marketing agency to communicate that the campaign never ends. (Nathaniel Broughton)
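
To make the revenue-per-click tip above concrete: revenue per click is simply revenue divided by clicks, which works out to conversion rate multiplied by average order value. The sketch below compares two hypothetical test variants. All figures are invented for illustration and do not come from any real campaign.

```python
def revenue_per_click(clicks, orders, revenue):
    """Return (conversion_rate, average_order_value, revenue_per_click)."""
    conversion_rate = orders / clicks
    average_order_value = revenue / orders
    # RPC equals conversion_rate * average_order_value, i.e. revenue / clicks.
    return conversion_rate, average_order_value, revenue / clicks


# Hypothetical test results, invented for illustration only.
variant_a = revenue_per_click(clicks=1000, orders=30, revenue=1500.0)  # more orders, smaller baskets
variant_b = revenue_per_click(clicks=1000, orders=25, revenue=2000.0)  # fewer orders, bigger baskets

for name, (cr, aov, rpc) in (("A", variant_a), ("B", variant_b)):
    print(f"Variant {name}: CR {cr:.1%}, AOV ${aov:.2f}, RPC ${rpc:.2f}")
```

In this made-up case variant A wins on conversion rate, but variant B earns more per click, which is why RPC is the safer number to optimize.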

February 26, 2013/by Michael David

How to Diagnose a Google Penalty

SEO Power Tools

How to Diagnose a Google Ranking Ban, Penalty, or Filter

The following is an excerpt (with some recent modifications and editorial comments) from our book WordPress Search Engine Optimization (now in second edition!). You can buy the book at Amazon.

If you undertake black or gray hat techniques, you run a fair chance of having your site penalized in the search results. But even if you are not engaged in these techniques yourself, your site may be punished for associating with black hat purveyors. Hosting on a shared server or sharing domain registration information with bad neighborhoods can lead to ranking problems, if not punishment. Certainly linking to a bad neighborhood can lead to discipline. If you purchase a domain, you’ll inherit any penalties or bans imposed on the prior version of the website.

There are a wide range of penalties and ranking filters that search engines impose and a still-wider range of effects that those penalties produce. In diagnosing and correcting ranking problems, more than half the battle is figuring which penalty, if any, is imposed and for what violations. Ranking problems are easy to fix but arduous to diagnose with precision. Sudden drops in rankings might lead you to suspect that you’ve received a penalty, but it might not be a penalty at all.

In the following section we’ll look at some specific penalties, filters, conditions, and false conditions, and how to diagnose ranking problems.

Google Ban

The worst punishment that Google serves upon webmasters is a total ban. This means the removal of all pages on a given domain from Google’s index. A ban is not always a punishment: Google “may temporarily or permanently remove sites from its index and search results if it believes it is obligated to do so by law.” As for punishment bans, Google warns that “certain actions such as cloaking, writing text in such a way that it can be seen by search engines but not by users, or setting up pages/links with the sole purpose of fooling search engines may result in removal from our index.”

One of the most newsworthy instances of a total ban was when Google, in 2006, issued a total ban to the German website of carmaker BMW (http://www.bmw.de). The offense? Cloaked doorway pages stuffed with keywords that were shown only to search engines, and not to human visitors. The incident became international news, ignited at least partially by the SEO blogging community. BMW immediately removed the offending pages and within a few weeks, Google rescinded the ban.

How to Diagnose a Total or Partial Ban

To diagnose a full or partial ban penalty, run the following tests and exercises:

Check Google’s index. In the Google search field, enter the following specialized search query: “site:yourdomain.com.” Google then returns a list of all of your site’s pages that appear in Google’s index. If your site was formerly indexed and now the pages are removed, there is at least a possibility that your site has been banned from Google.
Check if Google has blacklisted your site as unsafe for browsing (type http://www.google.com/safebrowsing/diagnostic?site=mysite.com with your domain at the end).
Check for Nofollow/Noindex settings. It might seem obvious, but check to make sure you haven’t accidentally set your WordPress site to Noindex. To check, go to your WordPress Dashboard and click the “Privacy” option under “Settings.” If the second setting, “I would like to block search engines, but allow normal visitors” is set, then your site will promptly fall out of the index. A stray entry in a robots.txt file or in your WordPress template file can instruct search engines not to index your entire site.
Check Google Webmaster Tools. Google will sometimes notify you through your Webmaster Tools account that your site has been penalized. The notice is not guaranteed, however, so you can still be penalized even if you don’t receive one. See the image below for an example message.

Google Webmaster Tools penalty message. In this example, the message notes, “we detected hidden text….”
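
The Nofollow/Noindex check in the list above can also be scripted. What follows is a minimal sketch using only Python’s standard library; it assumes you have already saved the text of your robots.txt file and the HTML of a page. The function names (`has_noindex`, `robots_txt_blocks`) are made up for this example, and the sketch only covers the two accidental blocks mentioned above, not every way a page can be kept out of the index.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of any <meta name="robots"> tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    """True if the page carries a robots meta tag containing 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


def robots_txt_blocks(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Example inputs, invented for illustration.
print(robots_txt_blocks("User-agent: *\nDisallow: /", "http://yourdomain.com/"))
print(has_noindex('<meta name="robots" content="noindex,nofollow">'))
```

If either check comes back true for your home page, fix that setting before assuming a ban.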

PageRank Adjustment/PageRank Penalty

An alternative penalty short of an outright ban is a PageRank adjustment. The adjustment can be partial (a drop from a PR4 to a PR2) or can be full (a drop to PR0). With a PageRank adjustment, Google simply adjusts or removes the PageRank value for a site. Google often imposes this punishment upon low-value general directories that sell links. Part of the difficulty with diagnosing and repairing a PageRank penalty is that the PageRank that Google shows to users is historical; sometimes six months pass between PageRank updates.

How to Diagnose a PageRank Penalty

To diagnose a Google PageRank penalty, run the following tests and exercises:

Check your inbound links. Whenever your PageRank drops, the most likely reason is that you’ve lost valuable links. Check your link profile in Yahoo Site Explorer. Have you lost any premium, high-PR links you had formerly? Use the reliability of the PageRank algorithm to help diagnose: if you have a PR4 link pointing into one of your pages, and that PR4 link has only one outbound link, that one link alone will be strong enough to make the destination page a PR1 or a PR2. If despite such a link your page remains a PR0, that raises the likelihood of a PageRank penalty.
Check all pages. Be sure to check every page on your site; your PageRank might simply be shifting around within your site. It is true, however, that generally your home page will have the highest PageRank value of any page of your site. So, if you’ve got a PR0 on all pages including the homepage, a PageRank penalty is likely.
Check canonicalization. Recall the “www” and “non-www” distinction and that search engines see these as separate domains in some cases. WordPress handles this automatically, but some online tools don’t check this for you, so be sure you are checking both the www and non-www versions of your domain.
Compare PageRank. Compare Google’s reported PageRank score for your pages with SEOmoz’ mozRank. Typically, these two scores will correlate loosely (within about 10%). If the Google score is much lower than the SEOmoz mozRank score, it’s likely that Google is trimming some PageRank. You can see the SEOmoz Page Rank score with the free SEO Site Tools plugin or by visiting http://www.opensiteexplorer.org/.

Visible evidence of a Google ranking penalty in the SEO Site Tools plugin; all the elements of a ranking penalty are present. The inbound link count is healthy with over 3,500 links pointing to this domain. SEOmoz’ mozRank (erroneously called “Page Rank” in the screenshot) is a healthy 4.41. Nevertheless, Google’s PageRank is a zero. This is clear evidence of a Google PageRank penalty.

Check internal links. In Google Webmaster Tools, Google reveals its profile of internal links on your site. See the figures below for examples of an unhealthy internal link profile, and a healthy link profile. If your site has 100 indexed pages, but Webmaster Tools references only a handful of links, it means that Google is not properly processing your internal links. We need to be careful here because a range of conditions can cause this. It can potentially arise from a PageRank penalty but also from poor internal navigation structure.

This Google Webmaster Tools screenshot shows an unhealthy internal link profile, and is the same site shown in the screenshot just above. This site is a low-value link directory, a likely candidate for a Google PageRank penalty.

This Google Webmaster Tools screenshot shows a healthy link profile. All or nearly all pages on the website are represented on the internal link profile and the numbers of links to each page is relatively constant.

The -950 Ranking Penalty

Google sometimes applies a -950 ranking penalty to individual pages (but not to entire sites) for particular search queries. The -950 penalty means that for a particular search, your page would have 950 positions added to its ranking. So, for a term where you formerly ranked in position three on page one of Google’s search results, you’d now rank at position 953, on page ninety-six of the search results at ten results per page. Sound harsh? It is, and Google has made faint references to it as a penalty for over-optimization. Some SEO professionals contend that they have seen the penalty imposed for shady link building practices.

How to Diagnose a -950 Ranking Penalty

Diagnosing a -950 ranking penalty is easy: try search terms for which you formerly ranked (hopefully you noted their exact former position) and follow the search results out to page 95 or 96. Remember that you can always set Google to display 100 results instead of ten by using the advanced search option at Google.com, which is convenient for checking ranking position in the 100s and above.
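
If you’d rather not count pages by hand, the arithmetic is easy to sketch. The snippet below is only an illustration of the math described above; the function names are invented for this example.

```python
import math


def penalized_position(former_position, penalty=950):
    """Position you would expect after the penalty is added."""
    return former_position + penalty


def results_page(position, results_per_page=10):
    """Which results page a given position falls on."""
    return math.ceil(position / results_per_page)


position = penalized_position(3)
print(position)                     # expected position after the penalty
print(results_page(position))       # page to check at 10 results per page
print(results_page(position, 100))  # page to check at 100 results per page
```

Since rankings shift a little from day to day, check a page or two on either side of the computed one.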

The -30/-40 Ranking Penalty

Google often serves up another variety of penalty: the -30 or -40 position penalty. This is an often-imposed penalty, and is applied by Google to entire sites, not just particular pages and not just for particular search queries. This penalty is common enough to trip up legitimate webmasters for very minor oversights or offenses. Most signs point to the -30 penalty being applied algorithmically and being “forgivable,” so changing the condition that led to the penalty automatically reverses the penalty. This penalty has historically been imposed upon sites for serving up poor quality content. For example, the penalty has been imposed upon sites that display thin content. Thin content is content that is partially generic, as with an affiliate site repeating common descriptions of products it sells. Low-value directories have also been served this penalty.

How to Diagnose a -30/-40 Penalty

If you suspect that your site has been hit with a -30/-40 penalty, there is one sure-fire test to determine if you tripped the penalty. Perform a Google search for your domain name, without the “www” and without the “.com” or “.net” part of the domain. This search, in normal circumstances, should return your site at or near the first position (depending a bit on the competition for that term). If this test shows your site dropped to a position in the 40s or 50s, it is almost certainly a -30/-40 penalty.
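
Here is a rough sketch of that test in code form. It only builds the search term and applies the “40s or 50s” rule of thumb from the paragraph above; you still have to run the search yourself and note the position by hand. The function names are invented for this example, and the TLD stripping is deliberately naive (it removes only the last label, so a domain like example.co.uk would need extra handling).

```python
from urllib.parse import urlparse


def brand_query(url_or_domain):
    """Strip the scheme, a leading 'www.', and the final TLD label."""
    target = url_or_domain if "//" in url_or_domain else "//" + url_or_domain
    host = urlparse(target).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host.rsplit(".", 1)[0]


def looks_like_minus_30(position):
    """True when the observed position sits in the 40s or 50s."""
    return position is not None and 40 <= position <= 59


print(brand_query("www.yourdomain.com"))  # the term to search for
print(looks_like_minus_30(43))            # position noted from a manual search
```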

False Positives That Aren’t Penalties

Don’t assume you’ve been penalized by Google just because your rankings drop or because your rankings remain poor for a new site. Ranking positions can jump around naturally, especially just before algorithm updates, when Google updates its search engine rules. You may also have lost one or more valuable inbound links, which can lead to a drop in rankings. You may also be alternating between Google’s personalized search modes. Personalized search is a Google feature that returns results based on your personal browsing habits. So, if you’ve visited your own website in the past few days, Google will return your website near the top of the results, figuring that it’s one of your personal favorites. Personalized search is a convenience tool, but it doesn’t return true rankings. To see actual ranking results, you need to make sure personalized search is off. To do this, look on any Google search results page in the upper left hand corner for “Personalize Search On.” Click on the link just under it that reads, “Turn it off.”

Google penalties are almost never imposed for no reason at all. Yes, Google imposes penalties on light offenders while more egregious violations go unpunished. While that might not seem fair, it doesn’t change the fact that if you have perfectly complied with Google’s Webmaster Guidelines, you are extremely unlikely to be penalized. If you’ve been penalized, there’s a reason.

November 3, 2010/by Michael David

SEO Power Tip: Don’t Park Your Extra Domains

SEO Power Tools

SEO Power Tip: Don’t Park Your Extra Domains

If you are like most people either in business on the web or in business generally, you may have accumulated some extra domain names. And, like most people, you might tend to leave those domain names in a dormant or parked status. GoDaddy is a perfect example: they offer free domain parking for domains purchased through their service.

Domain Parking: Almost a Scam

Domain parking really isn’t a service at all: the parking service places ads on your domain and does not share any of the ad revenue with you. I wonder how much revenue GoDaddy earns in the aggregate from the hundreds of thousands, or millions, of domain names parked there. Also, parked domains in most cases will not be indexed by search engines. So, in the eyes of a search engine, a parked domain doesn’t exist.

Best Approach for Parked Domains

The better approach is to “park” your domain by yourself. Set up a very simple page, or better a few pages with some text that is original and contextually relevant to the domain name itself. You can even place some Google Adsense Advertisements on the site to earn a few dollars a month. Meanwhile, the search engines will crawl and index your site because of the original content they find there. Then, when you finally go live with your domain, you’ll have an indexing history—old sites with some age and history always outrank brand-new sites.

An extra tip: always leave your contact information easily findable on a parked domain– you never know when someone will want to offer money for it.

June 15, 2010/by Michael David

SEO Power Tool: Test Your Website’s Load Time

SEO Power Tools

If you haven’t tested your website’s load time–do it now, because Google is going to test it for you.

Google confirmed long ago that webpage load times will be used as a factor in the calculation of its AdWords Quality Score, meaning that a faster load time can save you money on your AdWords budget.

You can use the tool for free at Pingdom. Here’s a screenshot of what you’ll see.

What’s a good load time? You want the last item on the page (the bottom of the list) to load in under 5 seconds.
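
For a quick sanity check from your own machine, a few lines of Python can time a page request. This is only a rough sketch and not a substitute for a full waterfall tool like the one above: it times the raw HTML document alone, without images, scripts, or stylesheets, so it will usually report a lower number than a full page load. The function names are invented for this example, and the five-second limit is the rule of thumb from the paragraph above.

```python
import time
from urllib.request import urlopen


def fetch_html(url):
    """Download the raw HTML document only (no images, scripts, or stylesheets)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def timed_fetch(url, fetch=fetch_html):
    """Return (seconds_elapsed, bytes_downloaded) for a single fetch."""
    start = time.perf_counter()
    body = fetch(url)
    return time.perf_counter() - start, len(body)


def is_fast_enough(seconds, limit=5.0):
    """Apply the under-five-seconds rule of thumb."""
    return seconds < limit


# Usage (requires network access):
#   seconds, size = timed_fetch("http://yourdomain.com/")
#   print(f"{size} bytes in {seconds:.2f}s, fast enough: {is_fast_enough(seconds)}")
```

Run it a few times and at different times of day; a single measurement can be thrown off by a slow connection.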

April 28, 2010/by Michael David

Find Duplicate Content Using Free Tools

SEO Power Tools

An essential part of maintaining a website is making sure it is free from duplicate content. The quality and uniqueness of a website’s content can play a big role in its popularity. There are many tools available on the Internet that anyone can use for free to check for duplicate content.

One popular option is Copyscape. It is a free tool that lets you paste your text content into a box; in return, the tool checks for other websites with similar text.

Another free tool is Xenu Link Sleuth, a downloadable program for checking broken links. Its reports include title tags, formats, sizes, and URLs, which can be exported to Excel files and sorted to check for duplicates. A similar tool, Yahoo Explorer, works the same way. The only difference is that Yahoo Explorer does not detect broken links.
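
If you would rather not sort the export by hand, the same duplicate check can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the crawl export is a CSV file, and the column names `Address` and `Title` are hypothetical placeholders, so adjust them to match whatever headers your tool actually produces. The sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def duplicate_titles(csv_text, url_column="Address", title_column="Title"):
    """Group URLs by title tag and keep only titles used on more than one page."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize case and whitespace so near-identical titles group together.
        title = (row.get(title_column) or "").strip().lower()
        if title:
            groups[title].append(row[url_column])
    return {title: urls for title, urls in groups.items() if len(urls) > 1}


# Invented sample export; column names are assumptions, not a real tool's format.
SAMPLE = (
    "Address,Title\n"
    "http://example.com/,Acme Widgets\n"
    "http://example.com/about,About Acme\n"
    "http://example.com/contact,Acme Widgets\n"
)

for title, urls in duplicate_titles(SAMPLE).items():
    print(title, urls)
```

The same grouping works for meta descriptions: point `title_column` at the description column instead.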

Google also offers tools to check for duplicate content. Google Webmaster Tools can be accessed through the main Google page (google.com/webmasters). Go to the Diagnostics page and choose HTML suggestions, then click on duplicate title tags to reach the download link for the table. Apart from these free tools, there are also many reliable paid duplicate-checking tools available online.

There are several types of duplicate content to be aware of. With millions of websites on the World Wide Web, various elements of a website need to be checked.

Here are some of the elements that need to be checked for duplication:

1. Title Tags. Many websites reuse the same title tag over and over again throughout the entire site. This is a form of duplication and needs to be avoided. Plus, with the millions of websites published on the web, title tags often get duplicated across sites.

2. Dynamic URLs. Since the content of dynamic pages changes depending on the database driving the results of the site, the chances of duplicate content are greater.

3. Meta descriptions. Each page of a website needs a meta description, a summary of the content of the page. Because the topics discussed on a website often have many similarities with other web pages, meta descriptions also tend to get duplicated.

4. Product descriptions. Websites of product resellers often get their product descriptions from the original manufacturer of the goods. Since there could be many resellers for the same products online, the descriptions are often duplicated.

For any SEO professional, making sure that the websites they are marketing are unique and free from duplicates helps a lot. The popularity and ranking of the site are often at stake. Of course, there are also the copyright infringement issues that are pretty common on the World Wide Web at the moment. Make sure that you check your content for any duplication.

January 10, 2010/by Claire J. Dunn
