How to Recognize the Indeed.com Robot and Other Unwanted Visitors
The longer you stare at Google Analytics, the more you see things that don’t make sense. Sometimes the irregularities pop right out: users who crawl every single page exactly once in a single session, or users who load pages that can’t possibly exist a couple of times every month. Most likely these abnormalities are caused by robots, those sneaky denizens of the inter-webs, teeming in the millions and doing their masters’ bidding. Most robots don’t execute JavaScript, so they never show up in Google Analytics. And you can filter out a lot of the robots that do run JavaScript by checking the “exclude known bots and spiders” box in your GA view settings.
Why this box wouldn’t be checked by default is anyone’s guess. Between views where you forget to check off that box, and robots which get around Google’s filters, you will see a lot of this fake traffic once you start to look.
A few months ago we noticed one robot in particular which stood out from the others. Instead of hailing from Russia or the Philippines, or some other sketchy location, it seemed to originate from right here in Austin, Texas (also an admittedly sketchy location). The pattern was a single user loading all of the pages on the site one right after the other.
This crawling seems to have started within the last year or so, and the declared user agent was an outdated version of Firefox. Once we made a segment that isolated the stats, we found it everywhere. It was like that scene from Independence Day where Jeff Goldblum discovers that there’s an alien signal inside all the telecommunications satellites.
Drilling down to all the dimensions allowed by Google, we found the smoking gun under Audience –> Technology –> Network.
We believe the robot is how Indeed.com gets some of its job listings. It scrapes the entire dang internet to find those sites where the companies can’t be bothered to post directly to Indeed.com.
Here are the stats for identifying the Indeed bot:
City: Austin or sometimes Denver
Network: indeed inc
Network domain: cyrusone.com
source/medium: direct/none
Browser: Firefox, 38.0, 1024×768
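If you export session rows from GA, a fingerprint like the one above can be checked in a few lines of code. What follows is only a rough sketch: the dictionary keys (city, network, network_domain and so on) are hypothetical column names we made up for illustration, not official GA field names, and your export may spell values such as direct/none differently.

```python
# Fingerprint values copied from the list above. The keys are hypothetical
# column names for an exported report, not official GA field names.
INDEED_FINGERPRINT = {
    "city": {"austin", "denver"},
    "network": {"indeed inc"},
    "network_domain": {"cyrusone.com"},
    "source_medium": {"direct/none"},
    "browser": {"firefox"},
    "browser_version": {"38.0"},
    "screen_resolution": {"1024x768"},
}


def normalize(value):
    """Lowercase, trim, and turn the multiplication sign in resolutions into 'x'."""
    return str(value).strip().lower().replace("×", "x")


def matches_fingerprint(row, fingerprint=INDEED_FINGERPRINT):
    """Return True only if every fingerprint field in the row has an allowed value."""
    return all(
        normalize(row.get(field, "")) in allowed
        for field, allowed in fingerprint.items()
    )


# Two made-up rows: the first matches the fingerprint, the second does not.
rows = [
    {"city": "Austin", "network": "indeed inc", "network_domain": "cyrusone.com",
     "source_medium": "direct/none", "browser": "Firefox",
     "browser_version": "38.0", "screen_resolution": "1024×768"},
    {"city": "Austin", "network": "example isp", "network_domain": "example.net",
     "source_medium": "google/organic", "browser": "Chrome",
     "browser_version": "57.0.2987.133", "screen_resolution": "1920x1080"},
]

suspects = [row for row in rows if matches_fingerprint(row)]
print(len(suspects), "of", len(rows), "rows match the fingerprint")
```

Requiring every field to match keeps false positives down, since plenty of real people in Austin visit sites directly. Loosening the fingerprint (say, dropping the resolution) is a judgment call.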
If Indeed has the hardware and expertise to scour the digital world with a fully rendering bot, you’d think they could also make it not execute Google Analytics. Just saying.
A Few Other Robots of Note in Google Analytics:
New York Mystery Robot
Appears to be probing sites for security flaws. It deliberately loads 404 pages with mysterious hashes in various subdirectories, such as /blog/aHQtZ3JvdX. It declares its browser to be a year-old version of Chrome. One theory is that the hashed 404 pages load up remote shells. Or maybe that’s just being paranoid. Nearly every site sees dozens of these visits a month.
City: New York
Network: microsoft corporation
Network domain: unknown.unknown
source/medium: direct/none
Browser: Chrome, 57.0.2987.133, 1280×960
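One way to spot this kind of probing in bulk is to flag 404 hits whose last path segment looks like a random token. The sketch below is a crude heuristic of our own, not anything GA provides: it assumes you can pair each path with a status code (from server logs, for example), and it treats any segment of eight or more URL-safe characters that mixes lowercase, uppercase and digits as suspicious. Both function names are made up for illustration.

```python
import re

# Eight or more characters drawn from a URL-safe alphabet (an assumed threshold).
TOKEN_RE = re.compile(r"^[A-Za-z0-9_-]{8,}$")


def looks_like_random_token(segment):
    """Heuristic: long enough, and mixes lowercase, uppercase, and digits."""
    return (
        TOKEN_RE.match(segment) is not None
        and any(c.islower() for c in segment)
        and any(c.isupper() for c in segment)
        and any(c.isdigit() for c in segment)
    )


def is_suspicious_404(path, status):
    """Flag a hit when it returned 404 and its last path segment looks random."""
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    return status == 404 and looks_like_random_token(last_segment)


# The first path is the example quoted above; the second is an ordinary slug.
for path in ["/blog/aHQtZ3JvdX", "/blog/contact-us"]:
    print(path, is_suspicious_404(path, 404))
```

A legitimate slug that happens to mix cases and digits would also trip this check, so treat the output as a list of candidates to eyeball rather than a verdict.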
Amazon Bing Bot
Surprisingly pervasive, sometimes visiting every day. It usually loads the homepage, but sometimes a gibberish 404 page. It’s really apparent if you visit the Acquisition –> Channels –> Organic page, where the keyword “Amazon” shows up right near the top. It’s not clear what they’re trying to do here: there’s no obvious spam message, and they’re not crawling much content.
City: Santa Clara/Anaheim/New York
Network domain: paloaltonetworks.com, unknown.unknown, keznews.com
source/medium: Bing/Organic: keyword “Amazon”
Browser: Internet Explorer 8, 800×600
Weird Brazil Traffic
It’s probably no surprise that if you’re getting Brazilian traffic on a Texas home-services site, the Brazilian users are not actual prospective customers. A variety of cities in Brazil are home to bots that methodically crawl all of a site’s content.
City: Itacoatiara (Brazil)
Network domain: sovereignease.com
source/medium: direct/none
Browser: Chrome, 52.0.2743.116, 1920×1610
City: Santarem
Network Domain: ip-158-69-167.net
Source/Medium: direct/none
Browser: Chrome
City: Maraba
Network Domain: ip-66-70-225.net
Source/Medium: direct/none
Browser: Chrome
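The “crawls every page exactly once” pattern that these bots (and the Indeed bot) share can also be checked mechanically. Here is a minimal sketch, assuming you have a list of (session ID, page path) hits and a list of your site’s real pages. The function name and the 90% coverage threshold are our own inventions for illustration, not anything from GA.

```python
from collections import Counter, defaultdict


def find_crawler_sessions(hits, site_pages, min_coverage=0.9):
    """Return IDs of sessions that viewed each page once and covered most of the site.

    hits: iterable of (session_id, page_path) pairs.
    site_pages: the known real pages on the site.
    min_coverage: assumed threshold for how much of the site must be visited.
    """
    pages_by_session = defaultdict(Counter)
    for session_id, path in hits:
        pages_by_session[session_id][path] += 1

    site_pages = set(site_pages)
    flagged = []
    for session_id, counts in pages_by_session.items():
        each_page_once = all(n == 1 for n in counts.values())
        if site_pages:
            coverage = len(site_pages & set(counts)) / len(site_pages)
        else:
            coverage = 0.0
        if each_page_once and coverage >= min_coverage:
            flagged.append(session_id)
    return flagged


# Made-up data: session "a" walks the whole site once, session "b" browses normally.
site_pages = ["/", "/about", "/services", "/contact"]
hits = [
    ("a", "/"), ("a", "/about"), ("a", "/services"), ("a", "/contact"),
    ("b", "/"), ("b", "/contact"), ("b", "/"),
]
print(find_crawler_sessions(hits, site_pages))
```

On a very small site a curious human could trip this too, so it works best combined with the location and network details listed above.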
Let us know if you see any interesting bots crawling your sites!