You know all about search engine optimization. The importance of a well-structured site, relevant keywords, appropriate tagging, technical standards, and lots and lots of content. But chances are you don’t think a lot about Google crawl optimization.
Googlebot optimization isn’t the same thing as search engine optimization, because it goes a level deeper. Search engine optimization focuses more on optimizing for users’ queries. Googlebot optimization focuses on how Google’s crawler accesses your site.
There’s a lot of overlap, of course. However, I want to make this important distinction, because how well your site can be crawled affects it in foundational ways. In short, a site’s crawl efficiency is an important first step to ensuring its searchability.
The Concept Of Crawl Depth
Crawl depth is the extent to which a search engine indexes pages within a website. Most sites contain multiple pages, which in turn can contain subpages. The pages and subpages grow deeper in a manner similar to the way folders and subfolders (or directories and subdirectories) grow deeper in computer storage.
In general, the further down in the Web site hierarchy a particular page appears, the smaller the chance that it will appear with a high rank in a search engine results page (SERP). A Web site’s home page has a crawl depth of 0 by default. Pages in the same site that are linked directly (with one click) from within the home page have a crawl depth of 1; pages that are linked directly from within crawl-depth-1 pages have a crawl depth of 2, and so on.
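To make the definition concrete, here is a minimal sketch that computes crawl depth with a breadth-first search over a site's internal links. The `crawl_depths` function and the `site` link map are hypothetical, invented for illustration; a real crawler would discover the links by fetching pages.

```python
from collections import deque


def crawl_depths(links, home):
    """Return {page: crawl depth}, found by breadth-first search from the home page."""
    depths = {home: 0}  # the home page has a crawl depth of 0
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # the first visit is the shortest click path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site structure: each page maps to the pages it links to.
site = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/post-1/", "/"],
    "/blog/post-1/": ["/blog/post-2/"],
}

depths = crawl_depths(site, "/")
print(depths["/blog/post-2/"])  # 3
```

Breadth-first search fits here because it reaches every page by its shortest click path first, which is how crawl depth is defined above.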
A crawler — also known as a spider or bot — is a program that visits websites and reads their pages and other information in order to create entries for a search engine index.
Crawl Efficiency + XML Sitemaps
Your site should have one or more XML sitemaps. Those XML sitemaps tell Google which URLs exist on your site. A good XML sitemap also indicates when you’ve last updated a particular URL. Most search engines will crawl URLs in your XML sitemap more often than others.
In Google Search Console, XML sitemaps give you an added benefit. For every sitemap, Google will show you errors and warnings. You can use this by making different XML sitemaps for different types of URLs. This means you can see what types of URLs on your site have the most issues.
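As a rough sketch of that idea, the snippet below builds one sitemap per URL type, each entry carrying a last-modified date. The `build_sitemap` helper, the file names, and the example.com URLs are assumptions made for illustration; only the `urlset`, `url`, `loc`, and `lastmod` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date format, e.g. 2024-01-15
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, grouped by type so each type gets its own sitemap.
urls_by_type = {
    "posts": [("https://example.com/blog/post-1/", "2024-01-15")],
    "products": [("https://example.com/shop/widget/", "2024-02-01")],
}

sitemaps = {
    f"sitemap-{kind}.xml": build_sitemap(entries)
    for kind, entries in urls_by_type.items()
}
print(sorted(sitemaps))
```

With one file per type, the errors and warnings reported for each sitemap point straight at the kind of URL that is causing trouble.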
Crawl Efficiency Common Errors
1. Too Many 301 Redirects
I was recently consulting on a site that had just done a domain migration. The site is big, so I used one of our tools to run a full crawl of the site and see what we should fix. It became clear we had one big issue. A large group of URLs on this site is always linked to without a trailing slash. If you go to such a URL without the trailing slash, you’re 301 redirected. You’re redirected to the version with the trailing slash.
If that’s an issue for one or two URLs on your site it doesn’t really matter. It’s actually often an issue with homepages. If that’s an issue with 250,000 URLs on your site, it becomes a bigger issue. Googlebot then has to crawl not 250,000 URLs but 500,000: every URL is requested once without the trailing slash and once more after the redirect. That’s not exactly efficient.
This is why you should always try to update links within your site when you change URLs. If you don’t, you’ll get more and more 301 redirects over time. This will slow down your crawl and your users. Most systems take up to a second to serve a redirect. That adds another second onto your page load time.
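One way to approach that cleanup, sketched under the assumption that you already have a map of known redirects (for example from a crawl export), is to rewrite each internal link to its final destination. The `resolve_redirect` and `update_links` helpers and the sample URLs are hypothetical.

```python
def resolve_redirect(url, redirects, max_hops=10):
    """Follow known 301 redirects until reaching a URL that no longer redirects."""
    hops = 0
    # max_hops guards against redirect loops in the map.
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
    return url


def update_links(links, redirects):
    """Rewrite every internal link to point at its final destination."""
    return [resolve_redirect(link, redirects) for link in links]


# Hypothetical redirect map: source URL -> target URL.
redirects = {
    "/old-page": "/new-page",
    "/new-page": "/new-page/",
    "/blog": "/blog/",
}
links = ["/old-page", "/blog", "/contact/"]

updated = update_links(links, redirects)
print(updated)  # ['/new-page/', '/blog/', '/contact/']
```

Links rewritten this way cost the crawler one request each instead of two or more.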
2. 404 and More Issues
While it crawls your site, Google will encounter errors. It’ll usually just pick the next page from the pile when it does. If you have a lot of errors on your site during a crawl, Googlebot will slow down. It does that because it’s afraid that it’s causing the errors by crawling too fast. To prevent Googlebot from slowing down, you thus want to fix as many errors as you can.
Google reports all those errors to you in its Webmaster Tools, as do Bing and Yandex. We’ve covered errors in Google Search Console (GSC) and Bing Webmaster Tools before. If you have our Yoast SEO Premium plugin, you can import and fix the errors from GSC with it. You can do that straight from your WordPress admin.
You wouldn’t be the first client we see that has 3,000 actual URLs and 20,000 errors in GSC. Don’t let your site become that site. Fix those errors on a regular basis, at least every month.
3. Hidden Problems
If your site is somewhat more authoritative in Google’s eyes, fun things can happen. Even when it’s clear that a link doesn’t make sense, Google will crawl it. Give Google the virtual equivalent of an infinite spiral staircase, and it’ll keep going. I want to share a hilarious example of this I encountered at the Guardian.
At the Guardian, we used to have daily archives for all our main categories. As the Guardian publishes a lot of content, those daily archives make sense. You could click back from today, to yesterday and so on. And on. And on. Even to long before the Guardian’s existence. You could get to December 25th of the year 0 if you were so inclined. We’ve seen Google index back to the year 1600. That’s almost 150,000 clicks deep.
This is what we call a “spider trap”. Traps like these can make a search engine’s crawl extremely inefficient. Fixing them almost always leads to better results in organic search. The bigger your site gets, the harder issues like these are to find. This is true even for experienced SEOs.
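A trap like the daily archive can be closed by refusing to link further back than the first published article. The sketch below illustrates that guard; it is not how the Guardian fixed it, and the `previous_archive_day` function and the `FIRST_PUBLISHED` date are made up for the example.

```python
from datetime import date, timedelta

# Hypothetical date of the site's first article.
FIRST_PUBLISHED = date(2000, 1, 1)


def previous_archive_day(day, first_published=FIRST_PUBLISHED):
    """Return the previous day's archive date, or None if it predates the first article."""
    previous = day - timedelta(days=1)
    if previous < first_published:
        return None  # render no "previous day" link, so the crawler stops here
    return previous


# Walk back from a hypothetical starting day and count the clicks until the chain ends.
day = date(2000, 1, 10)
clicks = 0
while (day := previous_archive_day(day)) is not None:
    clicks += 1
print(clicks)  # 9
```

The chain of "previous day" links now ends at the first archive page instead of running on indefinitely.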
Identify Crawl Errors and Fix Them
If you’re intrigued by this and want to test your own site, you’re going to need some tools. We used Screaming Frog a lot during our site reviews. It’s the Swiss army knife of most SEOs. Some other SEOs I know swear by Xenu, which is also pretty good (and free). Be aware: these are not “simple” tools. They are powerful tools that can even take down a site when used wrong, so take care.
A good first step is to start crawling a site and filter for HTML pages. Then sort descending by HTTP status code. You’ll see 500 – 400 – 300 type responses at the top of the list. You’ll be able to see how badly your site is doing, compared to the total number of URLs. See an example below:
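If you export the crawl results, the same tally can be sketched in a few lines of Python. The `status_summary` function and the sample `(url, status)` pairs are hypothetical; a real export from your crawler will have its own format.

```python
from collections import Counter


def status_summary(crawl_results):
    """Count crawled URLs per status class (5xx, 4xx, ...), highest class first."""
    counts = Counter(f"{status // 100}xx" for _url, status in crawl_results)
    total = len(crawl_results)
    return [
        (status_class, count, round(100 * count / total, 1))
        for status_class, count in sorted(counts.items(), reverse=True)
    ]


# Hypothetical crawl export: (url, HTTP status code) pairs.
crawl_results = [
    ("/", 200),
    ("/old-page", 301),
    ("/missing", 404),
    ("/blog/", 200),
    ("/broken", 500),
]

summary = status_summary(crawl_results)
for status_class, count, share in summary:
    print(f"{status_class}: {count} URLs ({share}%)")
```

Each row gives the status class, the number of URLs in it, and its share of all crawled URLs, so you can compare the error classes against the total at a glance.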
I’d love to hear if you’ve had particular issues like these with crawl efficiency and how you solved them. Even better, if this post helped you fix something, come tell us below!