Ranking in search engines requires a website with flawless technical SEO. Luckily, we have plugins to take care of (almost) everything on your WordPress site. Still, if you want to get the most out of your website and keep outranking the competition, some basic knowledge of technical SEO is a must. In this post, I'll explain one of the most important concepts of technical SEO: crawlability.

What are Crawlability and Indexability for SEO?
To understand those terms, let's look at how search engines discover and index pages. To learn about any new (or updated) page, they use what's known as web crawlers, bots whose aim is to follow links on the web with a single goal in mind: finding and indexing new and updated content.
A search engine like Google consists of a crawler, an index, and an algorithm. The crawler, also called a robot, a bot, or a spider, goes around the internet 24/7 and follows links. When it comes to your website, it reads it and saves the HTML version of each page in a gigantic database called the index.
This index is updated every time the crawler comes around your website and finds a new or revised version of it. Depending on how important Google deems your site and the number of changes you make, the crawler comes around more or less often.
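To make that concrete, here's a minimal sketch of what such a crawler does: fetch a page, save its HTML in an "index", extract the links, and queue them up for a visit. This is an illustration only; the start URL and page limit are placeholder assumptions, and it uses the third-party requests library.

```python
# A toy crawler: follow links, save each page's HTML in an "index".
# Assumptions: the third-party `requests` library is installed, and
# https://example.com is a placeholder start URL.
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
import requests

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    index = {}                       # url -> saved HTML (the "index")
    queue = [start_url]
    while queue and len(index) < max_pages:
        url = queue.pop(0)
        if url in index:
            continue                 # already crawled this page
        response = requests.get(url, timeout=10)
        if response.status_code != 200:
            continue                 # skip pages that don't resolve
        index[url] = response.text   # save the HTML version of the page
        extractor = LinkExtractor()
        extractor.feed(response.text)
        for href in extractor.links:
            absolute, _ = urldefrag(urljoin(url, href))
            queue.append(absolute)   # follow the links we found
    return index

pages = crawl("https://example.com")
print(f"Indexed {len(pages)} page(s)")
```

Real crawlers add politeness delays, robots.txt checks, and much more, but this link-following loop is the heart of it.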
Crawlability and Indexability for SEO
Crawlability has to do with the possibilities Google has to crawl your website. There are a few ways to block a crawler from your site or from a specific page. If your website or a page is blocked, you're telling Google's crawler: "do not come here". In most cases, your site or the respective page won't turn up in the search results.
There are a few things that could prevent Google from crawling (or indexing) your website:
- If your robots.txt file blocks the crawler, Google will not visit your website or that specific web page.
- Before crawling your website, the crawler will look at your page's HTTP header. This HTTP header contains a status code. If this status code says that a page doesn't exist, Google won't crawl it. In the module about HTTP headers of our Technical SEO training, we'll tell you all about that.
- If the robots meta tag on a specific page blocks the search engine from indexing that page, Google will crawl the page but won't add it to the index. This is where crawlability and indexability differ: a page can be crawlable yet still not indexable. A combined check for all three situations is sketched below.
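The three situations above can be reproduced with a short script. Here's a rough sketch, assuming the requests library is installed and using a placeholder URL; it mimics the order a bot works in, though real crawlers apply many more rules.

```python
# Sketch of the three checks above for a single URL. Assumptions: the
# `requests` library is installed and the URL is a placeholder.
import re
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser
import requests

def crawl_and_index_check(url, user_agent="Googlebot"):
    parts = urlparse(url)
    # 1. Does robots.txt allow this user agent to crawl the URL?
    robots = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()
    if not robots.can_fetch(user_agent, url):
        return "blocked by robots.txt, so not crawled"
    # 2. Does the HTTP status code say the page exists?
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return f"status code {response.status_code}, so not crawled"
    # 3. Does a robots meta tag forbid indexing? (crude regex check that
    #    assumes the name attribute comes before the content attribute)
    if re.search(r'<meta[^>]+name=["\']robots["\'][^>]*noindex',
                 response.text, re.IGNORECASE):
        return "crawled, but noindex keeps it out of the index"
    return "crawlable and indexable"

print(crawl_and_index_check("https://example.com/"))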
[Flow chart: the process bots follow when attempting to index a page]
What affects Crawlability and Indexability?
1. Page structure
The information structure of a website plays a crucial role in its crawlability. For example, if your website contains pages that are not linked anywhere else, web crawlers may not be able to access them.
Of course, if someone links to those pages in their content, crawlers can still find them via those external links. But in general, a weak structure can cause crawlability issues.
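One way to surface such unlinked "orphan" pages is to compare the URLs in your XML sitemap against the URLs a crawl actually reaches. A rough sketch, assuming a single standard sitemap at /sitemap.xml and a set of crawled URLs gathered elsewhere (for instance by the crawler sketch earlier):

```python
# Sketch: find "orphan" pages that appear in the sitemap but were never
# reached by following links. Assumes a single standard sitemap at
# /sitemap.xml and a `crawled_urls` set produced elsewhere.
import xml.etree.ElementTree as ET
import requests

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(site):
    response = requests.get(f"{site}/sitemap.xml", timeout=10)
    root = ET.fromstring(response.content)
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")}

def orphan_pages(site, crawled_urls):
    # Pages the sitemap promises but no internal link leads to.
    return sitemap_urls(site) - set(crawled_urls)

# Example usage, feeding in the result of a crawl:
# print(orphan_pages("https://example.com", pages.keys()))
```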
2. Internal link structure
Web crawlers navigate the web by following links, just like you would on any website. Therefore, they can only find pages that you link to from other content.
A good internal link structure lets them quickly reach even those pages deep within your site. A poor structure, however, can lead to dead ends, causing web crawlers to miss some of your content.
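One useful way to reason about this is "click depth": how many links a crawler has to follow from the homepage to reach a page. Once you have an internal link graph, computing it is a simple breadth-first search. A small sketch with a hand-made toy graph; in practice you would build the graph from a crawl of your own site:

```python
# Sketch: compute "click depth" from the homepage with a breadth-first
# search over an internal link graph. The graph below is a toy example.
from collections import deque

links = {
    "/":            ["/blog", "/shop"],
    "/blog":        ["/blog/post-1"],
    "/shop":        [],
    "/blog/post-1": ["/contact"],
    "/contact":     [],
}

def click_depths(graph, start="/"):
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depths:     # first time we reach this page
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

print(click_depths(links))
# {'/': 0, '/blog': 1, '/shop': 1, '/blog/post-1': 2, '/contact': 3}
```

Pages with a large depth, or pages missing from the result entirely, are exactly the ones crawlers struggle to find.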
3. Redirect loops
Broken page redirects can stop web crawlers in their tracks, causing crawlability issues.
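You can spot a redirect loop by following Location headers by hand and remembering where you've been. A sketch using the requests library with automatic redirects switched off; the URL is a placeholder:

```python
# Sketch: follow redirects by hand and stop when a URL repeats (a loop)
# or when the chain gets suspiciously long. Assumes `requests` is
# installed; the URL below is a placeholder.
from urllib.parse import urljoin
import requests

def check_redirects(url, max_hops=10):
    seen = []
    while len(seen) < max_hops:
        if url in seen:
            return "redirect loop: " + " -> ".join(seen + [url])
        seen.append(url)
        response = requests.get(url, allow_redirects=False, timeout=10)
        if response.status_code not in (301, 302, 303, 307, 308):
            return f"chain ends at {url} with status {response.status_code}"
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, response.headers["Location"])
    return "too many redirects; crawlers give up long before this"

print(check_redirects("https://example.com/old-page"))
```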
4. Server errors
Likewise, broken server redirects and many other server-related issues can prevent web crawlers from accessing all of your content.
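A quick way to catch such problems is to request your important URLs and flag anything that answers with a 5xx status. A minimal sketch with placeholder URLs:

```python
# Sketch: flag URLs that answer with a server error (5xx). The URL list
# is a placeholder; in practice you'd feed in your sitemap or crawl data.
import requests

urls = ["https://example.com/", "https://example.com/shop"]

for url in urls:
    try:
        status = requests.get(url, timeout=10).status_code
    except requests.RequestException as exc:
        print(f"{url}: request failed ({exc})")  # timeouts, DNS errors, etc.
        continue
    if status >= 500:
        print(f"{url}: server error {status} is blocking crawlers")
```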
5. Unsupported scripts and other technical factors
Crawlability issues can also arise from the technology you use on your website. For example, gating content behind a form creates crawlability problems, because crawlers can't submit forms. Content that is only loaded by scripts such as JavaScript or Ajax can likewise be hidden from web crawlers.
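A crude self-check is to fetch the raw HTML, the way a simple crawler would, and look for a phrase you know is visible on the rendered page. If it's missing, the content probably only appears after scripts run. A sketch with placeholder values (note that Google itself can render much JavaScript these days, but many bots cannot):

```python
# Sketch: if a phrase is visible in the browser but absent from the raw
# HTML, it is probably injected by JavaScript or Ajax and may be hidden
# from basic crawlers. URL and phrase are placeholders.
import requests

def phrase_in_raw_html(url, phrase):
    html = requests.get(url, timeout=10).text
    return phrase.lower() in html.lower()

if not phrase_in_raw_html("https://example.com/", "our opening hours"):
    print("Phrase missing from the raw HTML; it may only appear after scripts run.")
```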
6. Intentionally blocking web crawlers
Finally, you can intentionally prevent web crawlers from indexing pages on your site. And for a good reason.
For example, you might create a page that you want to keep out of public reach. To restrict that access properly, you should also block it from search engines.
However, it's also easy to block other pages by accident. A simple error in the code, for example, could block an entire section of your website.
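To catch such accidents, you can test a list of URLs that should remain crawlable against your robots.txt rules. A sketch using Python's built-in robots.txt parser; the domain and URL list are placeholders:

```python
# Sketch: audit robots.txt for accidental blocks. Every URL in
# `must_be_crawlable` is a placeholder for a page you expect to rank.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

must_be_crawlable = [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/shop/",
]

for url in must_be_crawlable:
    if not robots.can_fetch("Googlebot", url):
        print(f"WARNING: {url} is blocked by robots.txt")
```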
Want More?
Although crawlability is just the basics of technical SEO (it has to do with all the things that enable Google to index your site), it's already pretty advanced stuff for most people. Nevertheless, if you're blocking crawlers from your site, perhaps even without knowing it, you'll never rank high in Google. So, if you're serious about SEO, this should matter to you.
To understand all the technical aspects of crawlability, check out our Technical SEO 1 training. In this SEO course, we'll teach you how to detect technical SEO issues and how to solve them (with the Yoast SEO plugin).