What is Crawlability? To understand this term, let’s start by looking at how search engines discover and index pages. To learn about any new (or updated) page, they use what are known as web crawlers: bots that follow links on the web with a single goal in mind:
Ranking in the search engines requires a website with flawless technical SEO. Luckily, we have plugins to take care of (almost) everything on your WordPress site. Still, if you really want to get the most out of your website and keep on outranking the competition, some basic knowledge of technical SEO is a must. In this post, I’ll explain one of the most important concepts of technical SEO: crawlability.
A search engine like Google consists of a crawler, an index, and an algorithm. The crawler follows the links. When Google’s crawler finds your website, it reads it and saves its content in the index.
A crawler follows the links on the web. A crawler is also called a robot, a bot, or a spider. It goes around the internet 24/7. Once it comes to a website, it saves the HTML version of a page in a gigantic database, called the index. This index is updated every time the crawler comes around your website and finds a new or revised version of it. Depending on how important Google deems your site and the number of changes you make on your website, the crawler comes around more or less often.
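To make this more concrete, here is a minimal sketch of the idea in Python: a toy crawler that follows links and saves the HTML of every page it reaches in an index. It is an illustration only, not how Google's crawler is actually built. The `crawl` function, the `fetch` callable, and the `TOY_WEB` pages are all hypothetical names made up for this example, and a plain dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Follow links breadth-first from start_url and return {url: html}.

    `fetch` is a hypothetical callable that returns the HTML for a URL,
    or None if the page cannot be retrieved.
    """
    index = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        index[url] = html  # save the HTML version of the page in the index
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# A tiny in-memory "web" standing in for real pages (assumption for the demo).
TOY_WEB = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/blog/post-1">Post 1</a>',
    "/blog/post-1": "<p>No links here.</p>",
    "/orphan": "<p>Nothing links to this page.</p>",
}

index = crawl("/", TOY_WEB.get)
print(sorted(index))  # ['/', '/about', '/blog', '/blog/post-1']
```

Notice that `/orphan` never ends up in the index: no page links to it, so a crawler that only follows links has no way to find it.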
Crawlability describes how well Google is able to crawl your website. Crawlers can be blocked from your site in a few ways. If your website or a page on it is blocked, you’re telling Google’s crawler: “do not come here”. In most of these cases, your site or the respective page won’t turn up in the search results.
There are a few things that could prevent Google from crawling (or indexing) your website:
- If your robots.txt file blocks the crawler, Google will not come to your website or to the specific web page that is blocked.
- Before crawling a page, the crawler will take a look at its HTTP header. This HTTP header contains a status code. If this status code says that the page doesn’t exist, Google won’t crawl that page. In the module about HTTP headers of our Technical SEO training, we’ll tell you all about that.
- If the robots meta tag on a specific page blocks the search engine from indexing that page, Google will crawl that page, but won’t add it to its index.
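The three checks in this list can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not a description of Googlebot's real logic: the `crawl_outcome` function, the outcome strings, and the `example.com` URLs are hypothetical, and the status code and HTML are passed in by hand instead of being fetched. The robots.txt rules are evaluated with `urllib.robotparser` from Python's standard library.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def crawl_outcome(robots_txt, url, status_code, html, user_agent="Googlebot"):
    """Return a rough label for what happens to `url` (hypothetical helper)."""
    # 1. Does robots.txt allow this crawler to fetch the URL?
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return "blocked by robots.txt"

    # 2. Does the status code say the page doesn't exist?
    if status_code in (404, 410):
        return "page does not exist"

    # 3. Does a robots meta tag forbid indexing?
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "crawled, not indexed"

    return "crawled and indexed"


# Example robots.txt (assumption for the demo): block everything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

print(crawl_outcome(ROBOTS_TXT, "https://example.com/private/notes", 200, ""))
print(crawl_outcome(ROBOTS_TXT, "https://example.com/old-page", 404, ""))
print(crawl_outcome(
    ROBOTS_TXT,
    "https://example.com/draft",
    200,
    '<head><meta name="robots" content="noindex, follow"></head>',
))
print(crawl_outcome(ROBOTS_TXT, "https://example.com/blog/post", 200, "<p>Hello</p>"))
```

The order of the checks matters here: a page blocked in robots.txt is never read, so in this sketch a robots meta tag on that page is never seen at all.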
This flow chart might help you understand the process bots follow when attempting to index a page:
Although crawlability is only one of the basics of technical SEO (it covers everything that enables Google to crawl and index your site), for most people it’s already pretty advanced stuff. Nevertheless, if you’re blocking crawlers from your site (perhaps without even knowing it!), you’ll never rank high in Google. So, if you’re serious about SEO, this should matter to you.
If you really want to understand all the technical aspects of crawlability, you should definitely check out our Technical SEO 1 training. In this SEO course, we’ll teach you how to detect technical SEO issues and how to solve them (with the Yoast SEO plugin).