How to Optimize Google’s Crawl Budget for SEO Performance
Noble Studios | October 11, 2018
Over the past few years, we’ve received more and more questions from clients regarding their crawl budget. Questions have ranged from the basic “What is a crawl budget?” to “How are you maximizing our crawl budget?” to “Should we buy product x to monitor our crawl budget?” This is not just a trend with our clients; you can see from the chart below that search volume in the U.S. for the term “crawl budget” has steadily increased over this time frame.
Searches on Google for the term “crawl budget” since 2015. Source: Google Trends.
Despite this recent trend, the concept of a crawl budget is not new. Google’s former head of web spam, Matt Cutts, was interviewed on the topic as early as 2010. At the time, he described a site’s crawl budget as roughly proportional to its PageRank. Although PageRank no longer carries the importance it once did, the underlying idea, that the best content on the most authoritative sites gets crawled most frequently, still holds true. That makes sense in concept, but what can a digital marketer actually do to optimize their crawl budget?
Before we get into that, let’s more clearly define what “crawl budget” refers to.
What is Crawl Budget and What Does it Mean for Digital Marketers?
Google says your website’s crawl budget is made up of two specific variables: crawl demand and crawl rate. They also reiterate that large sites with many pages and products should be most concerned with maximizing their crawl budget.
According to Gary Illyes, a Google Webmaster trends analyst, “Prioritizing what to crawl, when, and how much resource the server hosting the site can allocate to crawling is more important for bigger sites, or those that auto-generate pages based on URL parameters.”
So, at the highest level:
Crawl rate limit is a cap designed to keep Google from crawling your pages so often, or so fast, that it hurts your server’s performance.
Crawl demand is how much Google wants to crawl your pages, based on how popular your pages are and how stale their copies are in Google’s index.
Crawl budget is “taking crawl rate and crawl demand together.” Google defines crawl budget as “the number of URLs Googlebot can and wants to crawl.”
More on Crawl Rate Limit
Google wants to be a good citizen of the web. Crawling and indexing high-quality content that can be matched with user intent is their main priority, but they must do this while also making sure their crawling efforts don’t degrade the experience of users visiting those websites. Google calls this the “crawl rate limit,” which caps the maximum fetching rate for a given site.
Simply put, this represents the number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it has to wait between the fetches. The crawl rate can go up and down based on a couple of factors:
Crawl health: if the site consistently responds really quickly, the limit goes up, meaning more connections can be used to crawl. If the site slows down or responds with server errors, the limit goes down and Googlebot crawls less.
In Search Console, marketers can manually reduce Googlebot’s crawling limit for their website, but it’s important to note that setting a higher limit doesn’t automatically increase the crawl rate.
Crawl Rate as reported by Google Search Console
More on Crawl Demand
Even if the crawl rate limit isn't reached, if there's no demand for indexing, there will be low activity from Googlebot. The two factors that play a significant role in determining your crawl demand are:
Popularity: URLs that are more popular on the Internet tend to be crawled more often to keep them fresh in Google’s index.
Staleness: Google’s systems attempt to prevent URLs from becoming stale in the index.
Site-wide events, like site migrations, may temporarily trigger an increase in crawl demand in order to re-index the content under the new URLs.
As stated earlier, bringing crawl rate and crawl demand together, we can define a crawl budget as the number of URLs Googlebot can and wants to crawl.
Optimizing Your Crawl Budget
The internet is a really big place, and new content is being created all the time. Google has finite resources, so when faced with the near-infinite quantity of content available online, Googlebot is only able to find and crawl a fraction of what exists. So, what can you do to both increase your crawl budget and ensure the budget you do have is spent crawling the right pages frequently?
Here are five ways you can start optimizing your crawl budget today:
1. Speed Up Your Website
You already know that site speed is crucial for SEO, especially when considering the growth of mobile search and higher performance expectations from your customers. Speed is also critical for your crawl budget.
It makes sense that if your site is faster and performs better overall, it will be able to handle more requests from Googlebot and human users at the same time. This allows Google to increase the crawl rate of your site without a high risk to the user experience of your customers. Especially important here is Time to First Byte (TTFB).
TTFB is the time spent waiting for the initial server response. This time captures the latency of a round trip to the server in addition to the time spent waiting for the server to deliver the response.
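If you want a quick read on your own TTFB, `curl -o /dev/null -s -w '%{time_starttransfer}\n' https://www.example.com/` reports it directly. Below is a minimal sketch of the same measurement in Python, using only the standard library; the hostname is a placeholder, and the timing includes connection setup, just as TTFB does.

```python
import http.client
import time

def measure_ttfb(host, path="/"):
    """Time from starting the request until the response's status line and
    headers have been read: a close proxy for TTFB that, like TTFB itself,
    includes connection latency plus server response time."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    start = time.perf_counter()
    conn.request("GET", path)  # connects lazily, then sends the request
    conn.getresponse()         # returns once the first response bytes arrive
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

# Placeholder hostname -- substitute your own site.
print(f"TTFB: {measure_ttfb('www.example.com'):.3f}s")
```

Run it a few times across your key templates (home page, category pages, product pages); consistently slow responses are exactly the signal that pushes your crawl rate limit down.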
Make sure you are giving Google a signal that you have the server capacity to handle additional crawl requests.
2. Reduce Website Errors
This is another big one. We’ve already stated that Google has finite resources and a nearly infinite quantity of content to crawl and index, which makes efficiency critically important. The time and resources allocated to Googlebot must deliver as many crawled pages as possible, so anything that slows Googlebot down, such as website errors, will negatively impact your crawl budget.
Common website errors include page not found errors (404 errors), server errors (500 errors) and duplicate content. You can leverage reports in Google Search Console or within comprehensive tools like BrightEdge to run an analysis of your website. Find these errors and correct them.
A great place to start is analyzing your sitemap. This is the list of pages you are proactively asking Google to crawl and index. Are there errors on any of these pages? Multiple errors within a sitemap are a quick way to make Google abandon a crawl. We’ve seen the number of pages crawled more than double simply by removing 404 errors from sitemaps.
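One way to run this check yourself is a short script that pulls every URL out of your XML sitemap and reports any that don’t return a 200. A minimal sketch, assuming a standard `<urlset>` sitemap at a placeholder URL (a sitemap index file would need an extra pass to expand its child sitemaps):

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder URL
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url):
    """Return every <loc> URL from a standard <urlset> sitemap."""
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.parse(resp).getroot()
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

def status_of(url):
    """Return the HTTP status code, including error codes like 404 and 500."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

for url in sitemap_urls(SITEMAP_URL):
    status = status_of(url)
    if status != 200:  # flag anything that isn't a clean 200 for cleanup
        print(status, url)
```

Anything the script flags is either a page to fix or an entry to remove from the sitemap.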
Duplicate content is another big issue. Make sure that every page on your website offers unique, specific content that addresses the intent of your customers. If you have pages with very similar content, are they both needed? Can you delete one? Also, make sure you are leveraging canonical tags in your code so that Google knows which page should be considered authoritative when duplicate content cannot be avoided.
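If you want to audit which canonical each page actually declares, you can fetch the page and pull out its `<link rel="canonical">` tag. A rough sketch using Python’s built-in HTML parser; the URLs are hypothetical near-duplicates that should both point at one authoritative page:

```python
import urllib.request
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the page's <link rel="canonical"> tag, if any."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

def canonical_for(url):
    """Fetch a page and return the canonical URL it declares, or None."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Hypothetical near-duplicates that should declare one authoritative page.
for url in ["https://www.example.com/widgets/",
            "https://www.example.com/widgets/?sort=price"]:
    print(url, "->", canonical_for(url))
```

If the two URLs print different canonicals, or none at all, Google has no clear signal about which version to index.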
3. Block Parts of Your Site
Another way to ensure that your crawl budget is used efficiently is to make sure that Googlebot is not crawling pages of the website that you don’t want indexed. Leverage your robots.txt file to block Google from crawling administrative sections of your website or areas that are gated and require a user login. This avoids wasted requests (or outright errors) when Google tries to crawl those pages, and keeps your crawl budget focused on the pages you want to be seen.
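As a sanity check before you deploy new rules, you can run your proposed robots.txt through Python’s built-in parser and confirm it blocks exactly what you intend. The Disallow paths below are hypothetical; match them to your own admin and gated areas:

```python
import urllib.robotparser

# Candidate rules for https://www.example.com/robots.txt -- the paths
# here are hypothetical examples, not a recommendation for every site.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /my-account/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Verify Googlebot is blocked from the gated areas but not the content.
for path in ["/admin/settings", "/my-account/orders", "/products/widget"]:
    allowed = rp.can_fetch("Googlebot", "https://www.example.com" + path)
    print(path, "crawlable:", allowed)
```

This catches the two classic mistakes: a rule that fails to block the gated area, and an overly broad rule that accidentally blocks content you do want crawled.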
4. Clean Up Your Redirects
This is a mistake we see all the time, especially with websites that have been around a while or have been migrated from HTTP to HTTPS. A 301 redirect is a great tool to let Google know that a page has moved to a new location. By using this technique, you’re able to transfer a portion of the authority earned by the previous page to the new one. The trouble starts when a page gets redirected multiple times before landing on the current page. Let’s look at an example:
http://www.domain.com/page-1 > http://www.domain.com/page-2 > https://www.domain.com/page-2
In the example above, you can see that page 1 on the HTTP site has moved to page 2. Later, the company launched an HTTPS (secure) version of the site. Rather than updating the original redirect to point straight to the HTTPS version of page 2, they are still redirecting to the HTTP version of page 2, which then redirects to the HTTPS version. By doing this, you risk Google abandoning the crawl, or at best, you dilute the authority being passed to the new page. Just don’t do it. We’ve seen sites with a dozen or more redirects chained together for a single page.
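A quick way to spot chains like this is to follow each hop one redirect at a time and print the sequence. Here’s a minimal sketch using only Python’s standard library; the starting URL is a placeholder, and anything longer than one hop is a candidate for cleanup:

```python
import urllib.error
import urllib.parse
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline to follow redirects so each hop surfaces as an HTTPError."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def trace(url, max_hops=10):
    """Print every hop in a redirect chain until a non-redirect response."""
    opener = urllib.request.build_opener(NoRedirect)
    for _ in range(max_hops):
        try:
            with opener.open(url, timeout=10) as resp:
                print(resp.status, url)  # final destination
                return
        except urllib.error.HTTPError as e:
            if e.code in (301, 302, 303, 307, 308) and "Location" in e.headers:
                target = urllib.parse.urljoin(url, e.headers["Location"])
                print(e.code, url, "->", target)
                url = target
            else:
                print(e.code, url)  # a hard error such as 404 or 500
                return
    print(f"stopped after {max_hops} hops; possible redirect loop")

trace("http://www.example.com/page-1")  # placeholder start of the chain
```

Point it at your highest-traffic legacy URLs first; those are the chains wasting the most crawl budget.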
5. Keep Your Content Fresh
While a lot of the crawl budget optimization process focuses on increasing the crawl rate, you can also increase your crawl demand. Update your existing content on a set schedule to show Google that this content is not static and should in fact be crawled frequently. You can do this by running a monthly report of your top organic landing pages and making sure the content is still timely and relevant to your audience. Look for ways to expand and/or improve that content. You can also promote this content on social media channels or establish a formal link building program to acquire more high-quality links to a specific page. These are all signals that there is high demand for your site to be crawled.
We want to reiterate that optimizing your crawl budget is not going to happen overnight and should not be viewed as a one-off project. These efforts should be part of your overall SEO strategy, woven into your daily, weekly and monthly work. A great place to start is including data on your crawl budget performance in your SEO dashboard and reviewing it on a monthly basis with your team. This way, you ensure it remains a continuous focal point.
Interested in learning more about crawl budget? Watch this webinar hosted by BrightEdge featuring our VP of Performance Marketing Chad Hallert!