5. Utilize robots.txt.

When a web robot crawls your site, it will first check the /robot.txt, otherwise known as the Robot Exclusion Protocol. This protocol can allow or disallow specific web robots to crawl your site, including specific sections or even pages of your site. If you’d like to prevent bots from indexing your site, you’ll use a noindex robots meta tag. Let’s discuss both of these scenarios.

You may want to block certain bots from crawling your site altogether. Unfortunately, there are some bots out there with malicious intent — bots that will scrape your content or spam your community forums. If you notice this bad behavior, you’ll use your robot.txt to prevent them from entering your website. In this scenario, you can think of robot.txt as your force field from bad bots on the internet.

Regarding indexing, search bots crawl your site to gather clues and find keywords so they can match your web pages with relevant search queries. But, as we’ll discuss later, you have a crawl budget that you don’t want to spend on unnecessary data. So, you may want to exclude pages that don’t help search bots understand what your website is about, for example, a Thank You page from an offer or a login page.

No matter what, your robot.txt protocol will be unique depending on what you’d like to accomplish.

nl_NLDutch