What is Robots.txt
Robots.txt (also known as the robots exclusion protocol or standard) allows a website to provide instructions to web-crawling bots.
What is inside Robots.txt
The robots.txt file contains a set of instructions that request the bot to ignore specific files or directories.
What is the Purpose
Search engines use web crawlers (web robots) to archive and categorize websites. Most bots are configured to look for a robots.txt file on the server before reading any other file from the website. They do this to see if the website’s owner has special instructions on how to crawl and index the site.
Basically, one of the major goals of SEO is to get search engines to crawl your site easily so they increase your ranking. When a search engine crawls your site, it crawls every single one of your pages, and if you have a lot of pages, the bot will take a while to get through them, which can hurt your ranking. By creating the right robots.txt file, you can tell search engine bots to avoid certain pages. If you tell them to crawl only your most useful content, they will crawl and index your site based on that content alone.
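As a sketch of what that looks like in practice, the file below allows all bots onto the site while steering them away from two low-value sections (the paths here are hypothetical examples, not required names):

```
User-agent: *
Disallow: /admin/
Disallow: /search
```

Everything not matched by a Disallow rule remains crawlable, so the bots spend their time on the pages you actually want indexed.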
If a website has more than one subdomain, each subdomain must have its own robots.txt file. It is important to note that not all bots will honor a robots.txt file. Some malicious bots will even read the robots.txt file to find which files and directories they should target first. Also, even if a robots.txt file instructs bots to ignore specific pages on the site, those pages may still appear in search results if they are linked to by other pages that are crawled.
Let’s say a search engine is about to visit a site. Before it visits the target page, it will check the robots.txt for instructions.
Suppose the site has the following in its robots.txt file:
User-agent: *
Disallow: /
- User-agent: * – means that the rules apply to all web robots that visit the site.
- Disallow: / – tells the robot not to visit any pages on the site.
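You can check how these two directives are interpreted with Python's standard-library robots.txt parser, which implements the same rules a well-behaved crawler follows. This is just an illustration: the user-agent name is made up, and the rules are parsed from a string rather than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, parsed directly (no network fetch).
rules = """\
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# "Disallow: /" blocks every path for every user agent,
# so any URL on the site is reported as off-limits.
print(parser.can_fetch("MyBot", "https://example.com/"))           # False
print(parser.can_fetch("MyBot", "https://example.com/page.html"))  # False
```

A polite crawler performs exactly this check before requesting each page; as noted above, nothing forces a malicious bot to do so.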