Robots.txt

What is Robots.txt

Robots.txt (also known as the robots exclusion protocol or robots exclusion standard) allows a website to provide instructions to web crawling bots.

What is inside Robots.txt

The robots.txt file contains a set of instructions that request the bot to ignore specific files or directories.

What is the Purpose

Search engines use web crawlers (web robots) to archive and categorize websites. Most bots are configured to look for a robots.txt file on the server before they read any other file from the website. They do this to see whether the website’s owner has special instructions on how to crawl and index the site.

One of the major goals of SEO is to make your site easy for search engines to crawl. Without instructions, a search engine bot will try to crawl every page it can find. If you have a lot of pages, this takes the bot a while, and it may spend its time on low-value pages instead of your important ones, which can hurt how well your site is indexed and ranked. If you create the right robots.txt file, you can tell search engine bots to avoid certain pages so that they crawl and index your site based on your most useful content.
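As a rough illustration of steering bots away from certain pages, the sketch below uses Python's standard urllib.robotparser module to check a list of URLs against a small robots.txt. The rules, the example.com URLs, the "ExampleBot" user agent, and the crawlable() helper are all assumptions made up for this example, not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps bots out of two low-value areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search/
"""


def crawlable(robots_txt, user_agent, urls):
    """Return only the URLs that the given rules allow this user agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical pages on an example site.
urls = [
    "https://example.com/",
    "https://example.com/blog/robots-txt-basics",
    "https://example.com/admin/login",
    "https://example.com/search/results",
]

allowed = crawlable(ROBOTS_TXT, "ExampleBot", urls)
for url in allowed:
    print(url)
```

A well-behaved bot following these rules would skip the /admin/ and /search/ pages and spend its time on the home page and the blog post.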

Notes

If a website has more than one subdomain, each subdomain must have its own robots.txt file. It is important to note that not all bots will honor a robots.txt file. Some malicious bots will even read the robots.txt file to find which files and directories they should target first. Also, even if a robots.txt file instructs bots to ignore specific pages on the site, those pages may still appear in search results if they are linked to by other pages that are crawled.
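To make the subdomain point concrete, here is a small sketch of how a crawler might work out which robots.txt file governs a given page: the file lives at the root of that page's own host. The robots_url() helper and the example.com hosts are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Two hypothetical subdomains of the same site map to two different files.
print(robots_url("https://www.example.com/products/42"))
print(robots_url("https://blog.example.com/posts/1?page=2"))
```

Because the host name is part of the result, rules placed on www.example.com say nothing about blog.example.com.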

Example

Let’s say a search engine is about to visit a site. Before it visits the target page, it will check the robots.txt for instructions.

Suppose the site has the following in its robots.txt file:

User-agent: *
Disallow: /
  • User-agent: * – means that the rules that follow apply to all web robots that visit the site.
  • Disallow: / – tells the robot not to visit any pages on the site.
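The effect of these two lines can be checked with Python's standard urllib.robotparser module. This is only a sketch of how a compliant bot might interpret the rules; the "ExampleBot" name, the example.com URLs, and the is_allowed() helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the example above.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(user_agent: str, url: str) -> bool:
    """Report whether the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Every path on the site is off limits, whatever the bot calls itself.
print(is_allowed("ExampleBot", "https://example.com/"))
print(is_allowed("ExampleBot", "https://example.com/about.html"))
```

Both checks come back False, since "Disallow: /" matches every path and "User-agent: *" matches every bot.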


About the Author: Md. Delwar Hossain

He has 11 years of experience in developing standalone software and web applications for multiple database platforms. He is passionate about new tools and technologies. He is positive and trustworthy. He is able to learn and adapt quickly to different situations. He is a great team player and enjoys leading and mentoring. He specializes in architecting and building complex web and mobile applications. He has strong skills in automating POS, inventory, supply chain, trading export/import, human resource management, manufacturing and production, distribution management, and hospital management systems.
