A website will not appear in search engine result pages unless it has been indexed by a search engine such as Google. Search engines routinely crawl websites and index their content so that it can be shown in search results according to their rankings. However, as a website owner you may sometimes want to keep some of your pages out of search results, or prevent duplicate content from being indexed. Two common ways to control how search engines treat the pages on your website are the robots.txt file and the robots meta tag.
What is a robots.txt File?
Robots.txt is a simple text file that tells search engine crawlers which pages and files on your website they may crawl and which ones they should avoid. However, it does not technically block access: it is a set of instructions that well-behaved crawlers choose to follow, not an access control mechanism. When a search engine crawler visits your website, it first looks for the robots.txt file to learn which URLs it should skip. For this reason the location of the file matters: it must sit in the top-level (root) directory of your site, for example https://www.example.com/robots.txt, so that user-agents (search engine crawlers) can find it. If no file is available in the root directory, search engines assume that the site places no restrictions on crawling its pages.
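To make the location rule concrete, here is a small Python sketch that derives the robots.txt address for any page by keeping only the scheme and host and replacing the path with /robots.txt. The function name robots_txt_url is hypothetical, chosen for illustration. A file placed deeper in the site, such as /blog/robots.txt, is not where crawlers look.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url.

    Hypothetical helper: keeps the scheme and host (including any port)
    and discards the page's own path, query string and fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/2020/post.html?ref=home"))
# https://www.example.com/robots.txt
print(robots_txt_url("http://shop.example.com:8080/cart"))
# http://shop.example.com:8080/robots.txt
```

Because the host is kept as-is, each subdomain (www.example.com, shop.example.com) ends up with its own robots.txt location.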
A robots.txt file contains one or more records, each made up of two kinds of lines: User-agent and Disallow. The User-agent line names the crawler the record applies to (an asterisk, *, means all crawlers), and each Disallow line gives a file or directory path that the crawler should not crawl.
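Python's standard library includes a robots.txt parser, urllib.robotparser, which makes it easy to check how a set of records is interpreted. The sketch below parses hypothetical robots.txt text in memory (no network access); the bot name BadBot and the example.com URLs are placeholders. Its matching rules are simpler than those of some search engines, so treat it as an illustration of the record structure rather than a reference implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: two records, each a User-agent line
# followed by its Disallow lines. Records are separated by a blank line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines instead of fetching

# A crawler without its own record falls back to the "*" record.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post.html"))       # True
print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))  # False
# BadBot has its own record, which disallows the whole site.
print(parser.can_fetch("BadBot", "https://example.com/blog/post.html"))          # False
```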
A typical example of a robots.txt file is:
User-agent: *
Disallow:
Because the Disallow value in the above example is left empty, no path is excluded, so all search engines are allowed to crawl every page of the website.
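The difference between an empty Disallow value and "Disallow: /" is easy to miss. The sketch below again uses Python's standard urllib.robotparser module on in-memory text; the crawler name AnyBot and the example.com URLs are placeholders. It shows that the empty value allows every path, while a single slash blocks the whole site.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"   # empty Disallow: nothing excluded
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path


def make_parser(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text held in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


allow_all = make_parser(ALLOW_ALL)
block_all = make_parser(BLOCK_ALL)

for path in ["/", "/about.html", "/private/report.html"]:
    url = "https://example.com" + path
    print(path, allow_all.can_fetch("AnyBot", url), block_all.can_fetch("AnyBot", url))
# / True False
# /about.html True False
# /private/report.html True False
```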
Robots Meta Tag
Another method is to include an HTML robots meta tag in the head section of a page's source code. It gives search engines instructions on whether to index the page, follow the links on it, or show it in search results.
A typical robots meta tag looks like this:
<meta name="robots" content="noindex">
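To see how a crawler might read this tag, the following Python sketch uses the standard library's html.parser module to pull the comma-separated directives out of any meta tag named "robots" on a page. The class and function names are hypothetical, and this sketch ignores crawler-specific variants of the tag (for example, a tag named for one particular crawler).

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").strip().lower()
        if name == "robots":
            content = attr_map.get("content") or ""
            # Directives are comma-separated; normalise case and whitespace.
            self.directives.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> List[str]:
    """Return all robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(robots_directives(page))  # ['noindex', 'nofollow']
```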
Several directives can be listed in the content attribute, separated by commas, such as index or noindex, follow or nofollow, and nosnippet. The combination of directives determines how search engines treat the page. If a page has no robots meta tag, search engines assume the default "index, follow". A robots meta tag applies only to the page that contains it, so placing it on the home page does not affect the rest of the site. If you want the same behavior across the whole site, the tag must appear on every page (for example, through a shared page template); if you want page-level control, add the tag only to the pages concerned.
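The default rule can be expressed as a small function. In this hypothetical Python sketch, effective_robots takes the content value of a page's robots meta tag (or None when the page has no such tag) and reports whether the page may be indexed, whether its links may be followed, and whether a snippet may be shown. Anything not explicitly turned off falls back to the "index, follow" default; only the three directive pairs mentioned above are modeled.

```python
from typing import Dict, Optional


def effective_robots(content: Optional[str] = None) -> Dict[str, bool]:
    """Return the effective index/follow/snippet flags for a robots meta tag.

    content is the tag's content attribute, or None when the page has no
    robots meta tag at all, in which case everything is allowed.
    """
    directives = set()
    if content:
        directives = {d.strip().lower() for d in content.split(",") if d.strip()}
    return {
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
        "snippet": "nosnippet" not in directives,
    }


print(effective_robots())                   # {'index': True, 'follow': True, 'snippet': True}
print(effective_robots("noindex, follow"))  # {'index': False, 'follow': True, 'snippet': True}
```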