
What is a robots.txt file?


A robots.txt file is a plain-text file that tells web crawlers (such as search engine bots) which parts of your website they may and may not crawl. Before a search engine crawls any page on a domain it hasn’t visited before, it checks that domain’s robots.txt file to determine which URLs it is allowed to access and which it should avoid.

File Format and Location

The robots.txt file must be named exactly “robots.txt” (all lowercase) and must be located at the root of your website. For example, for a website at https://www.example.com, the robots.txt file should be accessible at https://www.example.com/robots.txt.

The file must be UTF-8 encoded plain text. You should create it using a text editor like Notepad, TextEdit, vi, or emacs – not with a word processor that might add proprietary formatting or unexpected characters.

Basic Syntax

A robots.txt file consists of one or more groups of rules. Each group typically includes:

  1. User-agent directive – Specifies which crawler the rules apply to
  2. Disallow/Allow directives – Specify which parts of the site can or cannot be accessed

Example of a Basic robots.txt File:

User-agent: *
Disallow: /private/
Allow: /
Sitemap: https://www.example.com/sitemap.xml

This example means:

  • User-agent: * – These rules apply to all crawlers
  • Disallow: /private/ – No crawler should access the /private/ directory or its contents
  • Allow: / – All other parts of the site can be accessed
  • Sitemap: https://www.example.com/sitemap.xml – Provides the location of your sitemap
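The rules above can be checked programmatically. Here is a minimal sketch using Python’s standard-library robots.txt parser, with the example.com placeholder URLs from the example:

```python
# Minimal sketch: parse the example rules with Python's standard-library
# robots.txt parser and check which URLs a crawler may fetch.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private/
Allow: /
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# /private/ is disallowed for every user agent; everything else is allowed.
print(parser.can_fetch("*", "https://www.example.com/private/data.html"))  # False
print(parser.can_fetch("*", "https://www.example.com/about.html"))         # True
```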

Common Directives

  1. User-agent: Identifies which crawler the rules apply to
    • User-agent: * (all crawlers)
    • User-agent: Googlebot (Google’s crawler specifically)
    • User-agent: Bingbot (Bing’s crawler specifically)
  2. Disallow: Tells crawlers which parts of your site they should not access
    • Disallow: / (blocks the entire site)
    • Disallow: /admin/ (blocks the /admin/ directory)
    • Disallow: /*.pdf$ (blocks all PDF files)
  3. Allow: Tells crawlers which parts they can access (especially useful with wildcards)
    • Allow: /public/ (allows access to the /public/ directory)
  4. Sitemap: Indicates the location of your XML sitemap
    • Sitemap: https://www.example.com/sitemap.xml

More Complex Examples

Block All Crawlers from Entire Site:

User-agent: *
Disallow: /

Block One Specific Crawler:

User-agent: BadBot
Disallow: /

User-agent: *
Allow: /
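This per-group behavior can be verified with the same standard-library parser. BadBot is the hypothetical crawler name from the example above; each User-agent group is matched independently:

```python
# Minimal sketch: the BadBot group blocks that crawler everywhere, while
# the wildcard group leaves all other crawlers fully allowed.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: BadBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("BadBot", "https://www.example.com/page.html"))        # False
print(parser.can_fetch("SomeOtherBot", "https://www.example.com/page.html"))  # True
```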

Block Multiple Directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

Block File Types:

User-agent: *
Disallow: /*.pdf$
Disallow: /*.xls$
Disallow: /*.doc$
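Note that the `*` and `$` wildcards are extensions supported by Google and other major search engines, not part of the original 1994 robots exclusion standard (Python’s `urllib.robotparser`, for instance, treats them literally). The sketch below illustrates how such patterns match; `wildcard_match` is a hypothetical helper for demonstration, not Google’s actual implementation:

```python
# Illustrative sketch of Google-style wildcard matching: '*' matches any
# run of characters, and a trailing '$' anchors the pattern to the end of
# the URL path. Patterns without '$' match as prefixes.
import re

def wildcard_match(pattern: str, path: str) -> bool:
    # Escape regex metacharacters, then restore the robots.txt wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"  # anchor at the end of the path
    return re.match(regex, path) is not None

print(wildcard_match("/*.pdf$", "/files/report.pdf"))      # True
print(wildcard_match("/*.pdf$", "/files/report.pdf?v=2"))  # False
print(wildcard_match("/admin/", "/admin/settings"))        # True (prefix match)
```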

Important Limitations

It’s crucial to understand that robots.txt is primarily for managing crawler traffic to your site, not for hiding pages from search results. Some important limitations:

  1. The protocol relies on voluntary compliance: well-behaved crawlers respect it, but malicious bots may ignore your robots.txt file or even use it to discover the pages you have disallowed.
  2. A page disallowed in robots.txt can still be indexed if linked to from other sites. While Google won’t crawl the content, it might still find and index a disallowed URL if it’s linked from elsewhere on the web.
  3. To prevent a page from appearing in search results, use a meta robots noindex tag instead of blocking it with robots.txt.
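For example, a page you want kept out of search results would carry this standard tag in its `<head>` (the page must not be blocked in robots.txt, or crawlers will never see the tag):

```html
<!-- Place in the <head> of any page that should be excluded from
     search results. Crawlers must be able to fetch the page to read it. -->
<meta name="robots" content="noindex">
```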

Best Practices

  1. Only use robots.txt for files or pages that search engines should never see, or that can significantly impact crawling, such as login areas, test environments, or sections with extensive faceted navigation.
  2. Monitor your robots.txt file for any issues or changes, as developers sometimes make changes when pushing new code that could inadvertently alter your robots.txt file.
  3. Test your robots.txt file using tools like Google Search Console to make sure it works as intended.
  4. Be aware that Google caches robots.txt files for up to 24 hours (sometimes longer), so changes may not take effect immediately.
  5. Remember that each subdomain needs its own robots.txt file.

By properly configuring your robots.txt file, you can help search engines crawl your site more efficiently and focus on the content that matters most to your visitors.
