
Introduction to robots.txt

A robots.txt file tells crawlers which URLs they can access on your site. It’s mainly used to manage crawl traffic—not to keep a web page out of Google.


What is a robots.txt file used for?

A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google.

To keep a web page out of Google Search results, block indexing with noindex (meta tag or response header) or password‑protect the page.
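For example, a page can opt out of indexing with a robots meta tag in its HTML head:

```
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, where you can't add a meta tag, the equivalent is the HTTP response header `X-Robots-Tag: noindex`. Note that for noindex to work, the page must remain crawlable: if robots.txt blocks the crawler, it never sees the noindex directive.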

If you use a CMS

If you use a CMS (for example, Wix or Blogger), you might not need to (or be able to) edit your robots.txt file directly. Instead, your CMS may provide search settings or “page visibility” controls.

If you want to hide or unhide a page from search engines, search your CMS documentation for instructions (for example, “wix hide page from search engines”).

Robots.txt effects by file type

You can use a robots.txt file to manage crawling traffic, and depending on the file type, to help keep certain content out of Google Search results.

  • Web pages (HTML, PDF, other readable text formats): Manage crawling traffic, for example to avoid crawling unimportant or similar pages. Don’t use robots.txt to hide a page: the URL can still be indexed if linked elsewhere, often without a snippet.
  • Media files (images, video, audio): Manage crawl traffic and help prevent those media URLs from appearing in Google Search results. This won’t stop other sites or users from linking directly to the media.
  • Resource files (JS, CSS, unimportant images): You can block some resources if the page remains understandable without them. If blocking a resource makes the page harder for Google to render or understand, don’t block it.
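As a sketch, a robots.txt that manages crawling along these lines might look like the following (all paths are illustrative, not recommendations):

```
User-agent: *
# Avoid crawl traffic on auto-generated, near-duplicate pages
Disallow: /search-results/
# Help keep these image URLs out of Google Images results
Disallow: /images/private/
# Don't block CSS/JS that pages need to render correctly
Allow: /assets/
```

Lines starting with `#` are comments; each `User-agent` group applies to the crawlers it names, with `*` matching any crawler that has no more specific group.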

Important limitations to understand

  • Robots.txt is not a security mechanism: not all crawlers obey it. Use authentication/password protection for private content.
  • Different crawlers may interpret rules differently: keep syntax correct and test with the target crawler where possible.
  • Disallowed URLs can still appear in search results: Google may index a URL if other pages link to it, even if crawling is blocked.
  • Rules can counteract each other: combining crawling and indexing directives incorrectly can cause surprises—review your full setup (robots, noindex, canonicals).
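Because crawlers can interpret rules differently, it helps to test how a parser actually reads your file. A minimal sketch using Python’s standard-library parser (the rules and URLs below are hypothetical); note that this parser applies rules in order of appearance, whereas Google uses the most specific matching rule, so put Allow lines before the broader Disallow they carve out:

```python
# Check how a robots.txt file is interpreted, using Python's stdlib parser.
from urllib.robotparser import RobotFileParser

# Hypothetical rules: allow one page inside an otherwise-blocked directory.
rules = """\
User-agent: *
Allow: /private/public-note.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "https://example.com/private/data.html"))         # blocked
print(parser.can_fetch("*", "https://example.com/private/public-note.html"))  # allowed
print(parser.can_fetch("*", "https://example.com/about.html"))                # allowed (no rule matches)
```

For production checks against a specific crawler, test with that crawler’s own tooling where available, since matching semantics (wildcards, longest-match) vary between implementations.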

Create or update a robots.txt file

If you decide you need one, you can create or update your robots.txt file at the root of your site (for example, /robots.txt). Keep it simple and include your sitemap locations when relevant.
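A minimal file at the root of your site might look like this (the host, path, and sitemap URL are placeholders):

```
# Served at https://example.com/robots.txt
User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```

The file must live at the root of the host it applies to (`https://example.com/robots.txt`), and the `Sitemap` line, unlike the other rules, takes a full URL rather than a path.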

Source: Google Search Central documentation. This page is a condensed reference for convenience; always defer to Google’s documentation for the most current behavior and syntax.
