Tuesday, January 6, 2009

Don't Overlook Robots.txt

In the good old days, web spiders crawled your site only after you registered it with a search engine. Today they are a lot more proactive and start crawling sites as soon as their domain names are registered. For this reason, adding a robots.txt file that tells robots not to crawl the site is not optional during the development phase of any project.

It's super easy. Just create a text file named robots.txt in the website's root directory and put the following text in it:

User-agent: *
Disallow: /

That's it. Most web crawlers will now stay out of your temporary website. Note that each subdomain needs its own robots.txt file in its root path.
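If you want to double-check the rules before deploying, Python's standard library includes a robots.txt parser, urllib.robotparser. The sketch below is a minimal check, assuming Python 3; the helper names robots_url and is_blocked and the example.com hosts are hypothetical. It also shows why every subdomain needs its own file: a crawler only asks for /robots.txt at the root of the host it is crawling. Python's parser is one interpretation of the rules, so individual crawlers may differ in edge cases.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# The "block everything" rules from above, one directive per line.
DEV_RULES = """\
User-agent: *
Disallow: /
"""


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler checks before fetching page_url.

    Only the scheme and host are kept, so dev.example.com and
    www.example.com each get their own robots.txt. (Hypothetical helper.)
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_blocked(rules: str, url: str, agent: str = "*") -> bool:
    """True if the given robots.txt text forbids agent from fetching url.
    (Hypothetical helper built on the stdlib parser.)"""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ["http://dev.example.com/",
                "http://dev.example.com/images/button.png"]:
        print(robots_url(url), is_blocked(DEV_RULES, url, "Googlebot"))
```

Both URLs map to http://dev.example.com/robots.txt and both are reported as blocked, because "Disallow: /" matches every path on the host.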


There is no need to put a robots.txt in subdirectories or virtual paths, such as

More info on Robots.txt here

More in-depth info here

A tutorial for when you want your site to be crawled.

Also note that without a robots.txt file, a web spider will try to crawl every file and every path it can find on your website. You rarely want a web crawler to do that. For instance, do you really want all your button images and background images indexed?
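Once the site goes live, you can let crawlers in while still keeping them out of image folders. The sketch below assumes your images live under /images/ and /css/images/; those directory names are hypothetical, so substitute your own layout. It uses Python's urllib.robotparser to confirm which paths the rules allow, and build_parser is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a live site: crawl pages, skip image folders.
# The directory names are assumptions; replace them with your own.
LIVE_RULES = """\
User-agent: *
Disallow: /images/
Disallow: /css/images/
"""


def build_parser(rules: str) -> RobotFileParser:
    """Load robots.txt text into the stdlib parser (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(LIVE_RULES)
    for path in ["/", "/about.html", "/images/button.png", "/css/images/bg.jpg"]:
        url = "http://www.example.com" + path
        status = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
        print(path, status)
```

Each Disallow line is a path prefix, so everything under /images/ or /css/images/ is skipped while ordinary pages remain crawlable. Keep in mind that these rules only stop crawlers that honor robots.txt; they are not access control.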
