Posted 04-04-2007, 12:07 PM by Admin, Administrator
 
Join Date: Jan 2007
Location: Taree
Posts: 607
Robots.txt file

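Search engine spiders and similar robots look for a file named robots.txt in your main (root) web directory, so it must be reachable at an address such as http://www.example.com/robots.txt; a robots.txt placed in a subdirectory is ignored. This is a plain text file. Create or modify it with a text editor and be sure to upload (FTP) it in ASCII mode.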


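This file is used to ask robots to stay out of sections of your web site, so they won't read files in those areas. Compliance is voluntary: well-behaved robots such as search engine spiders obey it, but nothing forces a robot to.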
1. What are these robots?
These are mostly automated programs that fetch content from many web sites for a variety of purposes.
Search engines often call these spiders and send them out to look for pages to include in their search results.
Some spammers also use this technology to harvest email addresses to send their junk mail to. Other uses include bots looking for illegal files or content.
2. How do I create a robots.txt file?
The syntax is very limited and easy to understand. The first part specifies the robot we are referring to.
User-agent: BotName
Replace BotName with the robot name in question. To address all of them, simply use an asterisk.
User-agent: *
The second part tells the robot in question not to enter certain parts of your web site.
Disallow: /cgi-bin/
In this example, any path on our site starting with the string /cgi-bin/ is declared off limits. Multiple paths can be excluded per robot by using several Disallow lines.
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /private
This robots.txt file would apply to all bots and instruct them to stay out of directories /cgi-bin/ and /temp/.
It also declares off limits any path on your site that starts with /private, whether it is a file or a directory, so /private.html and /private/notes.txt are both excluded.
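A quick way to check how a rule-following robot reads the file above is Python's standard urllib.robotparser module, which parses robots.txt lines and answers can_fetch() queries. This is a sketch only: www.example.com and AnyBot are placeholder names, and individual search engines may treat some edge cases differently from this module.

```python
from urllib.robotparser import RobotFileParser

# The multi-line robots.txt example from the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /private
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a robot that obeys robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # www.example.com and AnyBot are placeholders, not real names.
    for path in ("/index.html", "/cgi-bin/search.pl", "/cgi-bin",
                 "/temp/old.html", "/temporary.html",
                 "/private.html", "/private/notes.txt"):
        verdict = is_allowed(ROBOTS_TXT, "AnyBot", "http://www.example.com" + path)
        print(f"{path:22} {'allowed' if verdict else 'blocked'}")
```

Running it reports /cgi-bin and /temporary.html as allowed, because neither starts with /cgi-bin/ or /temp/, while /private.html is blocked by the /private prefix. The trailing slash in a Disallow line therefore matters.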
To declare your entire website off limits to BotName, use the example shown below.
User-agent: BotName
Disallow: /
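The same module can confirm that a record naming one robot leaves every other robot alone. A minimal sketch using the article's placeholder BotName; note that Python's parser compares the record's name case-insensitively against the first token of the robot's User-Agent (the part before any /), and other crawlers may match names in their own way.

```python
from urllib.robotparser import RobotFileParser

# "BotName" is the article's placeholder, not a real crawler.
rules = RobotFileParser()
rules.parse([
    "User-agent: BotName",
    "Disallow: /",
])

home = "http://www.example.com/"
page = "http://www.example.com/about.html"

# Every path starts with "/", so BotName is refused everywhere.
print(rules.can_fetch("BotName", home))       # False
print(rules.can_fetch("BotName/2.1", page))   # False: version suffix is ignored
# A robot with no matching User-agent record is not restricted.
print(rules.can_fetch("OtherBot", page))      # True
```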
To have a generic robots.txt file which welcomes every robot and does not restrict them, use this sample.
User-agent: *
Disallow:
An empty Disallow value means nothing is off limits.
This beginner's tutorial includes a list of common robot names to get you started. Many others exist.
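An empty Disallow line and Disallow: / differ by one character but mean opposite things. This sketch, again with urllib.robotparser and placeholder names, parses both versions side by side.

```python
from urllib.robotparser import RobotFileParser

# Welcome-all file: the Disallow value is empty.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow:",  # empty value: nothing is off limits
])

paths = ("/", "/cgi-bin/search.pl", "/private/notes.txt")
results = {p: rules.can_fetch("AnyBot", "http://www.example.com" + p) for p in paths}
print(results)  # every value is True

# Adding a single "/" turns it into a block-everything file.
strict = RobotFileParser()
strict.parse(["User-agent: *", "Disallow: /"])
print(strict.can_fetch("AnyBot", "http://www.example.com/"))  # False
```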


Some bots ignore robots.txt because they don't care whether you want them on your web site or not.
On an Apache web server, these can be blocked with a .htaccess file instead, which refuses their requests outright rather than asking them to stay away.
1. Block robots via .htaccess
We can't block by robot name here; instead, we block them by matching the beginning of their User-Agent string.
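# Flag any request whose User-Agent starts with a known spambot name
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot

# Allow everyone, then deny flagged requests (with Order Allow,Deny the Deny wins).
# No <Limit> wrapper, so the ban applies to every request method, not only GET and POST.
Order Allow,Deny
Allow from all
Deny from env=bad_bot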

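This example refuses requests from a list of spambots with a 403 Forbidden response. On Apache 2.4, Order, Allow and Deny only work when mod_access_compat is loaded; the native 2.4 form replaces those three lines with Require all granted and Require not env bad_bot inside a <RequireAll> block.
To block another robot, add a SetEnvIfNoCase line for it alongside the others near the top.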
SetEnvIfNoCase User-Agent "^BotName" bad_bot

Replace BotName with the beginning of the robot's User-Agent string, as found in your log files. Here's a sample log entry.
xyz.net - - [07/Mar/2003:11:28:35] "GET / HTTP/1.0" 403 - "-" "Teleport 1.28"
Here, the User-Agent is Teleport 1.28. SetEnvIfNoCase compares without regard to letter case, and the ^ character anchors the match to the start of the User-Agent string. Any User-Agent that begins with Teleport is therefore blocked, regardless of version number or added text, while one that merely contains Teleport further along is not.
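To see how the ^ anchor and the case-insensitive match decide which requests are refused, here is a rough Python model of the .htaccess rules: it pulls the User-Agent out of the sample log line and tests it against the same patterns. Apache does this work itself; the model is only an illustration. It assumes the User-Agent is the last double-quoted field of the log line, and it uses Python's re module in place of Apache's regular expression engine (equivalent for plain prefixes like these). The helper names are hypothetical.

```python
import re

# Sample log entry from the article; the User-Agent is the last quoted field.
LOG_LINE = 'xyz.net - - [07/Mar/2003:11:28:35] "GET / HTTP/1.0" 403 - "-" "Teleport 1.28"'

# The spambot patterns from the .htaccess example. SetEnvIfNoCase matches
# case-insensitively, which re.IGNORECASE imitates here.
BAD_BOT_PATTERNS = [
    "^EmailSiphon", "^EmailWolf", "^ExtractorPro", "^CherryPicker",
    "^NICErsPRO", "^Teleport", "^EmailCollector",
]
BAD_BOT_REGEXES = [re.compile(p, re.IGNORECASE) for p in BAD_BOT_PATTERNS]


def user_agent_from_log(line: str) -> str:
    """Return the last double-quoted field of a log line (assumed to be the User-Agent)."""
    fields = re.findall(r'"([^"]*)"', line)
    if not fields:
        raise ValueError("no quoted fields in log line")
    return fields[-1]


def is_bad_bot(user_agent: str) -> bool:
    """True if any pattern matches, i.e. Apache would set bad_bot and deny the request."""
    return any(rx.search(user_agent) for rx in BAD_BOT_REGEXES)


if __name__ == "__main__":
    agent = user_agent_from_log(LOG_LINE)
    print(agent, is_bad_bot(agent))                           # Teleport 1.28 True
    print(is_bad_bot("teleport pro/1.29"))                    # True: case is ignored
    print(is_bad_bot("Mozilla/4.0 (compatible; Teleport)"))   # False: not at the start
```

Because every pattern starts with ^, a robot can only be caught by how its User-Agent begins; choose a prefix specific to that robot, since a generic one such as Mozilla would also block ordinary browsers.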
Forum time zone is GMT.