SEO Search Engine Optimization, this section lists articles and tutorials on Search Engine Optimization for various Search Engines including Google, Yahoo & MSN. We provide SEO hints, tips and other free goodies to help you optimize your site and to start ranking well in the Search Engines.

 
  #1  
Old 09-22-2009, 08:25 AM
Junior Member
 
Join Date: Sep 2009
Posts: 6
What is the robots.txt file?

Can anyone please explain what the robots.txt file is and what its correct format looks like?

  #2  
Old 09-23-2009, 05:30 AM
Junior Member
 
Join Date: Oct 2008
Posts: 4
Robots.txt

Web robots – often referred to as crawlers, bots, or spiders – are software programs that constantly travel the web, indexing the information found on millions and millions of websites every single day. Some sites, however, don’t wish to be indexed in search engines or accessed by these Web Robots. Now that you know what a Web Robot is and what it does, it’s important you know what can be done to limit their access to your site if you so desire. There may be a number of reasons for wanting to prevent bot access to a website page or specific directory. The most common reasons are related to security, privacy and duplicate content.
The Robots Exclusion Protocol, more commonly referred to as a /robots.txt file, lets webmasters give bots instructions about which parts of a site they may crawl. The file, which must reside in the domain’s root directory, serves to limit the bots’ access to files within that domain. There are often a large number of pages that make up an entire site, but many of those pages (registration, login, 404 error, privacy policy and order confirmation pages, for example) should not be indexed by search engines. The /robots.txt file also comes in particularly handy for webmasters with a wide network of sites that share identical privacy policies or terms and conditions, and for e-commerce sites that have checkout pages, shopping carts, etc.
Addressing Duplicate Content with /robots.txt

The /robots.txt file can also help to eliminate duplicate content issues that arise with blogging software, such as WordPress. With WordPress, and most blogging software for that matter, content from blog posts is published on the post URL itself, but copies of that content are also published on category pages, as well as tag and author archives. This inadvertently creates several pages of duplicate content. Since duplicate content can have a negative impact on a site’s ranking in the organic search results, blocking those archive pages in the /robots.txt file can reduce the duplicate content that would otherwise hurt the site’s search marketing strategy.
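As a rough illustration of the idea above, here is a small Python sketch that writes out the text of a /robots.txt file from a list of paths. The function name build_robots_txt is made up for this example, and the three archive paths are an assumption based on common WordPress permalink defaults; check the URLs your own blog actually uses before blocking anything. The file format itself is explained further down in this post.

```python
def build_robots_txt(disallowed_paths, agent="*"):
    """Return robots.txt text that blocks the given paths for one user agent."""
    lines = [f"User-agent: {agent}"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        # An empty Disallow line means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


# Assumed archive locations; your permalink settings may differ.
ARCHIVE_PATHS = ["/category/", "/tag/", "/author/"]

if __name__ == "__main__":
    print(build_robots_txt(ARCHIVE_PATHS), end="")
```

The individual post URLs stay crawlable because only the archive prefixes are listed.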
Understanding How To Use /robots.txt

In order to function properly, the /robots.txt file should be accessible at http://www.domain.com/robots.txt and reside in the domain’s root directory. The file itself should be created as a plain text document. Do NOT use Microsoft Word or another word processing program; the standard Notepad program that is installed with Windows, or SimpleText/TextEdit on Mac OS, works best. The file must be named robots.txt and uploaded directly to the domain’s root directory. The commands within the file itself can be as simple or complex as your needs demand.
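To make the "root directory" rule concrete, here is a minimal Python sketch that works out where the /robots.txt file for any page should live. The helper name robots_url and the sample page URL are hypothetical; only the standard library is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the page path, query and fragment,
    # because the file always sits at the root of the domain.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # prints http://www.domain.com/robots.txt
    print(robots_url("http://www.domain.com/shop/view_cart.asp?id=7"))
```

If the file is uploaded anywhere else (for example inside /shop/), bots will not look for it there.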
The standard, generic /robots.txt file – one that does not limit access to any of the information in your domain’s root directory – would be formatted like this:
User-agent: *
Disallow:
Blocking bot access to the entire site requires adding only one character to the standard or generic /robots.txt file, which would then look like this:
User-agent: *
Disallow: /
What if you want to limit bot access only to certain subdirectories or specific pages of the site? Not a problem. You would simply add each individual subdirectory or URL to the /robots.txt file as follows:
User-agent: *
Disallow: /checkout.asp
Disallow: /add_cart.asp
Disallow: /view_cart.asp
Disallow: /error.asp
Disallow: /shipquote.asp
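If you want to check rules like these before uploading them, one option is Python's standard urllib.robotparser module. The sketch below is only an approximation of how a well-behaved bot reads the file (real search engine crawlers may interpret some rules differently); the helper name allowed, the user agent ExampleBot and the page index.asp are made up for the example.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /checkout.asp
Disallow: /add_cart.asp
Disallow: /view_cart.asp
Disallow: /error.asp
Disallow: /shipquote.asp
"""


def allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given rules let the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse the text directly, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed(RULES, "http://www.domain.com/checkout.asp"))  # False
    print(allowed(RULES, "http://www.domain.com/index.asp"))     # True
    # The two generic files shown earlier: allow everything, block everything.
    print(allowed("User-agent: *\nDisallow:", "http://www.domain.com/index.asp"))    # True
    print(allowed("User-agent: *\nDisallow: /", "http://www.domain.com/index.asp"))  # False
```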
The Robots.txt File Is Not Foolproof

While the /robots.txt file does a good job of blocking a bot’s access to the site, it isn’t foolproof. Each individual page you do not want bots to index should also incorporate a properly formatted robots META tag. Keep in mind that a bot can only read this tag on pages it is allowed to fetch. The standard robots META tag is configured like this:
<meta name="robots" content="index,follow" />
To help prevent the bots from indexing individual URLs, the robots META tag in the header of the page should look like this:
<meta name="robots" content="noindex,nofollow" />
or
<meta name="robots" content="noindex,follow" />
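To confirm that a page really carries the tag you intended, a short Python sketch like the following can read the directives out of the page's HTML. It uses only the standard html.parser module; the class name RobotsMetaParser and the helper robots_directives are invented for this example, and the sample HTML is a stand-in for your own page source.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex,follow".
            for item in content.split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
    print("noindex" in robots_directives(page))  # True
```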
The Bottom Line

A /robots.txt file is a very useful tool and, unfortunately, an often overlooked and neglected aspect of web development. Now that you have a better understanding of what it is, what it does and how to use it, take some time to consider how your site may benefit from having a properly configured /robots.txt file. In the meantime, start checking out the /robots.txt files of the sites you visit to familiarize yourself with different configurations and uses for it.

Original Post
http://www.dirjournal.com/articles/robots-txt-101/
  #3  
Old 09-28-2009, 07:03 AM
Junior Member
 
Join Date: Sep 2009
Posts: 6
What is the robots.txt file?

Thanks for replying to my post
  #4  
Old 09-28-2009, 06:51 PM
Junior Member
 
Join Date: Sep 2009
Posts: 5

You can exclude a lot of bad bots with robots.txt.

But you can also use .htaccess.

  #5  
Old 10-12-2009, 04:46 AM
Junior Member
 
Join Date: Sep 2009
Posts: 6

Thanks to all for replying to my post
  #6  
Old 10-12-2009, 04:56 AM
Junior Member
 
Join Date: Oct 2009
Posts: 6

The robots.txt file controls crawler behavior. I am copying and pasting some lines from an article I found on the Redalkemi site. I hope it is useful to you.

Here are some tips on how to use robots.txt file -

1. The robots.txt file is always named in all lowercase (e.g. Robots.txt or robots.Txt is incorrect)
2. The robots.txt file is an exclusion file meant for search engine robot reference and not obligatory for a website to function. An empty or absent file simply means that all robots are welcome to index any part of the website.
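Both tips can be tried out with a few lines of Python. This is just a sketch using the standard urllib.robotparser module, which may not behave exactly like every search engine's crawler; the helper names is_standard_name and can_crawl, the user agent ExampleBot and the sample URL are made up here.

```python
from urllib.robotparser import RobotFileParser


def is_standard_name(filename: str) -> bool:
    """Tip 1: the file name must be exactly 'robots.txt', all lowercase."""
    return filename == "robots.txt"


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Tip 2: ask the parser whether the given file contents allow the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_standard_name("Robots.txt"))  # False
    # An empty file places no restrictions on any robot.
    print(can_crawl("", "http://www.domain.com/any/page.html"))  # True
```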

If you would like to read the full article, you can find it on redalkemi dot com under the article section (Search Engine Optimization article).