SEO (Search Engine Optimization): this section lists articles and tutorials on search engine optimization for various search engines, including Google, Yahoo and MSN. We provide SEO hints, tips and other free goodies to help you optimize your site and start ranking well in the search engines.
#1
Can anyone please explain what a robots.txt file is and what its correct format looks like?
#2
Web robots, often referred to as crawlers, bots, or spiders, are software programs that constantly travel the web, indexing the information found on millions of websites every day. Some sites, however, don't wish to be indexed in search engines or accessed by these web robots.

Now that you know what a web robot is and what it does, it's important to know what can be done to limit its access to your site if you so desire. There may be a number of reasons for wanting to prevent bot access to a page or a specific directory. The most common are related to security, privacy and duplicate content.

The Robots Exclusion Protocol, usually encountered in the form of a /robots.txt file, lets webmasters give bots instructions on which parts of the site to crawl. The file, which must reside in the domain's root directory, serves to limit the bots' access to files within that domain.

A site is often made up of a large number of pages, but many of them (registration, login, 404 error, privacy policy and order confirmation pages, for example) should not be indexed by search engines. The /robots.txt file also comes in particularly handy for webmasters with a wide network of sites that share identical privacy policies or terms and conditions, and for e-commerce sites that have checkout pages, shopping carts, etc.

Addressing Duplicate Content with /robots.txt

The /robots.txt file can also help to eliminate duplicate content issues that arise with blogging software such as WordPress. With WordPress, and most blogging software for that matter, the content of a blog post is published on the post URL itself, but copies of that content are also published on category pages, as well as tag and author archives. This inadvertently creates several pages of duplicate content.
Since duplicate content can have a negative impact on a site's ranking in the organic search results, the /robots.txt file can help to reduce the duplicate content that would otherwise hurt the site's search marketing strategy.

Understanding How To Use /robots.txt

In order to function properly, the /robots.txt file should be accessible at http://www.domain.com/robots.txt, which means it must reside in the domain's root directory. The file itself should be created as a plain text document. Do NOT use Microsoft Word or another word processing program; the standard Notepad program that is installed with Windows, or SimpleText/TextEdit on Mac OS, works best. The file must be named robots.txt and uploaded directly to the domain's root directory.

The commands within the file can be as simple or as complex as your needs demand. The standard, generic /robots.txt file, one that does not limit access to anything on your domain, would be formatted like this:

    User-agent: *
    Disallow:

Blocking bot access to the entire domain requires adding only one character to the generic /robots.txt file:

    User-agent: *
    Disallow: /

What if you want to limit bot access only to certain subdirectories or specific pages of the site? Not a problem. You would simply add each subdirectory or URL path to the /robots.txt file as follows:

    User-agent: *
    Disallow: /checkout.asp
    Disallow: /add_cart.asp
    Disallow: /view_cart.asp
    Disallow: /error.asp
    Disallow: /shipquote.asp

The Robots.txt File Is Not Fool Proof

While the /robots.txt file does a good job of keeping well-behaved bots out of the parts of the domain you list, it isn't foolproof. Each individual page you do not want bots to index should also incorporate a properly formatted robots META tag.
The standard robots META tag is configured like this:

    <meta name="robots" content="index,follow" />

To help prevent bots from indexing individual URLs, the robots META tag in the head of the page should look like this:

    <meta name="robots" content="noindex,nofollow" />

or

    <meta name="robots" content="noindex,follow" />

Keep in mind that a bot has to fetch a page in order to read its META tag, so the tag only takes effect on pages that /robots.txt does not block.

The Bottom Line

A /robots.txt file is a very useful tool and, unfortunately, an often overlooked and neglected aspect of web development. Now that you have a better understanding of what it is, what it does and how to use it, take some time to consider how your site may benefit from having a properly configured /robots.txt file. In the meantime, start checking out the /robots.txt files of the sites you visit to familiarize yourself with different configurations and uses for it.

Original Post: http://www.dirjournal.com/articles/robots-txt-101/
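One rough way to sanity-check rules like the ones in this post before uploading them is Python's standard-library urllib.robotparser module. The sketch below parses the selective-blocking example and asks whether a given URL may be fetched. The ExampleBot user agent, the is_allowed helper and the /index.asp path are made up for illustration, and real crawlers may interpret some rules differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The selective-blocking example from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout.asp
Disallow: /add_cart.asp
Disallow: /view_cart.asp
Disallow: /error.asp
Disallow: /shipquote.asp
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper; "ExampleBot" is a placeholder user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # /index.asp is an assumed page that none of the rules mention.
    for path in ("/checkout.asp", "/index.asp"):
        url = "http://www.domain.com" + path
        print(path, "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```

Passing the generic file (an empty Disallow line) or the block-everything file (Disallow: /) to the same helper is a quick way to see the one-character difference between them.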
#3
Thanks for replying to my post
#4
You can exclude a lot of bad bots with robots.txt, but you can also use .htaccess.
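A minimal sketch of what a per-bot rule could look like, checked with Python's standard-library urllib.robotparser module. The bot names BadBot and GoodBot are placeholders, not real crawlers, and the may_crawl helper is hypothetical. A rule like this only helps against bots that choose to obey robots.txt; bots that ignore it are where server-side blocking such as .htaccess comes in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str = "http://www.domain.com/") -> bool:
    """Return True if the rules above let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("BadBot", "GoodBot"):
        print(bot, "allowed" if may_crawl(bot) else "blocked")
```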
#5
Thanks to all for replying to my post
#6
The robots.txt file controls crawler behavior. I am copying and pasting some lines from an article I found on the Redalkemi site. I hope it is useful to you.

Here are some tips on how to use the robots.txt file:

1. The robots.txt file is always named in all lowercase (e.g. Robots.txt or robots.Txt is incorrect).
2. The robots.txt file is an exclusion file meant for search engine robots; a website does not need one in order to function. An empty or absent file simply means that all robots are welcome to index any part of the website.

If you would like to read the full article, you can find it on redalkemi dot com in the articles section (Search Engine Optimization article).
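Both tips can be illustrated with a short Python sketch using the standard-library urllib.robotparser module. It shows that parsing an empty robots.txt yields no restrictions, and it spells out the naming rule as a plain string comparison. The helper names, the ExampleBot user agent and the URL path are placeholders; the absent-file case is not exercised here because it depends on how a crawler handles the HTTP response.

```python
from urllib.robotparser import RobotFileParser


def has_correct_name(filename: str) -> bool:
    """Tip 1: the file name must be exactly 'robots.txt', all lowercase."""
    return filename == "robots.txt"


def allowed_by(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Tip 2: return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(has_correct_name("Robots.txt"))  # False: wrong capitalization
    # An empty file contains no rules, so nothing is excluded.
    print(allowed_by("", "http://www.domain.com/any/page.html"))  # True
```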