  #1  
Old 06-18-2007, 12:31 PM
Admin
Administrator
 
Join Date: Jan 2007
Location: Taree
Posts: 607

Blocking Bad Bots with Robots.txt

Bad bots are typically scripts or "robots" used to mine data from your website, or even worse, to capture your entire site so the offender can put a complete duplicate online. This is usually done so they can profit from AdSense earnings, but it can also be detrimental to your search engine rankings due to duplicate content.

Below I have provided a robots.txt list of the common web harvesters and website-ripper scripts. This is not foolproof, as some allow their "User-Agent" to be changed or spoofed, but it will make things a bit more difficult for them. It's similar to locking up your house: people can still break in, but it's a deterrent.

All you have to do is copy and paste this list into Notepad and save it as robots.txt. Then upload it to your site's root (public_html) folder. To make sure it is working properly, visit http://www.yoursite.com/robots.txt - the file should then be visible.

Quote:
User-agent: aipbot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: asterias
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: becomebot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: BotRightHere
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Fasterfox
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: hloader
Disallow: /

User-agent: httplib
Disallow: /

User-agent: HTTrack 3.0
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: IconSurf
Disallow: /
Disallow: /favicon.ico

User-agent: InfoNaviRobot
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: larbin
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: moget
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Openfind data gatherer
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RMA
Disallow: /

User-agent: searchpreview
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: spanner
Disallow: /

User-agent: SurveyBot
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: TightTwatBot
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: turingos
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: TurnitinBot/1.5
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: VCI
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: WebCapture 2.0
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: WebCopier v.2.2
Disallow: /

User-agent: WebCopier v3.2a
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebZIP/4.21
Disallow: /

User-agent: WebZIP/5.0
Disallow: /

User-agent: Wget
Disallow: /

User-agent: wget
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Zeus Link Scout
Disallow: /
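A quick way to check that the finished file is valid is to run it through Python's standard-library robots parser, which applies the same matching logic a well-behaved crawler uses. This is a sketch with a shortened rule set fed in directly (no live site needed); the example URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# A cut-down version of the list above, fed in as lines so the
# example runs without fetching anything over the network.
rules = [
    "User-agent: WebZip",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# A well-behaved WebZip client would see it is banned site-wide,
# while an agent not named in the file is still allowed.
print(rp.can_fetch("WebZip", "http://www.yoursite.com/page.html"))        # False
print(rp.can_fetch("SomeOtherBot", "http://www.yoursite.com/page.html"))  # True
```

Note this also demonstrates the limitation discussed below: the rules only take effect because the client chooses to consult them.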

  #2  
Old 04-21-2008, 02:17 AM
allaboutdatingsites
Guest
 
Posts: n/a
Robots.txt does not prevent bad robots from access

Robots.txt only works on robots that "behave", and bad robots typically do not.

You need to use .htaccess to block these bad guys, or programmatically trap and dispatch them with something like a honeypot.
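For reference, a minimal .htaccess sketch of that approach, assuming Apache with mod_rewrite enabled; the agent names are just examples drawn from the list above, and the pattern would need extending for real use:

```apache
# Refuse (403) any request whose User-Agent matches a known ripper.
# Requires mod_rewrite; [NC] makes the match case-insensitive.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (WebZip|HTTrack|WebStripper|SiteSnagger) [NC]
RewriteRule .* - [F,L]
```

Unlike robots.txt, this is enforced by the server, so it does not rely on the bot's cooperation - though a spoofed User-Agent still gets past it.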

What you have presented here is a bit misleading, sorry.

allaboutdatingsites
