  #1  
04-20-2007, 02:41 PM
Administrator
 
Join Date: Jan 2007
Location: Taree
Posts: 613
Canonical URLs With 301 Redirects - Important SEO

Canonical URLs - Brief outline of .htaccess first

This is the special file that sets up the deal for you. It can contain all sorts of directives for the Apache server. If you’re not using an Apache-based server, you’ll have to read your server’s manual on how to do it.

Look in your root directory, the place where your homepage is, for this file (.htaccess). If it's not there, don't fret: you can just create it from scratch. Make an empty text file in Notepad or whatever, and make sure the filename starts with a dot, which is vital. The whole name is .htaccess, with nothing before the dot and no .txt extension after it (Notepad likes to add one, so check). It is still a plain text file; it just has an unusual name.

What the heck is a canonical URL?

I didn’t know what a canonical URL was at first either, so don’t worry if you don’t know.

Canonical essentially means “standard” or “authoritative”, so a canonical URL for search engine marketing purposes is the URL you want people to see. Depending on how your web site was programmed or how your tracking URLs are set up for marketing campaigns, there may be more than one URL for a particular web page.

The problem most search engine marketers run into deals with domains. If a domain is not set up properly, the domain URL (domain.com) and the www domain URL (www.domain.com) are treated as individual web pages. Since both pages may be indexed by Google, you could get hit for duplicate content, and at the very least you would be splitting your link popularity.

The easiest way to protect your site is to redirect all forms of your domain to one “standard” URL - a canonical URL.

For example, to force the use of www.tareeinternet.com instead of tareeinternet.com or http://tareeinternet.com, I have these lines in my .htaccess file. (This is Apache-specific. On IIS you would need an ISAPI rewrite filter, and the lines should be much the same, though you should check the filter's own documentation.)

Quote:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^tareeinternet\.com$ [NC]
RewriteRule ^(.*)$ http://www.tareeinternet.com/$1 [R=301,L]
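The first line “RewriteEngine On” tells Apache to switch on mod_rewrite, the engine responsible for manipulating URLs.

The second line “RewriteCond %{HTTP_HOST} ^tareeinternet\.com$ [NC]” looks for when people access tareeinternet.com. The “[NC]” flag makes the test case-insensitive, so it catches URLs like TareeInterNet.com.

The third line “RewriteRule ^(.*)$ http://www.tareeinternet.com/$1 [R=301,L]” redirects tareeinternet.com to www.tareeinternet.com with a 301 redirect. The (.*) captures whatever path was requested and $1 puts it back on the end of the new address, so tareeinternet.com/page.html lands on www.tareeinternet.com/page.html rather than on the homepage. “R=301” marks the redirect as permanent, and “L” tells Apache to stop processing further rules for this request.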
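
If the site also answers to other hostnames (a parked domain, say), the condition can be turned around so that anything other than the canonical host gets redirected. This is only a sketch of that variation, not the rule set quoted above. It assumes www.tareeinternet.com is the one hostname you want to keep.

```apache
RewriteEngine On
# Assumption: www.tareeinternet.com is the single canonical hostname.
# Leave requests with an empty Host header alone (some old HTTP/1.0 clients send none).
RewriteCond %{HTTP_HOST} !^$
# The leading "!" negates the pattern: match any host that is NOT the canonical one.
RewriteCond %{HTTP_HOST} !^www\.tareeinternet\.com$ [NC]
# Keep the requested path ($1) and answer with a permanent redirect.
RewriteRule ^(.*)$ http://www.tareeinternet.com/$1 [R=301,L]
```

Be aware that this version also redirects any subdomains served from the same directory, so it only suits sites that live on a single hostname.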

The problem lies in how Google views each of these URLs. Even though DNS typically points the two URLs to the same website, Google will consider them separate sites without a 301.

This will "split" your pagerank: if 50 people link to the http://domain.com version and another 50 link to www.domain.com, Google won't see your website as having 100 backlinks and award the value of all 100 to the one URL. So you will most likely end up with something like a PR2 on each version, where you might have obtained a PR4 on the one URL.

Happy rewriting

  #2  
04-23-2007, 10:54 AM
Administrator
 

An Introduction to Rewriting

Readable URLs are nice. A well designed website will have a logical file system layout, with smart folder and file names, and as many implementation details left out as possible.

In the most well designed sites, readers can guess at filenames with a high level of success.

However, there are some cases when the best possible information design can’t stop your site’s URLs from being nigh-on impossible to use. For instance, you may be using a Content Management System that serves out URLs that look something like

http://www.example.com/viewcatalog.a...hats&prodID=53

This is a horrible URL, but it and its brethren are becoming increasingly prevalent in these days of dynamically-generated pages. There are a number of problems with an URL of this kind:
  • It exposes the underlying technology of the website (in this case ASP). This can give potential hackers clues as to what type of data they should send along with the query string to perform a ‘front-door’ attack on the site. Information like this shouldn’t be given away if you can help it.
    Even if you’re not overly concerned with the security of your site, the technology you’re using is at best irrelevant — and at worst a source of confusion — to your readers, so it should be hidden from them if possible.
    Also, if at some point in the future you decide to change the language that your site is based on (to PHP, for instance), all your old URLs will stop working. This is a pretty serious problem, as anyone who has tackled a full-on site rewrite will attest.
  • The URL is littered with awkward punctuation, like the question mark and ampersand. Those & characters, in particular, are problematic because if another webmaster links to this page using that URL, the un-escaped ampersands will mess up their XHTML conformance.
  • Some search engines won’t index pages which they think are generated dynamically. They’ll see that question mark in the URL and just turn their asses around.
Luckily, using rewriting, we can clean up this URL to something far more manageable. For example, we could map it to

http://www.example.com/catalog/hats/53/

Much better. This URL is more logical, readable and memorable, and will be picked up by all search engines. The faux-directories are short and descriptive. Importantly, it looks more permanent.

To use mod_rewrite, you supply it with the URL pattern you want the server to match, and the real URL that matching requests will be sent to. The pattern can be a straight file address, which will match one file, or a regular expression, which can match many files.

Basic Rewriting

Some servers will not have mod_rewrite enabled by default. As long as the module is present in the installation, you can enable it simply by starting a .htaccess file with the command

RewriteEngine on

Put this .htaccess file in your root so that rewriting is enabled throughout your site. You only need to write this line once per .htaccess file.
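
On shared hosting you can't always be sure the module is loaded, and a RewriteEngine line that Apache doesn't recognise will normally bring the whole site down with a 500 error. One common precaution, sketched here as an option rather than a requirement, is to wrap the rewriting lines in an IfModule block so they are skipped when mod_rewrite is missing.

```apache
# Only read these lines when mod_rewrite is actually loaded
<IfModule mod_rewrite.c>
    RewriteEngine on
    # ... your RewriteRule lines go here ...
</IfModule>
```

The trade-off is that the rules are silently ignored when the module is absent, so your clean URLs will return 404s instead of an obvious server error.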

Basic Redirects

We’ll start off with a straight redirect, as if you had moved a file to a new location and want all links to the old location to be forwarded to the new one. You shouldn’t really ever move a file once it has been placed on the web, but when you simply have to, you can at least stop any old links from breaking.

Code:
RewriteEngine on
RewriteRule ^old\.html$ new.html
Though this is the simplest example possible, it may throw a few people off. The structure of the ‘old’ URL is the only difficult part in this RewriteRule. There are three special characters in there.
The caret, ^, signifies the start of an URL, under the current directory. This directory is whatever directory the .htaccess file is in. You’ll start almost all matches with a caret.

The dollar sign, $, signifies the end of the string to be matched. You should add this in to stop your rules matching the first part of longer URLs.

The period or dot before the file extension is a special character in regular expressions, and would mean something special if we didn’t escape it with the backslash, which tells Apache to treat it as a normal character.

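So, this rule will make your server transparently serve the new.html page whenever old.html is requested. Strictly speaking this is an internal rewrite rather than a redirect: the address in the location bar doesn’t change, so your reader will have no idea that it happened, and it’s pretty much instantaneous.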

Forcing New Requests

Sometimes you do want your readers to know a redirect has occurred, and can do this by forcing a new HTTP request for the new page. This will make the browser load up the new page as if it was the page originally requested, and the location bar will change to show the URL of the new page. All you need to do is turn on the [R] flag, by appending it to the rule:

Code:
RewriteRule ^old\.html$ new.html [R]
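
A plain [R] sends a 302 (temporary) redirect. When a page has moved for good, which is the case that matters for search engines, you can state the status code explicitly. A minimal sketch reusing the same old.html and new.html names; the leading slash on /new.html assumes the .htaccess file sits in the site root:

```apache
RewriteEngine on
# R=301 marks the move as permanent; a bare [R] defaults to 302 (temporary).
# L stops rule processing here so later rules don't touch the redirect.
RewriteRule ^old\.html$ /new.html [R=301,L]
```
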
Using Regular Expressions

Now we get on to the really useful stuff. The power of mod_rewrite comes at the expense of complexity. If this is your first encounter with regular expressions, you may find them to be a tough nut to crack, but the options they afford you are well worth the slog. I’ll be providing plenty of examples to guide you through the basics here.

Using regular expressions you can have your rules matching a set of URLs at a time, and mass-redirect them to their actual pages. Take this rule;

Code:
RewriteRule ^products/([0-9][0-9])/$ /productinfo.php?prodID=$1
This will match any URLs that start with ‘products/’, followed by any two digits, followed by a forward slash. For example, this rule will match an URL like products/12/ or products/99/, and redirect it to the PHP page.
The parts in square brackets are called ranges. In this case we’re allowing anything in the range 0-9, which is any digit. Other ranges would be [A-Z], which is any uppercase letter; [a-z], any lowercase letter; and [A-Za-z], any letter in either case.

We have encased the regular expression part of the URL in parentheses, because we want to store whatever value was found here for later use. In this case we’re sending this value to a PHP page as an argument. Once we have a value in parentheses we can use it through what’s called a back-reference. Each of the parts you’ve placed in parentheses is given an index, starting with one. So, the first back-reference is $1, the third is $3 etc.

Thus, once the redirect is done, the page loaded in the readers’ browser will be something like productinfo.php?prodID=12 or something similar. Of course, we’re keeping this true URL secret from the reader, because it likely ain’t the prettiest thing they’ll see all day.
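
Back-references get more useful once a pattern has more than one set of parentheses. The sketch below is a made-up example rather than part of the shop above: the archive/ URL layout, the archive.php script and its year and month parameters are all hypothetical.

```apache
RewriteEngine on
# Hypothetical layout: archive/<four-digit year>/<two-digit month>/
#   $1 = first parenthesised group (the year)
#   $2 = second parenthesised group (the month)
RewriteRule ^archive/([0-9][0-9][0-9][0-9])/([0-9][0-9])/$ /archive.php?year=$1&month=$2
```

With this in place, a request for archive/2007/04/ would be handled by archive.php?year=2007&month=04.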

Multiple Redirects

If your site visitor had entered something like products/12, the rule above won’t do a redirect, as the slash at the end is missing. To promote good URL writing, we’ll take care of this by doing a direct redirect to the same URL with the slash appended.

Code:
RewriteRule ^products/([0-9][0-9])$ /products/$1/ [R]
Multiple redirects in the same .htaccess file can be applied in sequence, which is what we’re doing here. This rule is added before the one we did above, like so:

Code:
RewriteRule ^products/([0-9][0-9])$ /products/$1/ [R]
RewriteRule ^products/([0-9][0-9])/$ /productinfo.php?prodID=$1
Thus, if the user types in the URL products/12, our first rule kicks in, rewriting the URL to include the trailing slash, and doing a new request for products/12/ so the user can see that we likes our trailing slashes around here. Then the second rule has something to match, and transparently redirects this URL to productinfo.php?prodID=12.
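
One refinement worth considering: the Apache documentation recommends pairing [R] with [L], since a redirect rule without [L] hands its result on to the rules below it. A sketch of the same two rules with the flag added; what the visitor sees in the browser should be the same:

```apache
RewriteEngine on
# 1. Missing trailing slash: redirect the browser and stop here ([L]).
RewriteRule ^products/([0-9][0-9])$ /products/$1/ [R,L]
# 2. The browser's follow-up request for products/NN/ is mapped
#    internally to the real script.
RewriteRule ^products/([0-9][0-9])/$ /productinfo.php?prodID=$1 [L]
```
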

Match Modifiers

You can expand your regular expression patterns by adding some modifier characters, which allow you to match URLs with an indefinite number of characters. In our examples above, we were only allowing two digits after products/. This isn’t the most expandable solution: if the shop ever grew beyond these initial confines of 99 products and added a product at products/100/, our rules would not match that URL.

So, instead of hard-coding a set number of digits to look for, we’ll work in some room to grow by allowing any number of digits to be entered. The rule below does just that:

Code:
RewriteRule ^products/([0-9]+)$ /products/$1/ [R]
Note the plus sign (+) that has snuck in there. This modifier changes whatever comes directly before it, by saying ‘one or more of the preceding character or range.’ In this case it means that the rule will match any URL that consists of products/ followed by one or more digits and nothing else. So this’ll match both products/1 and products/1000.

Other match modifiers that can be used in the same way are the asterisk, *, which means ‘zero or more of the preceding character or range’, and the question mark, ?, which means ‘zero or only one of the preceding character or range.’
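
The rule shown above only covers the trailing-slash redirect. For three-digit products to work end to end, the internal rewrite needs the same change. A sketch of the full pair, assuming productinfo.php copes with IDs of any length:

```apache
RewriteEngine on
# products/7 or products/1000 -> add the trailing slash and redirect
# (L = stop processing once this rule has fired)
RewriteRule ^products/([0-9]+)$ /products/$1/ [R,L]
# products/7/ or products/1000/ -> serve the real page behind the scenes
RewriteRule ^products/([0-9]+)/$ /productinfo.php?prodID=$1
```
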

Adding Guessable URLs

Using these simple commands you can set up a slew of ‘shortcut URLs’ that you think visitors will likely try to enter to get to pages they know exist on your site. For example, I’d imagine a lot of visitors try jumping straight into our stylesheets section by typing the URL http://www.tareeinternet.com/css/. We can catch these cases, and hopefully alert the reader to the correct address by updating their location bar once the redirect is done, with these lines:

Code:
RewriteRule ^css(/)?$ /stylesheets/ [R]
The simple regular expression in this rule allows it to match the css URL with or without a trailing slash. The question mark means ‘zero or one of the preceding character or range’. In other words, www.tareeinternet.com/css and www.tareeinternet.com/css/ will both be taken care of by this one rule.
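
The same pattern extends to as many shortcuts as you like, one rule each. In this sketch the /stylesheets/ target comes from the rule above, while the html shortcut and its /html-basics/ target are invented purely for illustration:

```apache
RewriteEngine on
# The parentheses around the slash are optional: /? matches the same URLs as (/)?
RewriteRule ^css/?$ /stylesheets/ [R,L]
# Hypothetical second shortcut; point it at a section that exists on your site
RewriteRule ^html/?$ /html-basics/ [R,L]
```
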

This approach means less confusing 404 errors for your readers, and a site that seems to run a whole lot smoother all ’round.
  #3  
03-16-2010, 03:21 AM
Senior Member
 
Join Date: Jun 2009
Posts: 222

Thanks for the helpful post. It's very useful for me as an SEO.