Monday, May 12, 2008

A Comprehensive Guide to MOSS URL Rewriting

Challenge
Here is the scenario. MOSS is a great platform to work on, but some things about it are not ideal for a customer-facing .com site. I will compare and contrast two features of CMS 2002 with their counterparts in MOSS 2007.


1) The first example is the term 'Pages' in the URL. Since pages in a publishing site are stored in the 'Pages' library by default, that term appears as part of the external URL. To some eBusiness users, that is not acceptable because (a) the term doesn't mean anything relevant to a search engine spider and (b) it probably hurts the URL's ranking.


2) CMS 2002 allowed subdomains to be mapped to top-level channels. That means you could host as many subdomains as you wanted on one IIS website, which is a best practice for both CMS and MOSS from a performance standpoint (fewer websites on the server = GOOD). MOSS 2007, however, has no such direct mapping. So the options were (a) have a separate MOSS web application for every subdomain (OOPS) or (b) do some fancy URL rewriting. Needless to say, we went with URL rewriting: we had over 25 subdomains, a number that could grow over time, and we did not want that many MOSS web applications because of the performance implications.


So as part of the CMS to MOSS migration, we brought all the subdomains (~20) over into one site collection. The reason is that the subdomains are very closely tied together and the amount of data was not too large (15 GB). From a maintenance perspective, we can split these subdomains into separate site collections in the future should we decide to. We are not using the variations feature in MOSS, because we don't have exact content mirrors across our subdomains. We are also using content deployment to push content from the authoring farm to the production farm.


So we had two significant challenges to overcome. The first was to map many subdomains to second-level sites in a single site collection. The second was to serve pages with .htm extensions and get rid of 'Pages' in the URL, for the reasons explained below.


Solution
At this stage, we could have used ISA Server as a firewall/reverse proxy to meet our URL rewriting needs. The problem was that we didn't have enough time to test and implement an ISA Server solution, not to mention the cost of ISA Server itself. The other option was to use a URL rewriting mechanism for IIS along the lines of Apache's mod_rewrite to translate our URLs on the fly. We went with one such third-party IIS rewrite solution.


The IIS rewrite rules were set up on an empty "honeypot" IIS website that has host header entries for all the subdomains we serve. The rewrite rules act on requests to those host headers and, instead of redirecting, reverse-proxy them to the real MOSS Web application and return its content. There are additional rules to map the .js, .css, and other files that are loaded on every request. For example, a request for www.company.com/xyz.htm is translated internally to ext.company.com/www.company_com/xyz.htm, and the content is served back without the URL in the browser's address bar changing, which is exactly what the reverse proxy provides.
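The host-header mapping above can be sketched as a small function. This is a minimal Python sketch of the rule logic only, not the third-party rewriter's rule syntax. The SITE_FOR_HOST table (apart from the www entry taken from the post's example), the auto.company_com site name, and the /_layouts/ pass-through for shared .js and .css files are hypothetical assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Internal MOSS host, taken from the post's example.
INTERNAL_HOST = "ext.company.com"

# Hypothetical host-header table: public subdomain -> second-level site in the
# single site collection. Only the www entry comes from the post.
SITE_FOR_HOST = {
    "www.company.com": "www.company_com",
    "auto.company.com": "auto.company_com",
}

# Assumed location of shared .js/.css files that do not live under any one
# site, so they are proxied without a site prefix.
SHARED_PREFIXES = ("/_layouts/",)


def to_internal_url(public_url: str) -> str:
    """Map a public subdomain URL to the internal MOSS URL that gets proxied."""
    parts = urlsplit(public_url)
    host = parts.hostname or ""
    site = SITE_FOR_HOST.get(host)
    if site is None:
        raise ValueError(f"no site mapped for host {host!r}")
    path = parts.path or "/"
    if not path.startswith(SHARED_PREFIXES):
        path = f"/{site}{path}"
    # The fragment is dropped: browsers never send it to the server anyway.
    return urlunsplit((parts.scheme, INTERNAL_HOST, path, parts.query, ""))


print(to_internal_url("http://www.company.com/xyz.htm"))
# http://ext.company.com/www.company_com/xyz.htm
```

Because this is a reverse proxy rather than a redirect, the browser only ever sees the public URL; the internal one is used solely between the honeypot site and MOSS.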


The other need was that all our pages were surfaced as .htm in CMS, and moving them to .aspx in MOSS, with 'Pages' added to the URL as well, would break hundreds of thousands of links. The eBusiness team also sincerely believed that having 'Pages' in the URL does nothing for our SEO and probably hurts it. So this was deemed a showstopper and we had to devise a solution. The solution was to use the friendly URL feature offered by 'Rapid For SharePoint', which is essentially an HTTP module. This module strips out pages/pagename.aspx and replaces it with pagename.htm (the extension is configurable). So the page at auto.company.com/support/pages/default.aspx is instead addressed as auto.company.com/support/default.htm, which is acceptable to the eBusiness team and does not break existing links. The module also rewrites links within the page content to point to filename.htm instead of pages/filename.aspx. We also used its XHTML filter, which matches a regular expression pattern and replaces it with other text; we used it to change relative links in the HTML fields into absolute ones.
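The friendly URL mapping can be illustrated in both directions with regular expressions. This is a hypothetical Python sketch of the behavior described above, not code or configuration from Rapid For SharePoint; the function names and the exact patterns are assumptions.

```python
import re

PUBLIC_EXT = ".htm"  # the post notes the public extension is configurable

# A page name: anything except path/query/fragment delimiters, quotes,
# whitespace and tag brackets, so a match cannot run past an HTML attribute.
_NAME = r"[^/?#\"'\s<>]+"

# Internal form .../Pages/<name>.aspx ("Pages" appears in either case).
_INTERNAL = re.compile(r"/pages/(" + _NAME + r")\.aspx", re.IGNORECASE)

# Public form: a path ending in /<name>.htm
_PUBLIC = re.compile(r"/(" + _NAME + r")" + re.escape(PUBLIC_EXT) + r"$")


def to_public(text: str) -> str:
    """Replace every /Pages/<name>.aspx with /<name>.htm.

    Works on a single path or on page HTML, which is how links inside
    page content get pointed at the .htm form.
    """
    return _INTERNAL.sub(lambda m: f"/{m.group(1)}{PUBLIC_EXT}", text)


def to_internal(path: str) -> str:
    """Map a public path such as /support/default.htm back to the Pages library."""
    return _PUBLIC.sub(lambda m: f"/Pages/{m.group(1)}.aspx", path)


print(to_public("/support/pages/default.aspx"))
# /support/default.htm
```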


In MOSS, every request for a site is redirected to /Pages/default.aspx, since that is the default page in the Pages library. For example, a request for auto.company.com/support is redirected to auto.company.com/support/pages/default.aspx. MOSS does this by returning an HTTP 302 redirect with the new URL to the browser, which defeats the reverse proxy mechanism: the browser follows the redirect and the URL in its address bar changes. To avoid this, I added a rule to our IIS rewriter that appends the default page name to every request for a site or subsite before the request ever reaches the MOSS Web application. This sidesteps the 302 redirect problem. So a request for http://www.company.com/ is translated to www.company.com/default.htm and is then proxied to the MOSS Web application.
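A minimal sketch of the default-page rule, written in Python for illustration rather than in the rewriter's own rule syntax. The function name is hypothetical, and so is the heuristic it relies on: a request is assumed to target a site whenever the last path segment has no file extension.

```python
from urllib.parse import urlsplit, urlunsplit

# The post appends default.htm here; consideration 2 notes the name is arbitrary.
DEFAULT_PAGE = "default.htm"


def add_default_page(url: str) -> str:
    """Append the default page to site URLs so MOSS never answers with a 302."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Assumed heuristic: a last segment containing a dot names a file
    # (xyz.htm, core.js, ...), so the request passes through unchanged.
    if "." in last_segment:
        return url
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path + DEFAULT_PAGE,
                       parts.query, parts.fragment))


print(add_default_page("http://www.company.com/"))
# http://www.company.com/default.htm
```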


There was also a requirement to serve MOSS pages for links with .jsp extensions. CMS didn't really care about extensions (you could request a page with .htm, .jsp, or no extension and it would be served), so some .jsp links were in circulation and still had to work. I achieved this with an IIS rewrite rule.
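The .jsp rule can be sketched as a path rewrite that swaps a trailing .jsp for .htm. This Python sketch is illustrative only; the function name is hypothetical, and it assumes the .jsp and .htm forms of a link share the same page name and that the rewrite runs before the friendly URL module sees the request.

```python
from urllib.parse import urlsplit, urlunsplit


def jsp_to_htm(url: str) -> str:
    """Rewrite a legacy CMS .jsp link to its .htm form; leave other URLs alone."""
    parts = urlsplit(url)
    # Only the path is inspected, so a ".jsp" inside the query string is ignored.
    if not parts.path.lower().endswith(".jsp"):
        return url
    path = parts.path[:-len(".jsp")] + ".htm"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(jsp_to_htm("http://auto.company.com/support/faq.jsp?ref=1"))
# http://auto.company.com/support/faq.htm?ref=1
```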


Considerations
1. Redirect pages in MOSS all respond with an HTTP 302, which throws off the reverse proxy mechanism. Hence every redirect page needs to specify the final external (SEO-friendly) link.

2. Every site needs an index page called index.htm, because all requests for a site are translated to site/index.htm to avoid MOSS sending back an HTTP 302 (the name "index" is arbitrary; you could use default.htm or any other name). If need be, the index page can itself be a redirect page pointing to the intended destination page, using a fully qualified external URL as the redirect link.

3. The XHTML filter needs to be configured to change all relative links to absolute ones, for example changing "/auto.company.com to "http://auto.company.com/. A rule like this was added for every subdomain.
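The per-subdomain XHTML filter rules from consideration 3 can be sketched as regular expression substitutions over the HTML field content. This Python sketch assumes, as in the post's example, that a relative link starts with a slash followed by the subdomain's host name inside a quoted attribute. The SUBDOMAINS list and the function names are hypothetical, and this is not the filter's actual configuration format.

```python
import re

# Hypothetical list of subdomains; the post says one rule was added per subdomain.
SUBDOMAINS = ["www.company.com", "auto.company.com"]


def _make_rule(host: str):
    # Match a quote, then "/<host>", then either a "/" (consumed, so the
    # result has no double slash) or the closing quote (not consumed).
    pattern = re.compile(r"([\"'])/" + re.escape(host) + r"(?:/|(?=[\"']))")
    # Keep the captured quote and put the absolute base URL after it.
    return pattern, (lambda m: f"{m.group(1)}http://{host}/")


RULES = [_make_rule(host) for host in SUBDOMAINS]


def absolutize_links(html: str) -> str:
    """Rewrite relative '/<host>...' links in HTML into absolute URLs."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html


print(absolutize_links('<a href="/auto.company.com/support/default.htm">Support</a>'))
# <a href="http://auto.company.com/support/default.htm">Support</a>
```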


Long Term Strategy
The long-term strategy should be to use ISA Server for the address translation, to reduce the load on the MOSS Web Front End servers.

Comments:

Jamie said...

I was wondering if you ever figured out how to get ISA 2006 to do this. We moved from CMS to MOSS and need to change our URLs...

For example, http://www.temp.com/Pages/HomeNF.aspx needs to be displayed in the browser as www.temp.com/HomeNF.aspx or www.temp.com. Any assistance you could provide will be very helpful. Thanks.

Richard said...

Hey, if either you or Jamie found out how to avoid the /Pages/default.aspx, I'd love to know.

Anglina said...

Thank you, sir.

msqr said...

Hi Faraz,

We are running into exactly the same scenario: we have multiple subdomains (15) and we do not want to create a web application for each of them.

Do you know the status of the "Rapid for SharePoint" HttpModule? It seems to have disappeared.

msqr said...

Hi Faraz,

Do you know the status of the "Rapid for SharePoint" HTTP module, or can you suggest a similar HTTP module?

Faraz said...

This is news to me; no, I do not know what happened to the module!

Anonymous said...

The "Pages" thing in the URL definitely looks weird, and I agree with all those custom requirements of not having it in the site URL.
But one thing I have personally experienced is that this is NOT going to hurt you when it comes to search engine optimization.
It seems that search engines (Google at least) have been made intelligent enough to detect and understand this.
In fact, all the work we did around SEO has given us a jump in the rankings.
So in short, do not worry about URL rewriting if you have no concern other than SEO.
HTH,
Zullu.
