Monday, May 12, 2008

A Comprehensive Guide to MOSS URL Rewriting

Here is the scenario. MOSS is a great platform to work on, but some things about it are not ideal for a customer-facing .com site. I will compare and contrast two features in CMS 2002 with those in MOSS.

1) The first example is the term 'Pages' in the URL. Since pages in a publishing site are stored in the 'Pages' library by default, that term appears as part of the external URL. To some eBusiness users that is not acceptable, because (a) the term doesn't mean anything relevant to a search engine spider and (b) it probably hinders the URL's ranking.

2) CMS 2002 allowed subdomains to be mapped to top-level channels. That meant you could serve as many subdomains as you wanted from one IIS website, which is a best practice for both CMS and MOSS from a performance standpoint (fewer websites on a server = GOOD). However, MOSS 2007 has no such direct mapping, so the options are (a) create a separate MOSS web application for every subdomain (OOPS) or (b) do some fancy URL rewriting. Needless to say, we went with URL rewriting: we had over 25 subdomains, that number could grow over time, and we did not want the performance implications of running that many MOSS web applications.

So as part of the CMS to MOSS migration, we brought all the subdomains (~20) over into one site collection. The reason is that the subdomains are very closely tied together and the amount of data was not too large (15 GB). From a maintenance perspective, we can split these subdomains into separate site collections in the future should we decide to. We are also not using the variations feature in MOSS, because we don't have exact content mirrors across our subdomains. Finally, we use content deployment to push content from the authoring farm to the production farm.

So we had two significant challenges to overcome. The first was to map many subdomains to second-level sites in a site collection. The second was to allow .htm extensions and get rid of 'Pages' in the URL, for the reasons explained below.

At this stage, we could have gone with an ISA Server/firewall mechanism to meet our URL rewriting needs. The problem was that we didn't have enough time to test and implement an ISA Server solution, not to mention the cost of ISA Server itself. The other option was a URL rewriting mechanism for IIS along the lines of Apache mod_rewrite, translating our URLs on the fly. We went with one such third-party IIS rewrite solution.

The IIS rewrite rules were set up on an empty "honeypot" IIS website that contains host header entries for all the subdomains we serve. The rewrite rules act on those requests and, instead of redirecting, reverse-proxy them to the real MOSS web application and display its content. Additional rules map the .js, .css and other files that are loaded on every request. For example, a request to gets translated internally to and the content is served back without the link changing in the browser's address bar, which is a function of the reverse proxy.
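
The host-header-to-site mapping described above can be sketched as follows. This is a minimal Python illustration of the idea, not the rule syntax of the third-party rewrite product we used; the host names, the site paths and the internal web application URL (moss.internal.example) are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical placeholders: the real subdomains and the internal MOSS web
# application URL are not part of this post.
BACKEND = "http://moss.internal.example"
SUBDOMAIN_TO_SITE = {
    "products.example.com": "/products",
    "support.example.com": "/support",
}


def to_backend_url(external_url: str) -> str:
    """Map a public subdomain URL onto the second-level site that serves it.

    A reverse proxy uses the returned URL only for its own request to MOSS;
    the browser keeps showing the external URL.
    """
    parts = urlsplit(external_url)
    host = (parts.hostname or "").lower()
    site = SUBDOMAIN_TO_SITE.get(host)
    if site is None:
        raise LookupError(f"no site mapped for host {host!r}")
    internal = BACKEND + site + (parts.path or "/")
    if parts.query:
        internal += "?" + parts.query
    return internal


if __name__ == "__main__":
    print(to_backend_url("http://products.example.com/widgets/index.htm"))
    # -> http://moss.internal.example/products/widgets/index.htm
```

Because the translation happens inside the proxy rather than through an HTTP redirect, the address bar never changes, which is the whole point of using one IIS website for every subdomain.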

The other need was that all our pages were surfaced as .htm in CMS, and moving them to .aspx in MOSS, with 'Pages' added to the URL, would break hundreds of thousands of links. The eBusiness team also sincerely believed that having 'Pages' in the URL does nothing to help our SEO and probably hinders it. So this was deemed a showstopper and we had to devise a solution. The solution was to use the friendly URL feature offered by 'Rapid For SharePoint', which is basically an HTTP module. This module takes out pages/pagename.aspx and replaces it with pagename.htm (the extensions are configurable). So a request for would instead be translated to, which is acceptable to the eBusiness team and does not break the links. The module also changes links within the page content to point to filename.htm instead of pages/filename.aspx. We also used its XHTML filter, which applies a regular expression to match a pattern and replace it with other text, to change relative links to absolute ones in the HTML fields.
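
The two directions a friendly URL module has to handle can be sketched like this. It is a hypothetical Python illustration of the technique, not the Rapid for SharePoint API; the function names are made up, and only the 'Pages' library name and the .htm extension come from the post.

```python
import re

# Hypothetical sketch; function names are illustrative, not a vendor API.
FRIENDLY_EXT = ".htm"

# /site/sub/pagename.htm  (site part may be empty for the root site)
_FRIENDLY_REQUEST = re.compile(
    r"^(?P<site>(?:/[^/]+)*)/(?P<page>[^/]+)\.htm$", re.IGNORECASE
)
# .../Pages/pagename.aspx inside page markup
_PAGES_LINK = re.compile(r"/Pages/(?P<page>[^/\"'?#]+)\.aspx", re.IGNORECASE)


def request_to_moss_path(path: str) -> str:
    """Inbound: /site/pagename.htm -> /site/Pages/pagename.aspx."""
    m = _FRIENDLY_REQUEST.match(path)
    if m is None:
        return path  # not a friendly page URL (e.g. .css, .js): pass through
    return f"{m.group('site')}/Pages/{m.group('page')}.aspx"


def rewrite_links(html: str) -> str:
    """Outbound: /site/Pages/pagename.aspx -> /site/pagename.htm in content."""
    return _PAGES_LINK.sub(lambda m: f"/{m.group('page')}{FRIENDLY_EXT}", html)
```

The inbound half keeps old .htm links working, and the outbound half makes sure the pages never advertise the 'Pages' form again, so spiders only ever see the friendly URLs.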

In MOSS, all requests to a site get translated to /pages/default.aspx, as that is the default page in the Pages library. For example, a request for will get translated to MOSS returns a 302 redirect with the new link to the browser, which defeats the reverse proxy mechanism and changes the URL in the browser's address bar instead. To handle this, I put a rule in our IIS rewrite configuration that translates all requests to sites and subsites by appending the default page name before the request ever reaches the MOSS web application. This avoids the 302 redirect problem. So a request to is translated to and is then proxied to the MOSS website.
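
The effect of that rule can be sketched as below. This is a Python illustration, not the actual rule; it assumes, as a heuristic of this sketch only, that a path ending in "/" or whose last segment has no extension is a site or subsite request, and it uses index.htm as the default page name.

```python
import posixpath

# Default page name; the post notes that "index" is an arbitrary choice.
DEFAULT_PAGE = "index.htm"


def append_default_page(path: str) -> str:
    """Append the default page to site/subsite requests before proxying,
    so MOSS never has to answer with a 302 to /Pages/default.aspx."""
    if path.endswith("/"):
        return path + DEFAULT_PAGE
    last_segment = posixpath.basename(path)
    if "." not in last_segment:
        # Assumption of this sketch: no extension means a site or subsite.
        return path + "/" + DEFAULT_PAGE
    return path  # already a page or a file request; leave untouched
```

Since the appended index.htm is itself a friendly URL, the friendly URL module can then map it to the site's Pages library without MOSS issuing a redirect.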

There was also a requirement to serve MOSS pages for links with .jsp extensions. CMS didn't really care about extensions (you could request a page with .htm, .jsp or no extension and it would be served), so some .jsp links existed that still had to be served. I achieved this with an IIS rewrite rule.
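
The post does not show the actual rule, so the following is only a hypothetical Python sketch of what it might do: assuming .jsp is rewritten internally to .htm, the request can then follow the same friendly URL path as every other page. The query string is preserved.

```python
import re

# Hypothetical: legacy CMS links ending in .jsp are treated as .htm links.
_JSP_SUFFIX = re.compile(r"\.jsp$", re.IGNORECASE)


def normalize_legacy_extension(url_path: str) -> str:
    """Rewrite /site/page.jsp[?query] to /site/page.htm[?query]."""
    path, sep, query = url_path.partition("?")
    return _JSP_SUFFIX.sub(".htm", path) + sep + query
```

An internal rewrite keeps the old link serving content without a round trip; a 301 redirect to the .htm form would be the alternative if only one canonical URL per page is wanted.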

1. Site page redirects in MOSS all respond with an HTTP 302, which throws the reverse proxy mechanism off. Hence all redirect pages need to specify the final external (SEO-friendly) link.

2. All sites need an index page called index.htm, because all requests to sites are translated to site/index.htm to avoid MOSS sending back an HTTP 302 (the name index is arbitrary; you could use default.htm or any other name). If need be, the index page can itself be a redirect page pointing to the destination page, using a fully qualified external redirect link.

3. The XHTML filter needs to be working to change all relative links to absolute ones. For example, changing "/ to " This was added for all subdomains.
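
A relative-to-absolute filter of this kind can be sketched with one regular expression. This is a hypothetical Python illustration, not the XHTML filter's own configuration; it assumes only root-relative href and src attributes in double quotes need rewriting, and the public origin passed in is a placeholder.

```python
import re

# Matches href="/..." or src="/..." but not protocol-relative "//host/...".
_ROOT_RELATIVE_ATTR = re.compile(r'(?P<attr>\b(?:href|src)=")/(?!/)', re.IGNORECASE)


def absolutize_links(html: str, public_origin: str) -> str:
    """Rewrite root-relative href/src values as absolute URLs on public_origin,
    e.g. href="/a.htm" -> href="http://products.example.com/a.htm"."""
    origin = public_origin.rstrip("/")
    return _ROOT_RELATIVE_ATTR.sub(lambda m: f"{m.group('attr')}{origin}/", html)
```

Absolute links leave no doubt about which subdomain a link belongs to, which matters when one MOSS site collection is served under many host names through the reverse proxy.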

Long-Term Strategy
The long-term strategy should be to use ISA Server for the address translation, reducing the load on the MOSS web front-end servers.