Saturday, March 22, 2008

MOSS SP1 and Daylight Saving Time Patch Experiences

Recently, during our CMS to MOSS upgrade, we had issues with timer jobs not running on time because of the DST problems (we had not yet applied the DST hotfixes or SP1). Some of the problems this caused were:

Content deployment jobs (even on the same server) were timing out - basically they would wait for an hour to run and then time out anyway.

New web applications took about an hour to actually provision after we created them.

Other timer jobs were affected too.

There is a way to fool the timer service: set the clock on your server back one hour so that it thinks it is time to run. However, with that trick the timer jobs ran in some cases and still did not run in others (such as content deployment). We found a way to actually force these timer jobs to run by using the following command.

"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\stsadm.exe" -o execAdmSvcJobs

This command needs to be run on all Web servers in the farm.
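If you end up running this repeatedly during an upgrade, it may help to wrap the command in a small script. The following is only a sketch in Python, assuming stsadm.exe lives in the default location shown above; the names STSADM, build_command and run_pending_jobs are made up for illustration and are not part of SharePoint.

```python
import subprocess
from pathlib import PureWindowsPath

# Location of stsadm.exe, copied from the command above.
# Assumption: SharePoint was installed to the default path on this server.
STSADM = PureWindowsPath(
    r"C:\Program Files\Common Files\Microsoft Shared"
    r"\web server extensions\12\BIN\stsadm.exe"
)


def build_command(stsadm=STSADM):
    """Return the argument list for the execAdmSvcJobs operation (hypothetical helper)."""
    return [str(stsadm), "-o", "execAdmSvcJobs"]


def run_pending_jobs(stsadm=STSADM):
    """Run the command on the local server and return stsadm's exit code."""
    completed = subprocess.run(build_command(stsadm))
    return completed.returncode
```

Calling run_pending_jobs() only affects the server it runs on, so it would still have to be run on every Web server in the farm.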

At this point we could either install the DST patch or go down the SP1 route. We did not go the (WSS and MOSS) SP1 route because it had hosed our test environment. In addition, WSS SP1 threw an error and did not install successfully on our staging environment, though MOSS SP1 did install successfully there. This constituted a significant risk in our minds, so we decided not to move forward with the SP1 install on production. (Needless to say, we had to rebuild our staging and test farms, because you cannot just roll back from the SP1 upgrade.)

So we decided to move forward with just the patch (for now) to fix the timer job problem. This worked fine on our test and staging environments, which held minimal data because we had just rebuilt them and had not yet had time to reattach all the content databases from production. On production, however, the patch threw an error and about 5% of the content was missing, though the Web apps loaded fine. Interestingly enough, we had migrated from SharePoint 2003 to MOSS 2007 last year, and all the migrated content appeared to be there. The new sites that we had created on MOSS, however, were missing.

So we decided to attach the backed-up databases from the night before, thinking that MOSS probably stored all the configuration for the patch in the DB - and that going back one night would return us to the state before we applied the patch. Unfortunately, after we attached those databases the apps did not work and we got this error in the event log:

The schema version ( of the database SharePoint_AdminContent_711c9d8b-17ed-404c-987a-708e0e059b12 on DBSERVERNAME is not consistent with the expected database schema version ( on WEBSERVERNAME. Connections to this database from this server have been blocked to avoid data loss. Upgrade the web front end or the content database to ensure that these versions match.

At this point we had two options, since the server appeared to be hosed: we could rebuild the production server, or restore the image backup of the entire server (bare metal restore) as defined in the SLA with our backup provider. Neither alternative was rosy, so we decided to tinker a little bit. We looked around and found a blog that talked about this problem. Thanks to Adlai Maschiach, whose blog helped us get past this glitch.

TODO: Now we will go back to the drawing board and find a way to upgrade the production farm again. I will post those experiences here shortly.