Sunday, September 28, 2008

Creating SharePoint 2007 Search Dashboard


A few months ago, we were given the task of determining the success our users were having when trying to find documents and web pages through our intranet implementation of Microsoft Office SharePoint Services (MOSS) 2007.  We first looked at built-in solutions offered by MOSS; but, all of their analytics followed the ‘what-you-see-is-what-you-get’ paradigm, and none gave a straight-forward answer to the questions we needed answers to.  This led us to pursue a custom analytics solution, and after much data probing and testing we ended up leveraging SQL Server Analysis Services (SSAS), ProClarity and ProClarity Dashboard to achieve our goal.



Step 1: Finding the Data 

MOSS stores data all over the place in a variety of different formats, so sorting out where the information is that we want to use can take quite a bit of time.  Luckily our solution requires no file-parsing and no 3rd party applications; the entire search data is stored in two separate MOSS databases.  The databases should be your shared service provider (SSP) content database and your MOSS content database.  Our setup used the following two databases, as pictured below:

The database names might differ, but you’re looking for the tables highlighted above. Once you’ve found those, you’re set.

Step 2: Establishing the Data Source View

How MOSS actually uses these tables in any manner other than how I did is a mystery to me, (although some of you more fluent in database design might understand them) but after a couple days of table analysis, I determined how the tables are linked with one another well enough to retrieve the information I desired.  All of the search information is inside the tables marked with the prefix ‘MSSQLog,’ so if you wish to examine the data on your own, that’s where you’ll find it.  Creating an OLAP cube for use in ProClarity is a powerful analysis technique; so, I structured the data into a typical data warehouse fact-dimension relationship, pictured below:

The table joins above are pretty straightforward (id to id) except for the join between the contextual scope table and the MSSQLog fact table.  This join is more complicated is due to the fact that we’re using the Webs table, and by now you might be wondering why we’re even using the Webs table in the first place.


The Webs table provides a listing of every site in your MOSS site hierarchy, providing for each site: a reference to its parent site, the site name, and the URL where it is located on your server.  This is extremely useful because all of your search statistics for each site would be consolidated into one all-encompassing ‘main site’, leaving out individual site statistics.  If you are looking for that overall consolidation, OLAP cubes allow you to aggregate all the sites together anyway. So really we’re just gaining extra functionality by adding this table in.


The join to the fact table isn’t exactly straightforward because the Webs table actually resides in a different database (as you should know from step 1).  Therefore, it’s not going to have any key columns or really anything in common with the fact table except for one column, URL.  The URL listed in the fact table is the URL where the search was performed (not what link the person clicked on, although that information is stored in another table), and since the URL listed in the Webs table is the URL of the site, the two should match up.  A little bit of string parsing needed to be done on the URL field in the fact table as it had the “http://” prefix (ex. where as the URL field in the Webs table was just the folder hierarchy (ex. engineering/softwareDev/).


Step 3: Building the Cube


With the data source view created, the next step is to create the OLAP cube to browse data quickly.  It’s a fairly basic cube. The only thing I really changed from the default options was that I added in a pre-built time dimension. I thought it would make sense to see data even for days when there weren’t any searches (otherwise if there weren’t any searches for a weekend, for example, the graphical representation of the data would just skip over it and not show zero for the number of searches for those days).  Also, in the Site Dimension, I had to add a parent-child recursive hierarchy built on the ‘Parent Web Id’ attribute (as pictured below).  You’ll need to add that in to build your site hierarchy, otherwise the dimension will just be a flat list of every site in your MOSS implementation.  Other than those two minor changes here are the measures, attributes, and dimensions I ended up with in my cube (your implementation may vary).

To give a few examples of what we can do with all this data:

  • The number of queries that have occurred
  • What was queried
  • What site the search happened at
  • How long it took before the user clicked on a result
  • Whether it was a best bet that was clicked
  • The average time to click for a given site
  • The number of results the search returned
  • The time frame on a given search. 


Now all of this data is available in the SSAS cube browser, but unless your end user is familiar with how pivot tables and OLAP cubes work, you’ll probably want to wrap it in a more visually friendly interface, which brings us to ProClarity.


Step 4: ProClarity Integration


The next step is to publish your cube to your SSAS server and find it in ProClarity Professional.  Now that we have all this search information, how do we answer that initial question, search success?  The actual term, ‘search success’ is kind of up to the interpretation of whomever is designing the tool, but I decided that in order to show search success you’d need more than one graph.  In fact, I ended up designing about 18 graphs in ProClarity, although not all were used:

You can see that you can get some pretty useful information, such as top 10 queries without best bets. Perhaps even more useful might be the top 10 queries with worst results that don’t have best bets.

After you’re done playing around with your graphs you could stop there and just point your users to the ProClarity graphs (presumably using ProClarity Standard), but for this solution we wanted to surface the graphs through MOSS.  We also wanted multiple graphs to show up on a single page as viewing 18 graphs individually might become tedious, so ProClarity Dashboard became the platform to view all this data.


Step 5: ProClarity Dashboard and MOSS Integration


Creating a dashboard is pretty straightforward, and it’s very much up to you how you want to organize your information.  For our solution, we organized the dashboard into four separate tabs: Site Search Analysis (for each individual site), Search Analysis (general statistics), Best Bet Analysis (for improving search efficiency), and Summary (general consolidation of the other three tabs).  You can see the end result looks very visually appealing:

Dashboard makes the search analytics data easy to understand and very easy to use

The last step is to integrate this into MOSS.  Because dashboard is just a simple webpage, we can use a page viewer webpart, and just refer to the dashboard URL.  The end result will end up looking similar to this:

Hopefully this has served as a good, general outline to how you can create your own custom MOSS search analytics solution using SSAS, ProClarity and ProClarity Dashboard!


Anonymous said...

Can you please provide more information about how you used the webs table? That seems to be a critical piece, but you have left out all of the details. Thanks.

Anonymous said...

Can you please provide more information about how you used the webs table? That seems to be a critical piece, but you have left out all of the details. Thanks.

SharePoint Usage Reports said...

MAPILab provides a very good SharePoint usage reporting solution: MAPILab Statistics for SharePoint. Detailed reports on visitors, documents, lists, search, etc. You can try its free trial version, or look through the online demo:

Group of reports about usage of SharePoint search services includes reports on intensity of queries by day, data about successful and no-result queries; also the group can be used for estimating user satisfaction of search results.

Anonymous said...

What words... super, a brilliant phrase