Tuesday, August 12, 2008
In PerformancePoint 2007 you have the capability to setup custom properties and pass them into dashboard objects. This is especially useful for simplifying dashboards where you are linking scorecard KPIs to report views. Alyson Powell Erwin outlines how to do this in the PerformancePoint MSDN blog:
This is very handy. By setting up a custom property that contains measure names for the KPI you can use one report for each data source and pass the value of the custom property into the [Measures] endpoint on the report view. If you have a scorecard with 20 KPIs from the same data source and each KPI displays two report views it trims the number of reports required from 40 to 2.
Issue Found - Solved in SP1
However, I noticed something going awry when doing my performance testing while running Profiler against the Analysis Services database. Each time I clicked on a KPI it would execute the query for the two report views as expected AND submit a query for the scorecard with NO FILTERS. This basically amounts to running the scorecard query for all time because we pass the org and date filters in from the dashboard filter selections. This obviously has a negative effect on overall query performance from the Analysis Services server. However, in testing PerformancePoint 2007 SP1 I found that this problem has been addressed although I saw no mention of it specifically in the SP1 release notes. It is easy for this issue to go unnoticed because of the good caching done by Analysis Services - provided there is sufficient CPU and memory on the Analysis Services server.
Tuesday, March 25, 2008
In my earlier post NON EMPTY Filtering on PerformancePoint Scorecard, I mentioned that you may be able filter scorecards and/or reports by dimensions that are not the same as the filter by using a cross join or other MDX querying technique. This is true if the other dimension is in the same cube as the query or member selection used by the filter. Consider the following:
Cube: Sales
Dimensions: Organizational Hierarchy (lowest level is Site)
Site (this contains attributes that only apply to the site)
Filter: Org (Member Selection of Organizational Hierarchy)
Report: Sales by State/Province
Approach 1: We could put the Organizational Hierarchy dimension into the Sales by State/Province and pass the Org filter value directly into that. However, this may limit the flexibility of the report and its applicability to other dashboards. Additionally, if we were to use the report without passing a value into it we may complicate the resulting query and unnecessarily degrade the report’s query performance.
Approach 2: We apply a MDX function to the filter when it is mapped to the report filter to only show States/Provinces that apply to the Organizational Hierarchy (which are often geographically defined). The resulting Filter Formula would look something like the following:
NONEMPTY([Site].[Geography].[State].members,<<SourceValue>>)
A key cube design reminder here – make sure your cubes have a high performing default measure if you do not specify a specific measure in the NONEMPTY filter expression like the following:
NONEMPTY(NONEMPTY([Site].[Geography].[State].members,<<SourceValue>>),[Measures].[Site Count])
I like to use something that only returns one row in the base fact table per granular dimension value (i.e. one row per site). I generally set aside a measure group just to support this type of filtering or I set up a cube just to support PerformancePoint filters (to be discussed in a later post). This requires additional education on the part of publisher so they understand that the measure group or cube has very little analytical value. At a bare minimum our aggregation design needs to support the dimensions involved in the filter expression unless we have very small cubes.
One thing that would be nice would be if we could filter filters (no that is not a typo). Ideally, if we had a date filter, we would want to filter a MDX query or Member Selection filter (such as org) to only show members that have values for the applied date filter. This has value in industry Same-Store or Comparables dashboards.
Wednesday, February 13, 2008
I have had the pleasure (or pain, depending on your point of view) of working on several large data warehousing projects over the years and I have found that there are many key ingredients to a successful project. Most of these are well documented by Ralph Kimball and Bill Inmon so I won't reiterate those here. However, as an architect I have come to really value the role of the Business Analyst in these projects. It is a no brainer that the Business Analyst is the go-to person for requirements but I believe that there should be a closer tie in the project between the architect(s) and the analyst(s). Often, we architects tend to guide questions by our knowledge of the data and infrastructure. This by itself may not lead to the best solution. While I have learned to put on the "business" hat over the years, having an analyst (or more than one) on the project who can focus on asking questions from a business perspective not only frees me (the architect) up to focus on making a scalable, maintainable and extensible solution. The resulting brainstorming with the analysts fosters an environment where we can say "how do we solve for this?" instead of "we can't do that because the data...." Most questions can be answered and most problems can be solved. Having the analysts and architects working closely on the same team (not just treating the analyst as a "customer") results in a solution that answers the business's questions more completely while still maintaining a good solution. BTW - every team needs a Workdog!
Many legacy Analysis Services (or OLAP Services) implementations use ProClarity as a querying tool and publishing framework. The server infrastructure of ProClarity allows crafty implementers to push much of the customization in KPI’s and additional measures into the “Presentation” layer of the infrastructure. This is easier to manage and change in many cases than creating named sets and calculated members in the cubes as this generally takes a SSAS administrator.
With Microsoft purchasing ProClarity and its inclusion into PerformancePoint leads to a quandary for many solution designers. While ProClarity still works well as it was designed and is available as a report type in PerformancePoint, the ProClarity views have a very different look and feel than the native PerformancePoint report views. Also, due to the fact that the report views are basically ProClarity Web Standard views, the drill-from-anywhere-to-anywhere capabilities are not available directly though the PerformancePoint interface for these views. Additionally, the library of calculated measures and sets created and published in ProClarity are not available to PerformancePoint and thus cannot be used in scorecards and report views.
In many cases, the optimum solution here is to build these ProClarity sets and measures into their respective cubes as named sets and calculated members. CAVEAT – this is not a silver bullet solution. There is a cost in terms of SSAS processing and resources for named sets and calculated measures so this should not be done without ample testing prior to implementation in production environments.
Let’s first talk about calculated measures. Existing ProClarity solutions may have a laundry list of calculated measures in their libraries. Many times these may contain nested measures (i.e. a calculated measure that uses another calculated measure) and as these get more they tend to not perform as well as desired. First, it is important to identify which measures will get heavy use in other applications such as PerformancePoint (both scorecards and reports) and Excel. Those that have a wide range of applicability justify their creation and maintenance as calculations in their respective cubes. These calculation (and set) definitions can be extracted from the ProClarity repository database for those of you who are familiar with it. This is not a “recommended” approach in much the same way as using the “sys” tables in production code for SQL Server is discouraged. As a rule, I like to take these calculations and run through extensive performance analysis using Query Analyzer and Profiler. For instance, let’s convert a calculation that reflects sales growth over the prior week:
[Measures].[Sales]/paralleperiod([Date].[Week].[Week],1,[Date].[Week].currentmember)
The query would look something like:
With member [Measures].[Sales Growth] as ‘[Measures].[Sales]/paralleperiod([Date].[Week].[Week],1,[Date].[Week].currentmember)’
Select [Org].members on rows
,[Date].[Week].&[01/01/2007 – 01/07/2007] on columns
From sales
Where [Measures].[Sales Growth]
Make sure to test the query using dimension that can take advantage of your cube partitioning strategy (in this case [Date]). Test the query with the measure as a filter, on rows and on columns. Checking profiler to make sure that the query does not cause a scan of all process partitions or excessive subcubing. Once you have settled on a calculation formula that performs well in these cases it is now time to create the calculated member through Business Intelligence Studio. When you create the calculated member in the cube, make sure you define the format and especially the non-empty behavior. If the calculation involves division, I tend to use one of the measures from the denominator as the non-empty behavior measure. I realize that this may not always be feasible if the resulting non-empty behavior is not correct so do not do this thoughtlessly.
Next, let’s talk about named sets. Many times users and/or publishers have created dimensional sets in ProClarity to ease reporting and facilitate exception-based metrics (I will talk to this in a later post). Putting these into the cube involves much the same process as the calculated measures. First I test the named set through Query Analyzer watching with Profiler. Let’s consider a named set for a dimension Sales Type that identifies gift card sales:
{[Sales Type].[Sales Type].&[Gift Card], [Sales Type].[Sales Type].&[Vendor Card]}
Next we use this in the query:
With set [Gift Card] as ‘{[Sales Type].[Sales Type].&[Gift Card], [Sales Type].[Sales Type].&[Vendor Card]}’
member [Measures].[Sales Growth] as ‘[Measures].[Sales]/paralleperiod([Date].[Week].[Week],1,[Date].[Week].currentmember)’
Select [Org].members on rows
,[Date].[Week].&[01/01/2007 – 01/07/2007] on columns
From sales
Where ([Measures].[Sales Growth], [Gift Card])
Move the set around in the query to test it from all angles and try using it as part of an aggregate measure as well:
With set [Gift Card] as ‘{[Sales Type].[Sales Type].&[Gift Card], [Sales Type].[Sales Type].&[Vendor Card]}’
member [Measures].[Gift Card Sales] as ‘sum([Gift Card],[Measures].[Sales])’
Select [Org].members on rows
,[Date].[Week].&[01/01/2007 – 01/07/2007] on columns
From sales
Where ([Measures].[Gift Card Sales])
If the results are satisfactory then create the set in the cube through Business Intelligence Studio. Keep in mind that sets are cached in memory for each connection so extremely large sets are not recommended in some cases as are too many sets.
I tend to define the named sets at the top of my calculations definition so they can be used in all the subsequent calculations, if necessary. The sets and calculations created in Business Intelligence Studio are appended to the cube definition as an MDXScript so order of creation in the Studio is important as you must define parent sets and calculations before their dependent sets and calculations. Adding and modifying these generally does not require cube reprocessing but I have seen it happen in rare instances. Make sure you test the cube modifications first in a non-production environment to make sure that they do not require reprocessing. You wouldn’t want to do something that would cause reprocessing the cube partitions as the source data may have been purged and unrecoverable (hopefully you have planned for this with a good archive process). Since named sets are cached in memory and calculated members are generally resolved in memory, often the combination of these results in better query performance. Using named sets in calculated measures often requires aggregate functions (such as SUM) which have a higher CPU requirement so consider the calculation mdx definition carefully to find the correct balance of query performance and scalability considering your server resources. On last advantage of calculated measures is that the mdx definition is static so you lessen the likelihood of the same measure being calculated multiple, different ways which can not only end up with different results but vastly different queries and performance.
With these high-use sets and calculations defined in the cube(s) in an optimum way they are now available to all applications that connect to the cube such as PerformancePoint, Excel and Reporting Services as well as ProClarity and the resulting query performance should be more predictable than when they were in the ProClarity repository and assembled on the fly by ProClarity.
Tuesday, January 22, 2008
If you have a reasonably large Analysis Services implementation with users running adhoc queries via multiple tools (ProClarity, Excel, etc...) then you have probably run into problems with long-running queries. While longer-running queries may be acceptable in many data warehousing implementations, there will occasionally be a query (maybe even more than one ;) ) that causes resource issues and runs beyond the established limits of reason for your environment. Analysis Services does not have a native tool in SQL Management Studio for monitoring session activity and/or killing connections (hint: please add one). There is some sample code you can get from Microsoft along these lines but I wanted something simpler. In digging around for info on how to do this I found pieces of how to do it but nothing comprehensive so this is what I have put together so that I can do this through the SQL Management Studio.
Open a XMLA query window connected to the Analysis Services server in question.
Command 1 (Session Monitoring/Find it!):
<Discoverxmlns="urn:schemas-microsoft-com:xml-analysis">
<RequestType>DISCOVER_SESSIONS</RequestType>
<Restrictions>
<RestrictionList />
</Restrictions>
<Properties>
<PropertyList />
</Properties>
</Discover>
Command 2 (Kill it!):
<Cancelxmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<SPID>121672</SPID>
<CancelAssociated>1</CancelAssociated>
</Cancel>
Command 1 uses the Discover method to list the sessions. A lot of good info is in the output of this command: See the following example:
<returnxmlns="urn:schemas-microsoft-com:xml-analysis">
<rootxmlns="urn:schemas-microsoft-com:xml-analysis:rowset"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:schematargetNamespace="urn:schemas-microsoft-com:xml-analysis:rowset"xmlns:sql="urn:schemas-microsoft-com:xml-sql"elementFormDefault="qualified">
<xsd:elementname="root">
<xsd:complexType>
<xsd:sequenceminOccurs="0"maxOccurs="unbounded">
<xsd:elementname="row"type="row" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:simpleTypename="uuid">
<xsd:restrictionbase="xsd:string">
<xsd:patternvalue="[0-9a-zA-Z]{8}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{4}-[0-9a-zA-Z]{12}" />
</xsd:restriction>
</xsd:simpleType>
<xsd:complexTypename="xmlDocument">
<xsd:sequence>
<xsd:any />
</xsd:sequence>
</xsd:complexType>
<xsd:complexTypename="row">
<xsd:sequence>
<xsd:elementsql:field="SESSION_ID"name="SESSION_ID"type="xsd:string"minOccurs="0" />
<xsd:elementsql:field="SESSION_SPID"name="SESSION_SPID"type="xsd:int"minOccurs="0" />
<xsd:elementsql:field="SESSION_CONNECTION_ID"name="SESSION_CONNECTION_ID"type="xsd:int"minOccurs="0" />
<xsd:elementsql:field="SESSION_USER_NAME"name="SESSION_USER_NAME"type="xsd:string"minOccurs="0" />
<xsd:elementsql:field="SESSION_CURRENT_DATABASE"name="SESSION_CURRENT_DATABASE"type="xsd:string"minOccurs="0" />
<xsd:elementsql:field="SESSION_USED_MEMORY"name="SESSION_USED_MEMORY"type="xsd:int"minOccurs="0" />
<xsd:elementsql:field="SESSION_PROPERTIES"name="SESSION_PROPERTIES"type="xsd:string"minOccurs="0" />
<xsd:elementsql:field="SESSION_START_TIME"name="SESSION_START_TIME"type="xsd:dateTime"minOccurs="0" />
<xsd:elementsql:field="SESSION_ELAPSED_TIME_MS"name="SESSION_ELAPSED_TIME_MS"type="xsd:unsignedLong"minOccurs="0" />
<xsd:elementsql:field="SESSION_LAST_COMMAND_START_TIME"name="SESSION_LAST_COMMAND_START_TIME"type="xsd:dateTime"minOccurs="0" />
<xsd:elementsql:field="SESSION_LAST_COMMAND_END_TIME"name="SESSION_LAST_COMMAND_END_TIME"type="xsd:dateTime"minOccurs="0" />
<xsd:elementsql:field="SESSION_LAST_COMMAND_ELAPSED_TIME_MS"name="SESSION_LAST_COMMAND_ELAPSED_TIME_MS"type="xsd:unsignedLong"minOccurs="0" />
<xsd:elementsql:field="SESSION_IDLE_TIME_MS"name="SESSION_IDLE_TIME_MS"type="xsd:unsignedLong"minOccurs="0" />
<xsd:elementsql:field="SESSION_CPU_TIME_MS"name="SESSION_CPU_TIME_MS"type="xsd:unsignedLong"minOccurs="0" />
<xsd:elementsql:field="SESSION_LAST_COMMAND"name="SESSION_LAST_COMMAND"type="xsd:string"minOccurs="0" />
<xsd:elementsql:field="SESSION_LAST_COMMAND_CPU_TIME_MS"name="SESSION_LAST_COMMAND_CPU_TIME_MS"type="xsd:unsignedLong"minOccurs="0" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
<row>
<SESSION_ID>9EBAB2DE-A23E-4183-8508-7A729BF41C85</SESSION_ID>
<SESSION_SPID>126449</SESSION_SPID>
<SESSION_CONNECTION_ID>233</SESSION_CONNECTION_ID>
<SESSION_USER_NAME>DOMAIN\User</SESSION_USER_NAME>
<SESSION_CURRENT_DATABASE>Cubes</SESSION_CURRENT_DATABASE>
<SESSION_START_TIME>2007-10-19T12:54:58</SESSION_START_TIME>
<SESSION_ELAPSED_TIME_MS>34141</SESSION_ELAPSED_TIME_MS>
<SESSION_LAST_COMMAND_START_TIME>2007-10-19T12:55:09</SESSION_LAST_COMMAND_START_TIME>
<SESSION_LAST_COMMAND_END_TIME>2007-10-19T12:55:07</SESSION_LAST_COMMAND_END_TIME>
<SESSION_LAST_COMMAND_ELAPSED_TIME_MS>23438</SESSION_LAST_COMMAND_ELAPSED_TIME_MS>
<SESSION_IDLE_TIME_MS>0</SESSION_IDLE_TIME_MS>
<SESSION_CPU_TIME_MS>390</SESSION_CPU_TIME_MS>
<SESSION_LAST_COMMAND>SELECT
HIERARCHIZE( { [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[00:00 (1:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[01:00 (2:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[02:00 (3:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[03:00 (4:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[04:00 (5:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[05:00 (6:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[06:00 (7:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[07:00 (8:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[08:00 (9:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[09:00 (10:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[10:00 (11:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[11:00 (12:00 a.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[12:00 (1:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[13:00 (2:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[14:00 (3:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[15:00 (4:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[16:00 (5:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[17:00 (6:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[18:00 (7:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[19:00 (8:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[20:00 (9:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[21:00 (10:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[22:00 (11:00 p.m.)], [Quarter Hour].[Time - Time-Half Hour-Hour].[Hour].&[23:00 (12:00 p.m.)] } )
ON COLUMNS,
{ [Measures].[Gross Sales], [Measures].[Gross Sales LY] }
ON ROWS
FROM [Sales]
WHERE ( [Org Hierarchy].[Hierarchy - Org Hierarchy].&[1000001], [Date].[Hierarchy - Calendar Week].[Date - Calendar - Week].&[2007 - (09/23/07 - 09/29/07)] )</SESSION_LAST_COMMAND>
<SESSION_LAST_COMMAND_CPU_TIME_MS>109</SESSION_LAST_COMMAND_CPU_TIME_MS>
</row>
</root>
</return>
Each <row></row> corresponds to a session. You can see the SPID (<SESSION_SPID>), user (<SESSION_USER_NAME>), the last command, etc. This is info can be the basis of a session monitoring/auto kill process (many of us have written similar applications in SQL Stored Procedures for SQL Server). We could even get more sophisticated and have a .NET application that runs this, applies a transform and outputs something far more readable and extend it with cancel/kill functionality, etc. Most of the information I found while researching this alludes to using the .NET application approach. However, I wanted something simpler.
We take the <SESSION_SPID> value from the output of the DISCOVER_SESSIONS command and plug it into Command 2’s <SPID> value. Notice I use the <CancelAssociated> options with a value of 1 (true). This cancels the process and any associated processes with the SPID in question. There are a host of options with the Cancel command and I am not going to cover them here. MSDN has fairly comprehensive documentation on it as well as the DISCOVER command.
Monday, October 08, 2007
For those who are familiar with PerformancePoint scorecards and dashboards, you have probably run into a problem where empty dimension values cannot be dynamically filtered on scorecards if you map the dimension to the rows or columns on the scorecard in the dashboard designer. One simple way around this is to use the NONEMPTY MDX function along with an EXISTS function call. Consider the following example:
Scorecard: Sales
Filters: Org, Time (Site is the lowest level of the Org dimension)
Requirement: Only show Sites that have values for the time selected and are Descendants of the Org selected
To solve this, create a link to the Org filter from the Scorecard (columns or rows). In the filter link formula (button on the filter link editor in dashboard designer) enter the following (with your particular dimension/hierarchy identifiers, of course):
NONEMPTY(EXISTS([Org].[Hierarchy - Org].[Level 08].members,<<SourceValue>>,'Sales'))
'Sales' is the measure group from the cube being used by the scorecard. This approach should work as well for using dimensions that may not be the same as the applied filter, though a cross-join may be required. I have not tried that yet so I will follow up on whether or not my theory is correct.