|Caching The Rows Returned By An MDX Query|
|Written by Chris Webb|
|Friday, 08 November 2013 20:00|
Reposted from Chris Webb's blog with the author's permission.
Here's another tip for those of you struggling with the performance of SSRS reports that run on top of an Analysis Services Multidimensional cube. Quite often, SSRS reports require quite complex set expressions to be used on the rows axis of an MDX query, and one of the weaknesses of SSAS is that while it can (usually) cache the values of cells returned by a query it can't cache the structure of the cellset returned by the query. What does this mean exactly? Well, consider the following query:
Here I'm taking every month on the Calendar hierarchy of the Date dimension and finding the top two customers by Internet Sales Amount for each Month; notice also that I'm slicing the query by a Product Category. The results look like this:
On my laptop this query takes just over three seconds to run however many times you run it (and yes, I know there are other ways this query can be optimised, but let's imagine this is a query that can't be optimised). The reason it is consistently slow is because the vast majority of the time taken for the query is to evaluate the set used on rows - even when the Storage Engine has cached the values for Internet Sales Amount for all combinations of month and customer, it still takes the Formula Engine a long time to find the top two customers for each month. Unfortunately, once the set of rows has been found it is discarded, and the next time the query is run it has to be re-evaluated.
How can we improve this? SSAS can't cache the results of a set used on an axis in a query, but SSAS can cache the result of a calculated measure and calculated measures can return strings, and these strings can contain representations of sets. Therefore, if you go into Visual Studio and add the following calculated measure onto the MDX Script of the cube on the Calculations tab of the Cube Editor:
You can then use this calculated measure in your query as follows:
Having done this, on my laptop the query is just as slow as before the first time it is run but on subsequent executions it returns almost instantly. This is because the first time the query is run the set expression used on rows is evaluated inside the calculated measure ReportRows and it is then turned into a string using the SetToStr() function; this string is then returned on the rows axis of the query and converted back to a set using the StrToSet() function. The second time the query is run the string returned by the ReportRows measure has already been cached by the Formula Engine, which explains why it is so fast.
Couldn't I have used a static named set declared on the cube to do this instead? I could, if I knew that the Where clause of the query would never change, but if I wanted to change the slice and look at a different Product Category I would expect to see a different set of rows displayed. While in theory I could create one gigantic named set containing every set of rows that ever might need to be displayed and then display the appropriate subset based on what's present in the Where clause, this set could take a very long time to evaluate and thus cause performance problems elsewhere. The beauty of the calculated measure approach is that if you change the Where clause the calculated measure will cache a new result for the new context.
There are some things to watch out for if you use this technique, however:
Latest Author Articles
- A Closer Look At Power Query/SSAS Integration
- Bidirectional Relationships And Many-To-Many In The Power BI Designer
- MDX Solve Order, SCOPE_ISOLATION and the Aggregate() function
- Counting Customers Who Have Bought All Selected Products
- Power Pivot / Power Query Read-Only Connection Problems In Excel 2013 - And What To Do About Them