Report Portal

Simple MDX Performance Tips (by Richard Tkachuk)

Here are some simple steps you can take to ensure that you get the best performance from your calculations in SQL Server Analysis Services. Some of these exploit new functions or features in SQL Server Analysis Services 2005 while others apply to both this release and SQL Server Analysis Services 2000.

Use your attributes!

In SQL Server Analysis Services 2000, the filter function was a common means of selecting members from a set whose member property satisfied some condition; for example male customers could be expressed as :

Filter(Customer.name.Name.members, Customer.name.currentmember.properties(“Gender”) = “Male”)

Don’t do this in SQL Server Analysis Services 2005. Instead, create an attribute hierarchy Customers.Gender and do this:

(Customers.Gender.Male, Measures.Sales)

The attribute hierarchy can be made not visible if they should not be seen by end-users and still be available in MDX expressions.

The average sales to male customers can be expressed as this:

Avg(Customer.Name.Name.members*Customers.Gender.Male, Measures.Sales)

Why is .properties so much slower? It’s because getting the value of a member forces each member to be retrieved so the property value can be retrieved and queried. Exists is much faster because internal storage structures can be exploited directly

Use Is Comparison Operator

When comparing members, compare the objects using the IS comparison operator. Don’t do this:

Iif( [Currency Code].currentmember.Name = “USA”], …)

Do this:

Iif( [Currency Code].currentmember IS [Currency Code].[USA], …)

Translating the members to strings takes time and doesn’t need to be done.

Use NonEmpty Function

The NonEmpty function (new in SQL Server Analysis Services 2005) is optimized for removing empty tuples. So instead of doing this to get the customers who bought an Xbox,

Filter(Customer.Name.members, not IsEmpty( ([Measures].[Unit Sales], [Product].[Name].[Xbox])

Do this:

NonEmpty (Customer.Name.members, ([Measures].[Unit Sales], [Product].[Name].[Xbox]))

Note that Filter(<set>, Not Isempty(<measure>)) is equivalent to NonEmpty(<set>,<measure>) and the engine does the optimization automatically.

Another example. Instead of this:

Count(Filter([year and month].[Month],
Not IsEmpty(([Project Count], [Layer].[Layer].[web]))
or Not IsEmpty(([Project Count], [Layer].[Layer].[server]))))

Do this:

Count(NonEmpty ([year and month].[Month],{([Project Count],
[Layer].[Layer].[web]), ([Project Count], [Layer].[Layer].[server])})

Calculation Non Empty Behavior

Whether or not an expression resolves to null is important for two major reasons. First, most client applications use the “non empty” key word in a query. If you can tell the engine that you know an expression will evaluate to null, then needn’t be computed and can be eliminated from the query results before the expression is evaluated..

Second, the calculation engine can use the knowledge of a calculation’s non empty behavior even when the “non empty” keyword is not used. If a cell’s expression evaluates to null, then it needn’t be computed during query evaluation.

Note that the current distinction between how the engine uses an expression’s non empty behavior is really an artifact of the engine’s design and we’re working to eliminate the distinction in a future release. For now, we make this distinction because one or both optimizations is made depending how the calculation property non_empty_behavior (NEB) is defined - we’ll call the first “Non Empty” and the second “Calculation Engine”

When an expression’s NEB is defined, the author is guaranteeing the result is null when the NEB is null and not null when NEB is not null. This information is used internally by the calculation engine to build the query plan.

What can/should be specified in Non_Empty_Behavior (“NEB”) clauses is first and foremost determined by correctness/semantics aspects, before taking into account any performance goals.

Calculation type NEB applies to

NEB expressions recognized

Calculation Engine SP2 support

Non Empty SP2 support

Example

Calculated Measure

Constant measure reference

Yes

Yes

With Member Measures.DollarSales As Measures.Sales / Measures.ExchangeRate,

NEB = Measures.Sales

Calculated Measure

{List of two or more constant measure references}

No

Yes

With Member Measures.Profit As Measures.Sales – Measures.Cost,

NEB = {Measures.Sales, Measures.Cost}

Any (calculated member, script assignment, calculated cell)

Constant tuple reference

Constant single-measure set

Yes

No

Scope [Measures].[store cost];

        This = iif( [Measures].[Exchange Rate]>0, [Measures].[Store Cost]/[Measures].[Exchange Rate], null );

        Non_Empty_Behavior(This) = [Measures].[Store Cost];

    End Scope;

Here’s a summary of representative, common cases for using NEB. Note that defining NEB is never necessary to achieve the correct calculation results but defining it incorrectly can result in incorrect results. It should only be used if correctness implications are well understood and performance is poor without it. 

Measures.M1 + or – Measures.M2
-          In general, MUST specify both measures in NEB, for correctness reasons. Using SP2 is required for this.
-          In particular, if both measures belong to the same measure group, MAY specify just one of them in NEB (could have better performance). 

Measures.M1 * Measures.M2
-          In general, one cannot specify any correct NEB for this expression because the result is null when either M1 or M2 is null.
-          However, if it is guaranteed that one of the measures is never null (e.g. a currency exchange rate), then the other measure may be used in NEB.
-          In particular, if it guaranteed that, for any given cell, either both measures are null, or both are non-null (e.g. they belong to the same measure group), then one may specify both measures in NEB (SP2 required), or specify a single measure. 

Measures.M1 / Measures.M2
-          The first measure (the numerator, M1) must be used when defining NEB. 

Be very careful. This property is generally abused by those that use it. If this property is defined incorrectly, query results can be incorrect.

Don’t use LookupCube

It’s expensive. For any non-trivial usage, create a single cube that has the values you need.

Conditional Calculations

The IIF function can play havoc with existing engine calculation optimizations – the cause of the problem is that the optimizations can’t be brought to bear without knowing the expression that applies to a cell when the cell is being evaluated. We’re looking at the resolving this in a future release but in the meantime there may be some steps you can take.

Conditional Calculations and Scoping

The IIF is sometimes used to scope a calculation on a set of members by relying on the condition; for example:

This =
IIF([account].[Account Type].currentmember IS [Account].[Account Type].[Flow],
<e1>, <e2>

Don’t do this. This can expressed differently with much better performance:

Scope ([Account].[Account Type].[Flow]);
            This = <e1>;
End Scope; 

Scope ([Account].[Account Type].[Account Type].members – [Account].[Account Type].[Flow]);
  This = <e2>;
End Scope; 

Conditional Calculations and NULL

If you can, having one of the arguments as NULL can significantly improve performance:

IIF ( <condition>,  <expression>, NULL)

If this can’t be done, oddly enough this expression can be much faster (but only in AS2005 SP2 and later) and is worthy of experimentation:

IIF ( <condition>,  <expr1>, NULL) +  IIF ( NOT <condition>,  NULL, <exp2>)

(If <condition> is non-trivial – consider pulling it out as a calculated member.)

Conditional Calculations and Non Empty Behavior

If the expression has a simple non empty behavior (a subject matter discussed elsewhere in this document), then exploit it. For example, this expression flips the sign of an expression based on a condition:

IIF (<condition>, <exp>, -<exp>)

If <exp> has a non empty behavior, than do this:

<exp> * IIF(<condition>, 1, -1), non_empty_behavior = <NEB>

Single Branch Conditions

If you must use a condition and the calculation only applies when a condition evaluates to true, do not do something like this to have the cell retain its value:

This = IIF(
<condition>,
<assignment>,
CalculationPassValue(measures.currentmember,-1,relative)
); 

But do this instead:

IF <condition> THEN this = <assignment>;

Use ValidMeasure to Reduce Calculation Space

In SQL Server Analysis Services 2005, each measure group has the property IgnoreUnrelatedDimensions that defines how dimensions unrelated the measure are treated – either the dimension is ignored, or moving off the default member results ion a null value for the measure.

When defining calculations, don’t rely on this behavior but use the validmeasure function. For example:

scope(leaves([Time]), [Currency].[Currency].members - [Currency].[Currency].[Currency].[USA]
    Scope [Measures].[store Sales];
        This = iif( isempty(validmeasure([Measures].[Exchange Rate])), null, [Measures].[Store Sales]/validmeasure([Measures].[Exchange Rate]));
    End Scope;

Use Sum or Aggregate instead of Additions

Instead of doing this:

Create Member measures.x as
(Sales, Country.USA) +  (Sales, Country.Canada) + (Sales, Country.Mexico)… 

Do this:

Create Member measures.x as Sum({Country.USA, Country.Canada, Country.Mexico}, Sales) 

 

Clump Hierarchies in Big Crossjoins

If you can, keep hierarchies from the same dimension together. Why? Because internally, the engine does an exists between adjacents sets in a crossjoin. If a hierarchy from a different dimension is inserted between two hierarchies from the same dimension, this exists function happens afterwards and not before. This can greatly expand the space and impact performance.

Caching Calculation Results

The formula engine has an internal cache to reuse calculation results. But to cache something, the result has to be an addressable cell (or tuple) in the cube space.

For example, I have an application that calculates the ratio of a value to the maximum that value is over a set and renders the result as a histogram. The first way I tried to do this was with a calculation something like this:

with
member measures.y as
measures.[unit sales]
/
max(customers.[name].[name].members, measures.[unit sales])
select
measures.y on 0,
customers.[name].[name].members on 1
from sales

This took about 15 seconds on my laptop.  The calculation is inefficient in that the maximum value does not change over the set but the expression forces its reevaluation  as many times as there are customers (and there are over 10,000 of them). This approach is much better:

with
member measures.x as
max(customers.[name].[name].members, measures.[unit sales])
member measures.y as
measures.[unit sales]
/
(measures.x,[Customers].[Customers].[All Customers] )
select
measures.y on 0,
customers.[name].[name].members on 1
from sales

This takes less than a second! Over an order of magnitude performance improvement! Fun!

Now here’s a quiz – why must the expression for measures.y contain the member [Customers].[Customers].[All Customers] in the denominator? In other words, why can’t I do this:

with
member measures.x as
max(customers.[name].[name].members, measures.[unit sales])
member measures.y as
measures.[unit sales]
/
(measures.x)
select
measures.y on 0,
customers.[name].[name].members on 1
from sales

It is because we want to reference a cached value. Without [Customers].[Customers].[All Customers] overwriting the customer dimension’s attributes, the cell changes with a reference to the customer’s name and the calculation goes back to being reevaluated in each cell.

In the next release, we plan to do this automatically. But for now you have to give the engine a hint that the calculation doesn’t change.

Tags: mdx, performance, tip

 

2007-2015 VidasSoft Systems Inc.