Loading Aggregate Data in Yukon 2005

Written by Richard Tkachuk on 19 May 2007. Posted in Performance Tuning

Loading Aggregate Data in Yukon 2005

In some cases, the source data is pre-aggregated in one, some or all dimensions. In the simplest example, employee salary is loaded to all employees. Because a manager’s salary is not the aggregate of his reports, it should not be aggregated up the employee dimension. There are several other such examples. This document discusses how pre-aggregated data can be loaded directly into Analysis Services 2005.

Note: the approach described here applies only to Analysis Services 2005 and not earlier releases.

Introduction

In Analysis Services, an assumption is usually made about the content of members. Leaf members contain data derived from underlying data sources; nonleaf members contain data derived from aggregations performed on child members.

This is not always true. The example above is one such – the salary of a manager is not the sum of his employee’s salaries. Analysis Services introduced datamembers – each non-leaf member has its own system-generated datamember. By default, datamembers containing data are aggregated long with the other children of the member. This behavior can be changed with calculations.

Loading data to non-leaf members

Data for non-leaf members is not always the aggregate of its children. Examples include salary, sales, evaluation scores, whatever. This behavior is achieved by replacing a member’s data with its datamember’s data as follows:

Calculate;
this = [Employee].[Employee].currentmember.datamember;

And if there are two such dimensions where a member’s datamember contains the value for the member, add another similar statement:

Calculate;
this = ([Employee].[Employee].currentmember.datamember, [Channel].[Channel].currentmember.datamember);

This approach can be continued to scenarios where all aggregates are stored in the source database and where Analysis Services is used for its metadata and client application support (eg, aggregates already calculated for other purposes).

About recursion:

Experienced MDXers may wonder what happens when the currentmember is a datamember and the above assignment assigns the value of a cell to itself. In Analysis Services 2000, this generated an error. In Analysis Services 2005, the engine is a bit smarter. The engine resolves recursion by using the previous value of the cell on the right-hand side of the statement to determine the left-hand side. Things just work.

Conditional Substitution

Just as common as always loading data to non-leaf members is the conditional substitution of such data. For example, a budget might include an estimate for a full year’s revenue even if revenue exists for the first few months of the year; or a manager’s budget can replace the values of his supervisors. In this situation, the data loaded to a datamember substitutes for the member if such data exists, otherwise data aggregates normally.

For example, consider a product parent-child hierarchy ([Product].[Parent Product] where budget is made at any level in the hierarchy. If a budget value exists for a product, it is the budget for that product even if some other budget estimates was made for children of the product. This is done as:

The assignment replaces non-leaf members with their datamember only if the datamember’s value is not empty, otherwise the parent is populated with the aggregate of its children.

Advanced note: One might wonder – why is the aggregate function required? Why can’t the expression read as follows:

Calculate;
this =
iif (not isleaf([Product].[Parent Product].currentmember),
iif( not isempty([Product].[Parent Product].currentmember.datamember),
[Product].[Parent Product].currentmember.datamember,
measures.CURRENTMEMBER
),
measures.CURRENTMEMBER
);

(Where the change is in bold). This is simpler, but it’s wrong. The measures.currentmember causes recursion to kick in and goes to the prior pass. The prior pass does not contain the assignment and data in any child datamembers will be aggregated along with its siblings.

And if there are more than one such hierarchies in the cube, the statements can simply be stacked on top of one another. Say the budget is made by distribution channel as well. Then the script becomes:

Calculate;
this =
iif (not isleaf([Product].[Parent Product].currentmember),
    iif( not isempty([Product].[Parent Product].currentmember.datamember),
       [Product].[Parent Product].currentmember.datamember,
        Aggregate( [Product].[Parent Product].currentmember.children )
    ),
    measures.CURRENTMEMBER
);
this =
iif (not isleaf( [Channel].[Parent Channel].currentmember),
    iif( not isempty([Channel].[Parent Channel].currentmember.datamember),
        [Channel].[Parent Channel].currentmember.datamember,
        Aggregate( [Channel].[Parent Channel].currentmember.children )
    ),
    measures.CURRENTMEMBER
);

Without Parent-Child Hierarchies

Up until now, everything has been done with parent-child hierarchies. The automated generation of the datamembers is simple and there are built-in mdx functions to navigate to the datamember from the parent member. But parent child hierarchies have some disadvantages. Most importantly, parent-child hierarchies are not included in aggregations. For small hierarchies, this is not important. But if the hierarchy is large, using a parent child hierachy might result in a performance hit one is unable or unwilling to accept. As welll, morphing your normal hierarchy to a parent-child hierarchy is a pain and adds to adminitration costs.

The same approach is possible with regular hierarchies (ie, not parent-child), but some extra work is required. Extra members have to be created to substitute for the datamembers, these members have to be hidden, and the expressions have to be modified to get at these substitute datamembers (because the datamember is no longer available).

The common approach for generating these members is to create members with the same name at every level below the level at which the member appears as in the diagram below:

The green members represent the “user datamembers” (which will be the term used to refer to them henceforth) created to load the data.

To hide these user datamembers, set the property “HideMemberIf” on each level to “ParentName”. These members will vanish. (HideMemberIf accepts other values to indicate when a child is hidden. The complete set of value are Never, OnlyChildWithNoName, OnlyChildWithParentName, NoName, ParentName. One of these others can be used so cities like New York, New York don’t also vanish.)

Where parent-chilld hierarchies are used, the “Datamember” mdx function was used to get at each member’s datamember. No such function is available with user datamembers, so an alternative approach is required. One simple approach is to use the strtomember function to construct the user datamember name from the current member’s name. If the name of each user datamember is unique on its level, the user datamember for each data member is :

STRTOMEMBER(Customer.Geography.City.Level.UniqueName + "." + Customer.Geography.CurrentMember.Name)

(There are many different approaches for this should the name uniqueness rule not hold up…)

Summary

Loading aggregate data is a common requirement from non-trivial cubes. Simple mdx allows this to be done to allows replace a member’s data with the source data in one or many dimensions, or to do it conditionally based on the existence of the data.

Tags: performance

Loading Aggregate Data in Yukon 2005

Similar Articles