What happens when you do a Process Update on a dimension?
Reposted from Chris Webb's blog with the author's permission.
Over the last few days I’ve been involved in an interesting thread on the SSAS forum regarding what happens when you do a Process Update on a dimension. It’s a topic that is not widely understood, and indeed I’ve not known all the details until today, but it’s nonetheless very important: one of the commonest performance-related problems I see in my consultancy work is partitions that have aggregations designed for them, but where those aggregations aren’t in a processed state because a Process Update has been run on one or more dimensions. Anyway, just now Akshai Mirchandani from the dev team posted a really good overview of what actually happens when you run a Process Update on that thread, so I thought I’d copy here to ensure it gets a wider audience:
Here is a quick summary of what happens when you do ProcessUpdate:
1. After the dimension has been updated, the server analyzes the changes that occurred to the dimension. In 2005, this analysis was pretty simple and would often incorrectly detect that major changes had occurred that required clearing of indexes and aggregations. In 2008, this code was improved such that it more often would realize that nothing significant has occurred. It's a fairly small (but useful) optimization -- I guess nobody thought it was worth documenting!
2. Based on this analysis, the server will decide whether or not indexes and aggregations need to be cleared. If no (e.g. because records were only added and not deleted/updated), then the partitions won't be affected.
3. If indexes/aggregations need to be cleared, then the server will check if ProcessAffectedObjects was enabled -- if yes, then instead of clearing the indexes/aggregations it will rebuild the indexes/aggregations.
4. The act of clearing the indexes/aggregations also shows up as "partition processing operations" in Profiler -- that's one of the things that has been confusing some of you.
5. When aggregations are cleared, only the flexible aggregations need to be cleared because we're guaranteed by the rigid relationships that the members cannot have moved and therefore the rollups cannot have changed. However, indexes can still have changed and therefore you may still see the partition processing jobs kick off to clear the indexes.
6. ProcessIndexes and ProcessClearIndexes take care of building both bitmap indexes (aka map) and aggregations -- the context is that both aggregations and bitmap indexes are generically considered "indexes".
Really the main takeaway here is that if you ProcessUpdate a dimension, you should strongly consider either doing ProcessAffectedObjects or an explicit ProcessIndexes on the affected partitions so that bitmap indexes and flexible aggregations get rebuilt. The advantage of explicitly doing ProcessIndexes is that you can bring your cube online earlier and have the indexes/aggregations get processed more lazily in the background -- a number of customers prefer to do that because their processing windows are too small to wait for the indexes to get processed.
Also related to this topic, I thought I’d also highlight a great post by Darren Gosbell where he shows how to find out if your aggregations are processed or not:
Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.wordpress.com/ .