Report Portal

Analysis Services Distinct Count Optimization

Summary 

Distinct count (such as unique visitor counts on a Web site) calculations provide valuable information but come with a number of performance challenges. This white paper describes tests that were performed to determine how best to optimize these calculations and includes best practices based on the test results.

Introduction

While distinct count (such as unique user counts or visitor counts) calculations provide valuable information, these calculations are not easy to solve.

A simple example is a grocery "market basket" scenario where a single visitor can purchase various food items. While many items may have been purchased from your store, knowing whether one or many customers made these purchases has an impact on your marketing campaigns. In a more tech savvy Web analytics scenario, if each visitor has a high number of page views for each visitor on your site, you have a very “sticky” Web site with relatively loyal customers. Financial Web sites are an example of sites with this type of visitor profile where many transactions (research, purchases, sells, options, and so on) are performed by a set number of users. On the other hand, an automotive information Web site may have a relatively low page view/visitor ratio with a high number of visitors. These latter sites may have a lot of “non-sticky” visitors because users are visiting different sites looking for the latest information on their vehicle of choice. Business intelligence in the form of loyalty can be derived from these distinct count calculations where the former site has more loyal customers while the latter has fewer loyal customers. Ultimately, you want to adjust the targeting of your Web advertising campaigns to suit your visitor population. For more information, see Amir Netz’s excellent article Analysis Services: DISTINCT COUNT, Basket Analysis, and Solving the Multiple Selection of Members Problem; while it is specific to Microsoft® SQL Server™ 2000 Analysis Services, it provides useful information on the distinct count and grocery basket analysis problem.

Often distinct counts are calculated by custom applications and pipelines using a variety of parsing techniques to determine the distinct number of items. The distinct count measure was introduced in SQL Server 2000 Analysis Services, an enhancement that was extremely useful for business intelligence analysts. It was originally based on the dimension structure in Analysis Services and provided fast access to distinct count calculations for small-to-medium sized volumes of data. In SQL Server 2005 Analysis Services, there is more flexibility in how you structure data so that you now get even better performance.

While distinct count calculations are much faster in SQL Server 2005 Analysis Services, the purpose of this white paper is to provide a methodology for improving the performance of enterprise-size distinct counts. If you work with a relatively small set of data, you will see performance advantages when using these techniques, but you will not experience the same gain compared to those seen with enterprise-size data. In our case, enterprise size is loosely defined as greater than or equal to 20 million rows a day.

Table of Contents

Introduction
Background
 Benefits of Analysis Services Distinct Count Calculation
 Analysis Services Improvements
Overview
Distinct Count Optimization Techniques
 Create Separate Measure Groups
 Create Customized Aggregations
 Define a Processing Plan
 Create Partitions of Equal Size
 Use Partitions Comprised of a Distinct Range of Integers
 Distribute the Hash of Your UserIDs
  Modulo Function
  Hash Function
 Choose a Partitioning Strategy
Distinct Count Partitioning Strategy Analysis
 Data Set
 Partitioning Strategies
 Hardware Resources
 Testing Methodology
  Single-User Scenarios
  Multiuser Scenarios
 Test Results
 Test Observations
  All Single-User Queries Ranking
 Time Set Queries Ranking
  Concurrent Queries Ranking
  Average of All Rankings
 Perfmon Analysis
  Physical Disk
  Memory
  Processor
  Discussion
Conclusion
Appendix
 Analysis Services Distinct Count References

Read more...

Tags: distinct count, must read, paper

 

2007-2015 VidasSoft Systems Inc.