My notes from 2007 Microsoft Business Intelligence Conference:
- SP2 has a lot of performance improvements for processing and querying. Upgrade ASAP.
- Attribute relationship is important. I have heard that at least 10 times in different sessions.
- Do not forget to set properly attribute relationship type (rigid or flexible). If you can, use rigid. If relationship is flexible and change is detected during processing, all aggregates are dropped. You need manually add them back (ProcessIndexes). No notice from server that aggregations dropped, you will noticed just slower performance.
- When processing partition, if design is OK, you should be able to process 2-5mln rows per minute. Of course, this depends on machine, CPU, memory, but if you are processing much less rows, investigate.
- For MOLAP partitions no need to specify slice property, it is detected automatically.
- Project “Real” is not used by Barnes and Noble in the real life. But they used lessons learned from that implementation.
- No indexes are build on partitions with less than 4096 rows.
- If your partition is at the day or week level, no aggregations are possible for month level. Aggregations cannot cross partitions.
- Use parent-child dimensions carefully. Might be worth flattening out hierarchy in relational table and then hide member if parent if the same. Parent child hierarchy cannot be used to create aggregates, so almost always it is going to be slower. I does not mean you should not use it!
- Each KPI formula is defined as hidden calculated measures, except when it is just reference to another measure. For better performance consider creating real calculated measures with properly designed calculations and then use these measures in KPI definitions.
- .MemberValue function could be used to retrieve Member value.
- In multiple sessions I have heard talking about “VisualTotals” problem. Basically best performance can be achieved for natural hierarchy totals – when children’s parent contains total value. When you use any filters, roles security based on username, subselect statements, etc, visual totals are calculated and performance will be affected.
- For parent child dimensions there is known big performance problem if queries use key attribute. It is recommended to hide it, as most likely you do not need it anyway.
- In pre SP2 release you were able to increase YTD calculation performance by using fancy logic (
Mosha blog example). After SP2 best performance can be achieved by using simple YTD() function.
- Excel 2007 uses subselects and that means visual totals. But you can run run excel in backward compatibility mode
- 64bit machines does not run faster than 32bit. Many older 64 bit machines actually run slower than 32bit. That was fixed recently. 64 bit machines just lets you use more memory.
- In real life don’t just do full process on cube. Plan how you can improve processing. It is recommended that you submit in separate transactions ProcessData and ProcessIndexes statements instead of ProcessFull.
- ParalleProcess option should be calculated based on rule: for each CPU you should consider having 1.5-2 parallel processing threads.
- If processing affects source relational database to much, consider changing parameter Data Source Connections to lower value
- Process partition clears cache for all partitions in affected measure group.
- If you are using crossjoin, put larger set always as first parameter of crossjoin
- Exist function works on dimensions, not cubes, so they are faster
- A rough estimate: consider having 4GB per CPU core for your 64bit server
- For performance improvement consider partitioning measure group with distinct count measure by distinct count attribute
- Do not use String type fields for distinct count measure
- Number of partitions for very large measure group should be in hundreds and not thousands
- Aggregation size should be <= 30% of fact size
- You should have hundreds and not thousands of aggregations
- If aggregation does not fit in memory (too large), consider splitting using smaller partitions or adding more memory. If during aggregation generation you have to save partial results to disk, performance will be very slow
- Fastest filter example: Exists(Customer.Members, Gender.[Male])
- When analyzing bad performing queries, first establish if performance problem is in Storage engine or in Query engine.
- Storage engine (SE) works in parallel, multiple CPUs are used on the server
- Query engine uses just one CPU. If you query is not coming back, and server uses just 1 CPU, that is Query Engine (QE) problem.
- If you have defined MeasureExpression, no aggregations are used.
- In performance monitor consider checking indicator number of cells calculated. You might be surprised how many cells might be affected by the query.
- Kernel profiler could be used to show CPU usage per each Stored Procedure. This might help you find slowly performing Stored Procedures.
- File monitor can help you identified what files are touched by analysis services queries.
- Optional Object Path property in SQL Profiler gives you full path about object
- Best performance can be achieved using standard aggregation functions instead of rewriting them (Example LastChild). But some of them require Enterprise Edition of SQL Server.
- If you can, consider precalculating values loaded into analysis services when calculation involves leaves of dimensions. Example if you have rate table that is applied to each customer, consider applying rate before loading data into SSAS. Performance in such case will always be better.
This all is from my notes, nothing tested or reviewed. If you will find any mistakes, please let me know.
Added June 3, 2007: Please also read Mosha Pasumansky notes about these notes.
Posted in SSAS | Comments Off on My notes from Microsoft BI Conference