Report Portal

How The New IsAvailableInMDX Property For Analysis Services Tabular Can Reduce Memory Usage And Speed Up Processing

Reposted from Chris Webb's blog with the author's permission.

Following on from my post the other week about the new RowsetSerializationLimit server property, I thought it would be a good idea to write about why the new IsAvailableInMDX property (announced in the same blog post) is so important. In fact, I would say that everyone using Analysis Services Tabular 2017 (CU7 or higher) or Azure Analysis Services should spend some time investigating it because the potential benefits in terms of reduced memory usage and faster processing times are significant, especially for larger models.

First of all, what does it actually do? As the blog post says, it allows you to stop attribute hierarchies from being built on columns when you don’t need them. But what are attribute hierarchies? They are structures that are used only when you are querying your Tabular model using MDX; Excel PivotTables, for example, generate MDX queries when they are connected to Analysis Services Tabular whereas Power BI always generates DAX queries. An attribute hierarchy allows a column on a table to be used on the rows or columns axis of an MDX query, and in Excel that means you will be able to drag that field onto the rows or columns area of a PivotTable. Attribute hierarchies are used by some DAX functionality too – for example the TreatAs() function (at least for now) needs them to be present to work. Frustratingly, the DAX functionality that does need attribute hierarchies is not documented.

To give you an example, consider a Tabular model that contains a table with three columns, Product, Customer and Sales, and a measure that sums up the values in the Sales column.

image

 

I can query this Tabular model in Power BI, for example by creating a Matrix visualisation:

image

I can also get the same values out using an Excel PivotTable:

image

Now the reason I can create this PivotTable is that Analysis Services Tabular has created attribute hierarchies on the Customer and Product columns. However, the important thing to understand is that Analysis Services Tabular creates attribute hierarchies on every column on every table by default, including the Sales column. This allows me to create a PivotTable like this, with the distinct values from Sales on the rows of the PivotTable:

image

image

You’re probably thinking, why would I ever want to use Sales – a measure column – like this? And the answer is you probably wouldn’t, even though Tabular allows this by default. What’s more, building the attribute hierarchy for Sales makes processing slower and the resulting hierarchy uses memory, so all this comes as a cost. The IsAvailableInMDX property is therefore very useful because it allows you to stop attribute hierarchies from being built on columns like Sales where they serve no real purpose.

Unfortunately at the time of writing SSDT doesn’t allow you to set the IsAvailableInMDX property but the good news is that the latest versions of Tabular Editor do:

image

Setting IsAvailableInMDX to false for the Sales field has no impact at all in Power BI, so long as you are not using functionality like TreatAs() that needs it. In Excel, it just means that it is no longer possible to drag Sales onto rows or columns in a PivotTable – the Sales Amount measure still works:

image

As a result, there are two recommendations that can be made:

  • If you are not using any client tools that generate MDX queries (such as Excel) or you want to prevent your users from using them, and you can be sure that it does not affect any of your existing Power BI reports or DAX calculations, you can set IsAvailableInMDX to false on every column of every table
  • If you are using client tools that generate MDX you can still probably set IsAvailableInMDX to false on every measure column and not lose any important functionality

How much of an impact will doing this have on processing times and memory usage? It depends, but it could be a lot. The anecdotal evidence on Twitter is promising:

image

image

I did my own (not particularly scientific) test using a table with five million rows and ten columns, each of which contained the integers between one and five million. Here’s the M query to generate such a table without the need for an external data source:

let
    Source =
	Table.FromColumns(
		List.Repeat(
			{{1..5000000}},
			10
		)
	),
    #"Changed Type" =
	Table.TransformColumnTypes(
		Source,
		List.Transform(
			Table.ColumnNames(Source),
			each {_, Int64.Type}
		)
	)
in
    #"Changed Type"

On my laptop, with IsAvailableInMDX set to true for all ten columns, a full process on this table took around 105 seconds and the table size reported by Vertipaq Analyzer was 381MB. After changing IsAvailableInMDX to false for all ten columns, the time for a full process went down to around 81 seconds and the table size was down to 191MB.

In summary, this is one of those seemingly obscure technical changes that turns out to be way more useful than you might think. If you test out setting IsAvailableInMDX on your Tabular model, please leave a comment letting me know what kind of impact it had!

[Thanks to Daniel Otykier for providing a lot of information for this post]


chris-webb  

Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.wordpress.com/ .


Tags: tabular

 

2007-2015 VidasSoft Systems Inc.