# Calculate Rule Importance in MS Association Rules

To follow this blog post you will need to read my previous post about MS Association rules and more important understand why and when you can use this algorithm.

When you look in the data mining viewer in Excel 2007/2010 or in BIDS the result on running the MS Association Rules data mining algorithm can look like this on the rules tab. The picture below is from BIDS and the market basket mining model that is part of the Adventure Works SSAS demo project. There is one itemset {Touring Tire, Touring Tire Tube}  that are visualised as two rules: Touring Tire -> Touring Tire Tube and Touring Tire Tube -> Touring Tire. Itemsets in MS Association Rules do not say anything about the relations between the products(Touring Tire and Touring Tire Tube). Rules on the other hand do exactly that and have the arrow between the products.

Note in the picture below that the strongest rule, Touring Tire Tube -> Touring Tire has a lower probability(0,543) than the third strongest rule, Touring Tire -> Touring Tire Tube(0,860). I hope that this blog post might help you to understand this and how to check the calculation for rule importance.

Thanks to Jamie MacLennan and Bogdan Crivat at Predixion that have helped me with answers to my questions.

Regarding the relation between rule probability and rule importance Jamie explained this in the following way. If people always by milk at the grocery store together with other products you will have a high probablity of milk to occur with other products. All products that are bought together with milk will have a 100 precent probability. This rule is not that interesting and will have a lower importance than probability.

The starting point for calculating the rule importance is a correlation matrix with Touring Tire Tube and Touring Tire transactions. The data source for the market basket model is the dbo.vAssocSeqLineItems in the AdventureWorksDW relational database. I order to follow how the MS Association rules algorithm works I have added 1 to the result for all cells except the total row and the total column.

 Touring Tire Not Touring Tire Total Touring Tire Tube 759 640 1 399 Not Touring Tire Tube 124 19 736 19 860 Total 883 20 736 21 259

Some of the queries are these:

select * from dbo.vAssocSeqLineItems
where Model = 'Touring Tire'  ---881 records

select * from dbo.vAssocSeqLineItems
where Model = 'Touring Tire Tube'  --1397 records

Select * from dbo.vAssocSeqLineItems v
Join dbo.vAssocSeqLineItems v1
On v.OrderNumber = v1.OrderNumber
Where v.Model = 'Touring Tire' and v1.Model = 'Touring Tire Tube'  --758 records

--Credit to Bogdan for this one.
SELECT DISTINCT OrderNumber FROM dbo.vAssocSeqLineItems
EXCEPT
(
SELECT DISTINCT OrderNumber FROM
(
SELECT OrderNumber FROM dbo.vAssocSeqLineItems WHERE Model='Touring Tire Tube'
UNION
SELECT OrderNumber FROM dbo.vAssocSeqLineItems WHERE Model='Touring Tire'
) AS T
)  --19 735 Records

With the correlation matrix cells completed you can go to next step and calculate the importance of the rules. The figures in the BIDS viewer are:

1,43722003 Touring Tire -> Touring Tire Tube
1,93898162 Touring Tire Tube -> Touring Tire

Let us start with the Importance of the rule Touring Tire Tube -> Touring Tire and check the numbers in the table above. Note that +1 has been added to each cell except for the totals that includes this addition.

The calculation is done like this in several steps:

1. (Touring Tire, Touring Tire Tube) / (Touring Tire Tube Total)  = 759 / 1 399

2. (Touring Tire, Not Touring Tire Tube) / (Not Touring Tire Tube Total) = 124 / 19 860

3. Divide the ratios in 1 and 2 = 86, 89

4. Add Log10 (in Excel) to point 3 above = Log10(86,89) = 1,9390. I am not an expert in mathematics but log10 means that you decrease the scale.

The same calculation for Touring Tire -> Touring Tire Tube is as follows:

1. (Touring Tire, Touring Tire Tube) / (Touring Tire Total) = 759 / 883

2. (Touring Tire Tube, Not Touring Tire) / (Not Touring Tire Total) = 640 / 20 376

3. Divide the ratios in 1 and 2 = 27,37

4.  Add Log10(in Excel) to the result of point 3 = Log10(27,37) = 1,437

I hope that you with these examples can follow the calculations of MS Association Rule importance directly from a data source.

Happy Association!

 Thomas Ivarsson has been working with the MS BI platform since SQL Server 7 in 1999.  Presently he is working in the telecom industry in Sweden, with a data warehouse based on SQL Server 2005. From 1999 to 2007 he worked as a consultant also on the three SQL Server BI platforms.  During the latest years he has spent most of time on SSAS, Reporting Services, ProClarity and Performance Point. He also has several years experience of the ETL process with DTS and SSIS. During 2008 and 2009 he has been working with introducing data mining in his daily business to see patterns in a service network behaviours. His blog can be found here: http://thomasianalytics.spaces.live.com

Tags: data mining

2007-2015 VidasSoft Systems Inc.