Recently I posted a blog entry on how to process all dimensions in a single Analysis Services 2005 database using SQL Server Integration Services (SSIS) package. Here I’ll add code to this package that will allow you to process whole database, cubes, measure groups or partitions.
Please follow steps from previous post on how to create new package and add script component. Then for the script component use code from bellow:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.RuntimePublic Class ScriptMain
Public Sub Main()
Dim ExecutionSuccessfull As Boolean = True ‘ If true, package executed without errors
Dim sProcessType As String = “ProcessFull”Dim oConnection As ConnectionManager
oConnection = Dts.Connections(”DM”)
Dim sServer As String = CStr(oConnection.Properties(”ServerName”).GetValue(oConnection))
Dim sDatabaseID As String = CStr(oConnection.Properties(”InitialCatalog”).GetValue(oConnection))
Dim oServer As New Microsoft.AnalysisServices.Server
oServer.Connect(sServer) ‘ connect to the server and start scanning down the object hierarchy
Dim oDB As Microsoft.AnalysisServices.Database = oServer.Databases.FindByName(sDatabaseID)If oDB Is Nothing Then
ExecutionSuccessfull = False
GoTo Done
ElseDim oDim As Microsoft.AnalysisServices.Dimension
Dim oCube As Microsoft.AnalysisServices.Cube
Dim oMeasureGroup As Microsoft.AnalysisServices.MeasureGroup
Dim oPartition As Microsoft.AnalysisServices.PartitionoServer.CaptureXml() = True ‘ Start capturing XML.
‘ oDB.Process(Microsoft.AnalysisServices.ProcessType.ProcessFull) ‘ Use this to process whole DB at once
‘ Process all dimensions
‘For Each oDim In oDB.Dimensions
’ ‘ This will generate XMLA, but because CaptureXML is True, will not execute it!
’ oDim.Process(Microsoft.AnalysisServices.ProcessType.ProcessFull) ‘ Process all dimensions
‘ NextFor Each oCube In oDB.Cubes
‘ oCube.Process(Microsoft.AnalysisServices.ProcessType.ProcessFull) ‘ Process all cubesFor Each oMeasureGroup In oCube.MeasureGroups
‘ oMeasureGroup.Process(Microsoft.AnalysisServices.ProcessType.ProcessFull) ‘ Process all measure groupsFor Each oPartition In oMeasureGroup.Partitions
oPartition.Process(Microsoft.AnalysisServices.ProcessType.ProcessFull) ‘ Process all partitions
Next
Next
Next
oServer.CaptureXml() = False ‘ Stop capturing XML
‘ Execute captured XML. First parameter Transactional, second parameter Parallel, third optional parameter: processAffected
‘ These are very important parameters!Dim oResults As Microsoft.AnalysisServices.XmlaResultCollection
Dim oResult As Microsoft.AnalysisServices.XmlaResult
Dim dataBytes(0) As Byte
oResults = oServer.ExecuteCaptureLog(True, True)Dim oMessage As Microsoft.AnalysisServices.XmlaMessage
‘Log the errors and warnings
For Each oResult In oResults
For Each oMessage In oResult.Messages
If oMessage.GetType.Name = “XmlaError” Then
‘The processing failed
Dts.Log(oMessage.Description, 0, dataBytes) ‘ You need to setup proper package loging to see this!
ExecutionSuccessfull = False
Else
‘It’s just a warning.
ExecutionSuccessfull = True ‘ if you want to fail on warning, change this to False
End If
Next oMessage
Next oResultEnd If
Done:
oServer.Disconnect() ‘ disconnect from the server — we are doneIf ExecutionSuccessfull Then
Dts.TaskResult = Dts.Results.Success
Else
Dts.TaskResult = Dts.Results.Failure
End If
End SubEnd Class
By default this package will build code to process each partition. This is just an example. There is commented out code there that can process all measure groups or cubes or whole database at once. Use this code as guidance to build your own SSIS package that does what you want.
In my data warehouse design I decided to maintain the list of partitions in SQL Server table. I have a “partition base” table that contains base information about each measure group and partitioning field (I have just one, usually date field). Then I have a stored procedure, that during each DW load calculates if new partitions have to be created or old ones have to be deleted and loads this information into extended partition information table. It also marks which partitions are affected by latest load and has to be reprocessed. Although this might seem like a complicated task, it is not. And it is well worth the effort. My SSIS package then compares this extended partition table with actual SSAS database and then drops/creates new partitions and process partitions that have to be processed.
I would be interested to hear how other people are managing processing of partitions?
Just what I was looking for… I recently posted the URLs for the other two post that have to do with processing dims and designing aggregates on my blog. I will add this URL too. Thanks for the code!
We handle processing of partitions through AMO code which is compiled as a dll and called through a SSIS package. All SSAS processing actions (updates, new partitions, deletes) are logged into a SQL table (we call is CubeProcessingLog). This table is then used with the ETL logging table to see what needs to be processed every hour or day. By doing it this way we only process the new incoming data; never older data. It’s fairly complex, however I could let it run for years and not have to touch it.
Jesse,
Why do you compile AMO code as dll? Are there any benefits for that?
I tried to use only AMO to create the Scheduler for ETL and dimensions/partitions data processing but…hit some bugs. some code for partition/dimension automation is at one of my blog. Basically, the scheduler is controling and recording the status and steps for SSIS ETLing data into Database and then parallel processing of cubes separating dimensions processing and partitions processing…etc..all configurations as saved in .xml file (web config) as metadata to drive ETL and cubes’ schema changes if any…
Sergey,
It looks like in your blog you are talking about special case when during processing you also changing source table of measure group. Is that right?
Have you consider having SQL View that is used as a source for the measure group. Then you would change just view definition (example select from different table) instead of trying to solve this in AMO?
Vidas Matelis
Yes, I have developed the controller that automatically updates data (ETL with SSIS packages) in the Data Warehouse tables and processes the dimensions and then partitions based on successfull or not successfull run of SSIS packages. The logic of controller is based on statuses of ’steps’: etl,process dimensions,process partitions, final cleansing of staging data. it also has locking mechanizm (based on logical grouping for certain cubes to be ETLed and Processed) to prevent any duplication of processing for dimensions shared in many cubes. I am thinking now to have PARALLEL processing, so far it is sequencial - based on success - the next block of steps is executed.The controller supports all type of MSAS processings.COntroller also can process the certain date, month, year or any FROM - To Interval with granularity hr,day, week, month, year.
And, yes, you are correct,Vidas, one of the tags in XML file that stores metadata to control/execute the controller, specifies the query binding and staging table to process partition based on time granularity for specified date or FROM-TO interval.
This controler also records log info into SQl tables what is happening during execution to be able to troubleshot where and why. on my blog I have posted just one class that gave me a headache…with AMO…only XMLA batches will be able to do the job and the XMLA file needed to be ‘parametrized’ like ###database### to be able to replace certain names from controller.
Thank you Vidas for sharing the source code - would use it for Process Full.
ProcessAdd for dimensions from querybinding - is a ‘feature’ that does not work as it supposed to.
see http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1117591&SiteID=1
Else
‘It’s just a warning.
ExecutionSuccessfull = True ‘ if you want to fail on warning, change this to False
End If
Small possible issue, I think the ExecutionSuccessfull should be commented out, unless failures are always the last item in the results collection. Otherwise a warning may set the bit back to true even if an error did occur but was processed earliar in the collection.
Regardless thanks for the code.
Vidas:
Do you know whether there’s a way to control the parallelism of the partition reprocessing? I’m trying to write a package that will reprocess multiple partitions in one measure group, and I want to regulate the load on the server.
Thanks!
John,
I could not find a way to control Parallelism during processing using ExecuteCaptureLog method. I’ll try to research a bit more, and if I’ll find answer, I’ll post it here.
You probably already know that you can control parallelism with XMLA. There is Parallel parameter for that like in this example:
Vidas
Hello Vidas,
The code above helps a lot, I built the similar code in SSIS package for my project, it runs successfully, but I cannot see the detail processing log information, i.e which dimensions and partitions are processed, is there any log file generated by Analysis manager for this processing?
Thanks,
Emily
Emily,
I have not implemented proper loging in this package. My goal was just to share code to process objects.
I built a package that processes the measure groups in parallel by launching the same package four times in a sequence container. Prior to calling the sequence container, I inserted a record for each measure group I wanted to process into a transient table. Each measure group process registers itself via a SP and a registered process table and then uses the returned registered process id as a unique value to request the next available measure group to process. When I run it on my machine locally, it works like a champ. When I deploy it to a server with four processors, I randomly get one or two deadlocks which cause those instances to be shut down. The remaining processes still successfully process the rest of the measure groups, but I cannot figure out why I am encountering deadlocks. Any ideas?
Ron,
By default this package will already process everything in parallel. There is no way to limit how many parallel tasks will run, so it will use all processors available.
If you need to control parallelism, better way would be to build process XMLA script like I did in my post here: http://www.ssas-info.com/VidasMatelisBlog/7_ssis-package-to-process-all-dimensions-in-ssas-2005-db
Can you please give a bit more details why you are running 4 packages in parallel?
Vidas, This is just what I was looking for. Your explainations are great and easy to understand. Thanks a million