Thoughts on Analysis Services in the Cloud
Reposted from Chris Webb's blog with the author's permission.
So far this year I’ve indulged myself a few times in a bit of futurology (here and here, for example) regarding directions the Microsoft BI stack might take. The one area I haven’t touched on recently, though, is what Analysis Services in the cloud might look like; I did speculate a bit here but that was a while ago now and before several relevant technologies had been announced. It’s certainly coming, and presumably somebody somewhere is working on it right now in some top-secret bunker in Redmond, so maybe a few public comments on what we the user community would want from it would be helpful…? Anyway, welcome or not, here are some thoughts…
So why would you want or need Analysis Services in the cloud rather than regular Analysis Services? I can think of a few things it should be able to do to justify its existence:
- It should be cheaper than hosting all the infrastructure in-house and it should be scalable to the Nth degree. OK, so these are the standard reasons dragged out for cloud-based anything, but with Analysis Services there are two obvious times when resource usage peaks - processing and when a big/complex query runs – and equally there are times when the server can be completely quiet; so the idea of being able to make use of near infinite resources when you need them to make the processing/querying super-quick, but only pay for what you use, is very appealing. From this point it follows that when you need to use extra resources, you need a platform that can scale to be able to make use of those resources.
- As well as being able to work with ‘traditional’ data sources such as your corporate data warehouse, it should be able to work with cloud-based data sources be they relational or non-relational (like Amazon SimpleDB, Google’s recently-announced BigQuery, Azure Table Storage and all the rest), feeds (like OData or GData), linked data (RDF), web-based spreadsheets like Google Docs or the Excel web app, or completely unstructured data from anywhere on the web (maybe something like how Google Squared works). Supporting the integration of data modelled for all of these different types of database would be a challenge but I think it should be possible.
- It should be available as a data source for anywhere else on the web – your own apps, reports, web-based spreadsheets and so on – as well as desktop apps like Excel. The really important thing, for me at least, would be for it to expose an XMLA interface to allow ad-hoc querying (note that Excel can’t talk directly to an XMLA provider, it only does OLEDB for OLAP, but it’s possible to bridge the two and Simba already sell an OLEDB provider that does this); grudgingly, I’ll admit a SQL query interface would be useful too. The ability to expose data via an OData feed would be a must as well.
- I’d also like to see it support some basic ETL functionality, because if it is as scalable and fast as I’d like it to be then it would have an obvious secondary use for large-scale number crunching – aggregating data, doing lookups, sorting, many of the things that you might do today in the SSIS data flow or which you might look at Hadoop to do. Derived columns and lookups could all be done with DAX or MDX calculations; pivoting, sorting and filtering could all be done (and configured very easily with a good client tool) through the right MDX query. I can imagine it acting as a data source for itself: you’d load data into a cube or a table or whatever, create a query on top of it which is made available as a feed, then take the data from that feed and load it into another cube/table, and so on.
- Following on from the last two points, it’s not enough to be able to act as a data source: Microsoft would need to come up with a decent web-based client tool specifically for use with it. And no, vanilla pivot tables on the web wouldn’t cut it, nor would SSRS in the cloud (not that I wouldn’t want that) – you’d need to have the wow factor that something like Live Labs Pivot has as well as serious, power-user functionality like the ProClarity desktop client; it would probably need to be built using Silverlight or HTML5. I still think there’s an opportunity to rethink what a client tool could be here, blur the line between BI client tool, spreadsheet and database and come up with something really new.
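To make the XMLA point above a little more concrete, here is a minimal sketch of what an ad-hoc query request might look like on the wire. It builds a standard XMLA Execute message wrapped in a SOAP envelope; the cube name, catalog name and MDX statement are purely hypothetical, and nothing here describes how a real cloud service would actually be exposed.

```python
import xml.etree.ElementTree as ET

# Namespaces used by SOAP 1.1 and by XML for Analysis (XMLA).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_execute_request(mdx: str, catalog: str) -> bytes:
    """Return an XMLA Execute request (SOAP envelope) as UTF-8 bytes."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("xmla", XMLA_NS)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")

    # The query itself travels as text inside Command/Statement.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = mdx

    # Properties tell the provider which database to use and
    # which result format to return.
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    prop_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(prop_list, f"{{{XMLA_NS}}}Catalog").text = catalog
    ET.SubElement(prop_list, f"{{{XMLA_NS}}}Format").text = "Multidimensional"

    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical cube and catalog names, for illustration only.
    request = build_execute_request(
        "SELECT [Measures].[Sales] ON COLUMNS FROM [DemoCube]",
        "DemoCatalog",
    )
    print(request.decode("utf-8"))
```

Any client that can POST this over HTTP, whether a web app, a report or a bridge like the OLEDB provider mentioned above, could then query the service without needing anything installed locally, which is really the attraction of XMLA here.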
I wouldn’t want, and don’t expect to get, a recognisable version of Analysis Services 2008 in the cloud in the way that SQL Azure is recognisable as server-based SQL Server. While I still see an important role for Analysis Services as we have it today in corporate BI scenarios, I don’t think there’s any point transferring it to the cloud with exactly the same functionality. Some things, like dimension security, would still be needed, but some things, like cell security, we could probably live without. PowerPivot in the cloud would make more sense as a starting point, so long as it was not just a straight copy of PowerPivot on the desktop: the ability to scale to really, really large data volumes, as in the first bullet above, would be the key feature and the only real reason why customers would want BI in the cloud. And it’s not just the scalability of simple queries either – queries that use complex calculations would need to scale too. You know what, though? When I look at DAX I can’t help but think it was designed with this requirement in mind; I can see how the evaluation of DAX expressions could be easily parallelised.
So all this sounds pretty ambitious, even a bit pie-in-the-sky. The way I see it, though, as far as BI-in-the-cloud goes there’s everything still to play for and if Microsoft doesn’t deliver then someone else will. Could it be Google, or Amazon, or some startup we’ve never heard of? Now that Google BigQuery has a SQL interface, it’s only going to be a matter of time before someone builds a BI app on top of it (I wonder if Mondrian can be made to work with it?) – and it certainly seems to be fast. Microsoft needs to think beyond using BI to defend the Excel/SharePoint franchise and start thinking about the future! With ten years of BI experience behind it, Microsoft should be in a strong position to move forward into the cloud, but it’s only going to succeed if it’s innovative. I understand there will be some new PowerPivot-related product announcements at the BI Conference this week; I’m keeping my fingers crossed.
As always, your comments and ideas are welcome…
Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.wordpress.com/ .