Gemini was released about a week ago, and I have been playing with it almost every evening since. I am still trying to understand what it is, how it works, what its limitations are, how to use DAX, and so on. I can say that I like what I see so far. Kasper de Jonge and Chris Webb have already posted their initial Gemini reviews. In this post I will share my experience with Gemini so far.
First of all, I want to point out that for Gemini tests all you need is Excel 2010 and the Gemini Add-in file (just about 30MB). You do not need SQL Server 2008 R2. I am pointing this out because, as I was setting up my testing environment, I installed SQL Server 2008 R2 and later realized that it is not required. It is quite amazing that this 30MB add-in file contains such powerful software – the Gemini front-end and the in-memory Analysis Services server. Of course, for an Enterprise-level installation you will need SQL Server 2008 R2 and SharePoint 2010, but such a setup is not part of my current tests.
After my initial tests with a few thousand records, I finally decided to run tests on bigger tables. I took the AdventureWorksDW database and expanded the FactInternetSales table to 22 million records by simply duplicating the included records. The table grew to 3.2GB. My goal was to see how well Gemini can handle millions of rows. I ran my tests on my laptop with 3GB of RAM and a 2.5GHz Intel Duo CPU. The same laptop hosts my SQL Server 2008 database with the expanded data, Excel 2010, and the Gemini add-in. I limited SQL Server memory to 500MB.
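If you want to reproduce a similar setup, the expansion can be done with a simple self-insert loop. This is only a sketch of the general approach, not the exact script I used: the table name FactInternetSalesBig is illustrative, and the final doubling pass will overshoot the 22 million target slightly.

```sql
-- Sketch: inflate FactInternetSales by repeatedly doubling it.
-- SELECT INTO creates the copy as a heap with no primary key,
-- so duplicate rows are allowed.
SELECT * INTO dbo.FactInternetSalesBig
FROM dbo.FactInternetSales;            -- starts at roughly 60K rows

WHILE (SELECT COUNT(*) FROM dbo.FactInternetSalesBig) < 22000000
BEGIN
    -- Each pass doubles the row count; the last pass overshoots
    -- the target, so trim afterwards if an exact count matters.
    INSERT INTO dbo.FactInternetSalesBig
    SELECT * FROM dbo.FactInternetSalesBig;
END;

-- Cap SQL Server's memory at 500MB so the Gemini add-in has room to work:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 500;
RECONFIGURE;
```

The doubling loop is much faster than inserting row by row, since each pass is a single set-based insert.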
My first attempts to load 22 million records into the Gemini add-in failed. After loading over 21 million records I got this error message:
Memory error: Allocation failure: Not enough storage is available to process this command.
Server: The operation has been cancelled.
Server: The operation has been cancelled.
I tried stopping some services and closing other applications and then repeated the same table load into Gemini, but at the very end I always got the same error message. Finally I started to reduce the number of records in my FactInternetSales table, and that helped. I found that on my laptop I was able to load only about 20 million rows. Considering that this laptop also runs SQL Server, I can live with this limitation.
It took about 5 minutes to load 20 million rows into Gemini, and as I repeated the FactInternetSales table load multiple times, I saw a consistent Gemini data load speed – about 4 million records per minute. Of course, load speed depends on how many fields the table has, machine power, etc. If your SQL Server is on the network, your load speed will depend on the network's speed too. But this gives you an idea – when working with Gemini and bigger data sets, it will take time to get the data into Gemini. That comes as no surprise; I am simply impressed with the speed. During the load I watched Task Manager and noticed that Excel was using 80-90% of CPU time and SQL Server 10-20%. Total CPU usage during the data load into Gemini was almost always 100%.
Data loading was the most time-consuming operation. As soon as I had 20 million rows in Gemini, I wanted to see if sorting and filtering really work as fast as I had seen in demos. I had always doubted that Gemini could work with millions of rows that fast, but I was not disappointed – filtering on any field, and sorting, completed in under a second. I tried multiple filters, then sorting, then filters again, but Gemini responded so fast that I had to watch the displayed record count in the bottom-left corner just to confirm that the operation had executed. I was very impressed. This will please many Gemini users, and I expect to hear many wows during demos.
Then I loaded a few dimensions and manually added relationships to the fact table. After that I created a pivot table and started to browse the data. Again, I was blown away by Gemini's speed – millions of records on my laptop, and a response to any query comes back in milliseconds. How do they do that? I still cannot believe I am getting this power from a 30MB add-in.
After that I hit the "Save" button. It took about 40 seconds for Excel to save the new Excel file. This file contains all of the data (!) and metadata from all tables, plus the pivot definitions. The saved Excel file was 72MB. Just to remind you – I loaded a 3.2GB SQL Server table into Gemini, which compressed and saved that data in a 72MB Excel file, roughly a 45:1 ratio. I probably got such great compression because my table contained a lot of duplicate data. I noticed that with this amount of data, saving the Excel file after any change was always on the slow side – it took about 20-40 seconds.
Opening the Excel file took just a few seconds, and the pivot table took a few more seconds to return results from a changed query. Opening the Gemini add-in with the 20 million saved rows also took just a few seconds.
Overall, I am very impressed with Gemini and how well it handled my 20-million-row fact table on my laptop.