Thomas Kejser and I made some significant updates to the SQL Server Analysis Services Performance Guide for SQL Server 2008 - it's available here.
Our hope is to keep it alive and make updates as necessary. So if there is something you'd like to see added or have any comments at all, shoot me a note at richtk@microsoft.com.
- Richard Tkachuk
This blog is a result of lessons learned while working with the new Integrated Full Text Search in SQL Server 2008. It is not intended to be an in-depth tutorial on how to implement it. Instead, it will concentrate on best practices. The basics of how the new full text search feature works is found in Books On Line.
If you would like a good introductory tutorial, start with SQL 2008 Books Online, then read this whitepaper: http://msdn.microsoft.com/en-us/library/cc721269(SQL.100).aspx. It is one of the best whitepapers I’ve seen recently on a product feature.
Also for more introductory lessons here is an excellent link: http://sqlblogcasts.com/blogs/simons/archive/2008/02/19/SQL-Server-2008---iFTS-Introduction.aspx
Simon Sabin also has a nice series of posts on the new iFTS features: http://www.sqlskills.com/blogs/simon/2008/02/20/SQLServer2008IFTSNewFeatures.aspx
The new iFTS feature is not rocket science, but you should expect to spend some time learning about the new vocabulary, like Fragments, Master Merge, etc. You will also spend some time learning the new DMVs (like sys.dm_fts_index_keywords and sys.dm_fts_index_keywords_by_document) .
Performance
Here is what we noticed about the performance of iFTS in a production environment.
The full text searches and queries are very fast if you can keep the full text indexes up to date. There are two choices – automatic or manual index updates. Most applications should be able to run with automatic tracking turned on. The higher volume transaction systems may start to experience some blocking on inserts and updates at times. The blocks are short and do not last more than a few milliseconds most of the time. However, if your transaction volume is high enough where the blocking starts to impact the user experience, then turn off automatic updates and run manual merges. A merge will take the word fragments and merge them with the master fragment.
To get the best performance from your full text index, do the following new iFTS Best Practices:
1. Put the full text index in its own filegroup. This will avoid fragmenting the main data file(s).
2. Use Varchar(max) instead of image or text for the fields that you want to use as the base for your full text index.
3. Turn off auto updates if you see blocking on inserts and updates and do manual merges.
4. Avoid running Master Merges and a rebuild/reorg on another index on the same table at the same time. Make sure a Master Merge job does not overlap with a job that rebuilds a non clustered index on the same table.
Another interesting thing we learned is that if you have to rebuild the entire full text index for any reason, the full text queries will still run. You will notice that they may be very fast at first. This is because a full rebuild will drop the index and all the fragments, so the queries will be running against an empty index at first. They will continue to run as the index is built. The good thing is that you should rarely, if ever, have to do a full rebuild of the index. Instead, any major work should be done with ALTER INDEX START FULL POPULATION. This not a perfect solution since it does UPDATE statements to the index, which does its normal exclusive lock. It does the update in chunks which means that the locks are only briefly held. However, you will experience more blocking than normal for the full text queries while the update is running.
iFilters
Different document types need iFilters installed. Most common document types have available iFilters. The one exception at the time of this writing is the 64 bit PDF filter. There is one available but it has bugs that cause aborts. Hopefully Adobe can deliver the fixes soon.
Limitations
As good as the performance is on the new iFTS feature, there are still some limitations that will cause concern for some projects. First, the table must have a single column primary key. Second, the SWITCH command for partitions does not work. Third, there are no good scale out solutions with iFTS. Multiple indexes cannot appear as one and iFTS does not work with distributed partitioned views. We are working on improving these limitations for the next version of SQL Server. If you need a higher volume solution, ask your local Microsoft contact about FAST.
One other item you will be interested in is that the full text index size in SQL 2008 is approximately 50% bigger than the full text indexes in SQL 2005. This is generally not a problem, just a fact that you should know.
Handling multiple languages in a single document is a hard problem. Which word breaker do you use to shred the original document, and which language are you going to specify for the query? For example, if you have a document with Korean and English and you use the Korean word breaker to process the document, then if you search the document for English words it will only find the exact words and not any other forms of the words (like ing and s).
Memory Settings
The new full text daemon (FDHost) runs as a windows program and is not part of the SQL Server process, which means that they will compete for memory. This is normally not a concern because SQL Server generally plays well with other processes running on the same server. However, we noticed that if you have very large documents, like greater than 100MB, or even hundreds of 2MB documents being parsed at the same time, then you may want to make some configuration changes.
There are two options in this case:
1. Sp_fulltext_Service can be used to set the block size. Try increasing the block size to 512k.
2. Use sp_configure to set the maximum memory for SQL Server down a bit. How much down will depend on how many documents you intend to process simultaneously.
Other notes about FDHost
· The Katmai FDhost runs outside of SQL server memory space. There are a lot of similarities between Yukon MSFTEFD and Katmai FDHost processes.
· The Filter daemon manager (runs within SQL process) manages the FDHost process and it will launch one FDHost process for single thread and one for multi-thread FDHost processing. These two do not necessarily start at the same time.
o Single or multithreaded fdhost depends entirely of the iFilter used: if the document filter supports multithreading, then we load it into a multithreaded ready fdhost, otherwise not.
o Named Pipes must be enabled on the SQL server box because Filter daemon manager uses named pipes to communicate with FDHost.
· FDHost processes are not shared across multiple Katmai iFTS instances on the same server.
· During the full population of the FT indexes, the FDHost process may need a lot of memory. It’s a good practice to:
o Schedule the full FT index population off business hours when the SQL server is not being used by other processes or queries.
o You may need to manually reduce the SQL Server max memory to give more memory to FDHost as noted above. However, if SQL Server is not being heavily used, FDHost will get the memory left on the box.
· To improve the performance between FDHost manager and FDHost, users can change some configuration settings (such as ISM – internal shared memory, batch size, etc.) to improve the consumption process of the chunks of data sent to the FDHost.
Miscellaneous notes:
· The New Database dialog box in Management Studio has an option grayed out. Just below the name and owner there is a grayed out check box. In the released version of SQL Server 2008 the full text options are on by default. This was left in place in case any customers had references to it in scripts.
· iFTS support for Danish, Polish and Turkish is not on by default. These word breakers are delivered with SQL Server. Since they are written by 3rd party companies it was decided to leave them off by default.
Upgrade
If you are upgrading from SQL 2005, then you have a decision to make on whether to just upgrade the full text index or start over and rebuild it. The best practice recommendation is that if there is a new word breaker available for your language, then you should completely rebuild the full text index after upgrading, if you want to use the new word breakers and stemmers. http://msdn.microsoft.com/en-us/library/cc721269(SQL.100).aspx has a list of which word breakers are exactly the same and which are new.
Kevin Cox and Xiaoyu Li
Recently, Wanda He from SQLCAT ISV team worked with Oracle Siebel group completed a new 12,000 concurrent users Siebel CRM 8.0 benchmark on SQL Server 2008 (http://www.microsoft.com/isv/oracle/).
The benchmark demonstrated that the combination of Microsoft Windows Server 2008, SQL Server 2008, Siebel CRM Release 8.0 architecture and HP BL460c/BL680c Servers is a powerful and cost effective CRM solution. You can read details on the benchmark test configuration and results in the whitepaper: http://www.oracle.com/apps_benchmark/doc/hp-siebel8-12000-pspp-on-windows-white-paper.pdf
Check out the the third of five technical note as part of the Building and Deploying Large Scale SQL Server Reporting Services Environments Technical Note Series: Reporting Services Scale-Out Deployment Best Practices
This technical note reviews the
- SSRS Scale-Out Architecture
- Report Catalog sizing
- The benefits of File System snapshots for SSRS 2005
- Why File System snapshots may not help for SSRS 2008
- Using Cache Execution
- Load Balancing your Network
- Isolate your workloads
- Report Data Performance Considerations
Enjoy!
Check out the new look and feel of the sqlcat.com! Now you have easy access and views of our Top 10 Lists, Technical Notes, Whitepapers, and Toolbox. As well, now our blogs our mirrored to sqlcat.com for easy access to all SQLCAT tips, best practices, and lessons learned.
Right now, you can see our graphic noting the "SQLCAT Track at PASS" where the SQLCAT team will have 15 sessions providing indepth technical learnings and best practices from some of the largest real-world SQL implementations.
So review our site to learn more ... and see you at PASS!
The other day we got a call from a puzzled friend. He has rewritten a stored procedure using table variable instead of temp table, it makes the stored procedure code look more tidy. The puzzling part is the same stored procedure now running a lot slower. The estimated plans looked the same using temp table and table variable, but the execution plans are very different. Unbeknownst to him, rewriting the procedure using table variable bumped into a query optimizer blind spot. While the “Estimated” and “Actual” query plans are same in most cases, there are some exception. If a table (temporary or permanent) is created in the same batch with the query, SQL Serve has to recompile the query since the query definition is unknown when the batch is compiled the first time. This is not the case for table variables (and that was the main reasons SQL Server 2005 introduce them – to reduce recompilations when unnecessary). In order to explain in detail, let’s use the following example:
--0. create test data
use test
if (OBJECT_ID('Test1') is not null)
DROP TABLE Test1
go
CREATE TABLE Test1 (ID int)
DECLARE @i int
SET @i = 0
SET NOCOUNT ON
WHILE @i < 20000
BEGIN
INSERT INTO Test1 (ID) Values (@i)
SET @i = @i + 1
END
CREATE CLUSTERED INDEX IX_Test1 ON dbo.Test1 (ID)
--1. Query using table variable
DECLARE @Tmp1 TABLE (ID int)
INSERT INTO @Tmp1(ID)
SELECT ID
FROM Test1
SELECT *
FROM Test1
WHERE ID NOT IN (SELECT ID FROM @Tmp1)
--2. Query using temp table
CREATE TABLE #Tmp1(ID int)
INSERT INTO #Tmp1(ID)
SELECT ID
FROM Test1
SELECT *
FROM Test1
WHERE ID NOT IN (SELECT ID FROM #Tmp1)
DROP TABLE #Tmp1
The estimated plans for both SELECT queries in the example above look the same – Nested Loops (Left Anti Semi Join) with the outer (top in the graphical display) table the Clustered index on Test1 and the inner table is the table variable in the first case and the temporary table in the second case. But if you run both scripts you will see that the first runs considerably slower – a minute or two depending on your processor speed, memory available to SQL Server and disk I/O capacity. How is this possible?
When the queries are compiled neither the table variable nor temporary table are populated and the query optimizer assumes “minimum” number of rows and that is always 1 in SQL Server. This estimation is used when SQL Server generates the “Estimated” query plan. While the “Estimated” and “Actual” query plans are same in most cases, there are some exception. If a table (temporary or permanent) is created in the same batch with the query, SQL Serve has to recompile the query since the query definition is unknown when the batch is compiled the first time. This is not the case for table variables.
Now turn on “Display Actual Execution Plan” in SSMS and re-run the scripts 1. in one query window and 2. in another one. You will see that the actual plan for script 1 is the same as estimated plan and took substantially longer than the script 2 to complete. The actual execution plan for Script 2 is very different from its estimated plan. The actual execution plan for script 2 uses Hash Match to execute the Anti Semi Join at the top of the plan. There is a table scan of the temporary table #Tmp1 at the top of the plan building the hash table. If you examine the properties of this scan you will find out that the number of Estimated rows is 20000 (was 1 in the Estimated plan as well as it is 1 for the table variable plan).
Figure 1. Estimated plan

Figure 2. Actual plan of SELECT query using temp table
![clip_image001[10]](http://blogs.msdn.com/blogfiles/sqlcat/WindowsLiveWriter/Tablevariablevs.temptabl.actualqueryplan_A74C/clip_image001%5B10%5D_thumb.jpg)
When examining query plans be careful to consider possible plan change during the batch execution due to recompiles. Bear in mind that while table variables are not causing recompiles, you may need just the opposite to get the best query plan. You should always use temporary tables in your queries unless you intentionally want to avoid query recompilation and you are confident query variable will give you a good plan. Introduction of statement level recompilation in SQL Server 2005 substantially lowered the need to use table variables in your queries.
You will find some great information about table variables and temporary tables in
http://databases.aspfaq.com/database/should-i-use-a-temp-table-or-a-table-variable.html .
Please note that the actual execution plans for the above batches will change if you merge 1 and 2 into to a single batch and will change again if you merge all three into single batch… There are some interesting learnings surrounding such modifications and we will talk about them in some later blog entries! Stay tuned!
Lubor Kollar, Lindsey Allen
After the great work from the ETL World Record (for more information, refer to our other blog at: http://blogs.msdn.com/sqlcat/archive/2008/09/18/scaling-heavy-network-traffic-with-windows.aspx), Thomas and I (with the help of many others) have created a "Top 10 list" of the best practices for SSIS that can be found on sqlcat.com: Top 10 SQL Server Integration Services Best Practices
As well, everyone knows the benefits of "money"...but not everyone may know the benefits of the money data type. So if you're up for a technical note that provides some deep details on the money data type and how it can help you improve performance, check out the "The Many Benefits of Money....Data Type!" technical note on sqlcat.com as well.
Enjoy!
Saw an amazing BI demo this morning at the BI Conference here in Seattle. Donald Farmer showed how over 20M rows of data can be modeled and analyzed in memory. To build a model today, a DBA needs to define dimensions and fact tables, get the relationships right, define calculations, deploy it to a server, build and manage it. After that, someone can connect to it and play with the data.
What Donald showed is how a user can do all that with an Excel add-in. He started with 20 million rows in memory on a no-frills PC and built a model from scratch. Even with 20 million rows, interactive ordering, filtering and windowing and pivoting was instantaneous. It’s difficult to compare how much simpler it is building a model with actual data than it is building one with abstractions and not seeing the result at the end. You might remember the days before WYSIWYG when documents were built with formatting and font codes and not seeing how it would look until it hit the printer. This is doing the same thing to data analysis with non-trivial amounts of data – WYSIWA (what-you-see-is-what-you-analyze).
And – this bears repeating - it’s all in Excel.
The next thing was striking as well and likely will have just as much an impact – being able to publish the model to SharePoint and other people being able to access it from a URL. So normal people (and not just data geeks like me) will be able to start with vast amounts of data, build a model (without even realizing it!), analyze the data, and post the result (with all the data) to share with others.
Released on the next version of SQL Server code named Kilimanjaro which is a focused release of new BI capabilities. It will be available sometime in the first half of calendar year 2010.
Coming up this November 18-21st is the SQL PASS Conference in Seattle, WA (right in Microsoft’s backyard). See http://summit2008.sqlpass.org/ for all the details. This SQL PASS will be very special. The SQL Server engineering team is sending up to 300 developers to the conference. You won’t have to walk very far in order to run into one. To meet the engineers 1:1 and discuss your SQL Server applications should be a big justification for attending.
The SQLCAT – SQL Server Customer Advisory Team will be there in full force for the entire four days. You won’t miss us as we will all have the same colored shirts with our SQLCAT logo. We will have a SQLCAT Track with up to 15 sessions focusing mostly on interesting learning’s from early deployments of SQL Server 2008. You can find details of the sessions on our website: SQLCAT Track at PASS. We want to hear from you. We want to learn about your SQL Server deployments, your successes and your challenges. You are more than welcome to walk up to a SQLCAT employee at anytime and say hello. There may even be a surprise or two if you do.
I look forward to seeing you there!
Under Windows Server 2000 and 2003 (RTM), the interrupts from a single network adaptor (NIC) cannot be handled by multiple CPU Cores. A Deferred Procedure Call (DPC) call gets scheduled to run as a consequence of the NIC firing an interrupt. The DPC will deliver the received packets from the NIC to the networking subsystem of the OS. The NIC driver will block interrupts from the network card until the DPC has been handled. If your system makes heavy use of network bandwidth, sending all interrupts and DPC to one CPU may overwhelm the system and cause a bottleneck. In extreme cases the system may even become unresponsive as the amount of DPC traffic consumes all CPU cycles.
One workaround, if you run Windows 2000 and 2003 Server, is to use the Interrupt Affinity Filter Tool. This tool, which applies only to x86 version of for Windows 2000/2003 is documented here: KB 252867. Using the Interrupt Affinity Filter Tool, you can affinitize different network adaptors to different CPU cores. This overcomes parts of the problem and increases scalability by distributing interrupts between CPU cores. Be aware that the Windows 2008 tool for network affinity is not the same tool as under Windows 2000/2003 Server. You can specify network affinity in Window 2008 by editing the registry directly or by using the new Interrupt-Affinity Policy Tool.
The Windows 2003 Scalable Networking Pack (SNP) improves network scaling further by introducing Receive Side Scaling (RSS). RSS is also included in both Windows 2003 SP2 and Windows 2008. This improvement includes the ability to concurrently and asynchronously handle DPCs from a single network adaptor. The improvements are documented here: Scalable Networking with RSS. Do be aware that your network card must support this feature to gain the full advantage of the implementation.
Even with RSS, high bandwidth workloads can still benefit from affinitizing the network cards to specific CPU cores. For example, using interrupt affinity and RSS was a crucial optimization for the ETL World Record. When scaling Integration Services to 64 cores, we dedicated every 8th core – one core and NIC for each NUMA node – to network traffic. Using once NIC per NUMA node keeps the memory local to the node. The rest of the cores we affinitized to SQL Server. At full bulk insert speed on the 64 cores Unisys ES/7000, our CPU load looked like this:
Notice the highlighted cores – these cores are dedicated to handling network traffic. The system had throughput of over 800 MB/sec at the point above.
Other uses of network affinity and RSS include: Analysis Services Process Data, Data Mart Loading, Heavy ETL activity and other scenarios where the network cards on the server are under heavy load.
/Thomas Kejser
Reviewers: Stuart Ozer, Carl Rabeler, Denny Lee, Prem Mehra, Ashit Gosalia, Ahmed Talat, Kun Cheng, Burzin Patel, Special Thanks to Rade Trimceski
Thomas Kejser and I are doing some work to produce the SQL Server 2008 Analysis Services Performance Guide. Among a bunch of other things, it discusses the IIF function. But because I recently mentioned MDX query hints in a recent podcast, I wanted to get this out quickly. Some references are a bit hazy (expensive vs inexpensive query plans, default values and the like) and I’ll elaborate either in future blogs or in the white paper itself.
Anyway, the IIF mdx function is a commonly used expression that can be very costly to evaluate. It takes 3 arguments:
iif(<condition>, <then branch>, <else branch>)
Where the condition evaluates to true, the value from the “then branch” is used otherwise the “else branch” expression is used.
Note the term “used” – one or both branches may be evaluated even if its value is not used. It may be cheaper for the engine to evaluate the expression over the entire space and use it when needed (termed an eager plan) rather than chop up the space into a potentially enormous number of fragments and evaluate only where needed (a strict plan).
The first consideration is whether the query plan is expensive or inexpensive. An expensive query plan is one that is evaluated in cell-by-cell mode (more on this in a future post) or one that requires reading a cube data (most commonly because a member navigation function is used). Most IIF condition query plans are inexpensive – but nasty nested conditions with more IIFs can force it to go to cell-by-cell (functions that are not cell by cell are listed in the sql documentation).
The next consideration the engine makes is what value the condition takes most. This is driven by the condition’s default value. If the condition’s default value is true, then the “then branch” is the default branch – the branch that is evaluated over most of the subspace. Simple investigation and knowing a few simple rules on how the condition is evaluated helps to determine the default branch:
· In sparse expressions most cells are empty. So the default value of the isempty function on a sparse expression is true.
· Comparison to zero of a sparse expression is true
· Default value of IS operator is false
· If the condition cannot be evaluated in subspace mode, there is no default branch
For example, one of the most common uses of the IIF function is to check whether the denominator is non-zero:
iif([Measures].[Internet Sales Amount]=0, null, [Measures].[Internet Order Quantity]/[Measures].[Internet Sales Amount])
There is no calculation on Internet Sales Amount so it is sparse. Therefore the default value of the condition is true and therefore the default branch is the then branch with the null expression.
The table below shows how each branch of an IIF function is evaluated:
|
Branch Query Plan |
Branch is default branch |
Branch expression sparsity |
Evaluation |
|
Expensive |
n/a |
n/a |
Strict |
|
Inexpensive |
Unknown |
n/a |
Eager |
|
Inexpensive |
True |
n/a |
Eager |
|
Inexpensive |
False |
Dense |
Strict |
|
Inexpensive |
False |
Sparse |
Eager |
In SQL Server 2008 Analysis Services, you can overrule the default behavior with query hints:
Iif( <condition>, <then branch> [hint [Eager|Strict]], <else branch> [hint [Eager | Strict]] )
When would you want to override the default behavior? The most common scenarios where you might want to change the default behavior are:
· Engine determines the query plan for the condition is expensive and evaluates each branch in strict mode
· Condition is evaluated in cell by cell mode and each branch is evaluated in eager mode
· Branch expression is dense but easily evaluated.
For example, consider the simple expression below taking the inverse of a measure:
with member
measures.x as
iif(
[Measures].[Internet Sales Amount]=0
, null
, (1/[Measures].[Internet Sales Amount]))
select {[Measures].x} on 0,
[Customer].[Customer Geography].[Country].members on 1
from [Adventure Works]
cell properties value
The query plan is not expensive, the else branch is not the default branch and the expression is dense, so it is evaluated in strict mode. This forces the engine to materialize the space over which it is evaluated. (This can be seen in profiler with query subcube verbose events selected – but it’s a time consuming investigating each event’s subspace definition).
To prevent the query plan from partitioning the space, the query can be modified as follows:
with member
measures.x as
iif(
[Measures].[Internet Sales Amount]=0
, null
, (1/[Measures].[Internet Sales Amount]) hint eager)
select {[Measures].x} on 0,
[Customer].[Customer Geography].[Country].members on 1
from [Adventure Works]
cell properties value
OVERVIEW: One of the common complaints in using database snapshots is how to get queries and reports to switch over and start using the new snapshot as soon as it is available. There are three basic ways to accomplish this:
1. Delete the old and rename the new database snapshot. This works well if no users are using either snapshot.
2. Changing the connection string. This works pretty well and has the advantage that long running reports can continue to run while new reports/queries get directed to the new database snapshot. But doesn’t work well if you have many places to change the connection string or have to deploy a new application version.
3. Use Synonyms in for the base objects to point to a database snapshot. That’s what this blog is all about.
Steps to take to implement synonyms with database snapshots:
1. Create db snapshot
2. Create synonyms in main db that point to snapshot objects
3. Sometime in the future, create another snapshot
4. Update the synonyms in a DDL transaction. Warning: Blocking can occur so read the blocking section below.
Important Notes:
· You may get some application timeouts if synonym updates get blocked for too long.
· READ_UNCOMMITTED could be problem if you implanted scenario where a query would join one table from one snapshot to a table in a different snapshot.
· Could combine synonyms with shared scalable database. This is a scenario where you could use multiple servers to attach to SAN snapshots and then use synonyms to point to the current SAN snapshot.
Potential Blocking Scenario
The DDL transaction to update all the synonyms to point to another snapshot will update system tables. This transaction will be blocked by any readers accessing the synonym table, which can cause problems. And it will block any new queries from accessing the synonym table while the transaction is running. It could mean that the application can experience timeouts if it is blocked for too long.
Trying to work around the blocking by creating the reader sessions with READ_UNCOMMITTED isolation level would probably work but is not recommended. I can imagine a scenario where a query can access a table in the new snapshot and join to a table in the old snapshot, which is the main reason why the synonyms are updated in a transaction. So the blocking is actually a good thing as long as it doesn’t cause excessive application time outs.
The DDL transaction runs quickly. On my laptop, I updated 300 synonyms in about 1 second. There are certain ERP applications out there with 10,000+ tables so this synonym idea would be a bit more painful to implement.
One clarification: Queries that are already running will continue to run. New ones are only blocked from resolving synonyms. Once they have resolved the synonym name and start reading the tables then there will not be blocking.
Sample Code:
To use this technique, we need to look at the original, or source database, three different ways; 1) the actual source database, which we’ll call SourceDB, 2) a snapshot of the source database, which we’ll call SnapshotDB, and 3) the reporting database, which we’ll call the ReportDB. The SnapshotDB represents a point-in-time version of SourceDB, and is created using SQL similar to the following:
USE SourceDB
CREATE DATABASE SourceDB_<timestamp>_Snapshot ON
( NAME = SourceDB, FILENAME =
'C:\SourceDB\SourceDB_<timestamp>_Snapshot.ss' )
AS SNAPSHOT OF SourceDB;
GO
The ReportDB contains no source data, only synonyms. The synonyms point to the relevant tables and views in SnapshotDB, and are created using SQL similar to the following:
USE ReportDB
CREATE SYNONYM dbo.TableName FOR
SourceDB_<timestamp>_Snapshot.dbo.TableName
With this basic setup, all report-type queries can be executed against the ReportDB, using the same schema and object names defined in SourceDB, and the returned data will be datetime-consistent within the point-in-time source database snapshot, SnapshotDB. Additional objects, such as views, stored procedures, etc., can be created in ReportDB that refer to the synonyms, and will behave as expected.
In most cases, it will be desired to periodically drop and re-create SnapshotDB to have a more current point-in-time version of SourceDB. However, to maintain 24/7 application access while not requiring the application to use different connection strings, we cannot simply drop SnapshotDB, as queries against the synonyms in ReportDB would immediately fail, and dropping any database requires there are no open connections to it. We therefore must create a second, more current SnapshotDB, transactionally drop and re-create all the synonyms updated to point to the new SnapshotDB, then drop the old SnapshotDB. There may be queries still running against the first snapshot so you may want to comment out the DROP DATABASE command and have a separate job to do it at a later time. Beware that the more database snapshots you have active the more of a performance hit you will notice. See the whitepaper at the end.
The attached store procedure, usp_ReportingSynoSnap, implements this logic, using the appropriate metadata from SourceDB. It will create any schemas in ReportDB that exist in SourceDB, creating the corresponding synonyms in the correct schema. If access to SnapshotDB is only through synonyms in ReportDB, then there will effectively by no open connections to SnapshotDB, only to ReportDB. If those connections are active, executing queries against the synonyms, they will block the stored procedure from completing until the blocking queries finish resolving the synonym names, but once they have the stored procedure will drop the old SnapshotDB.
The stored procedure assumes the ReportDB already exists, and will only contain synonyms pointing to SnapshotDB, as it always drops all synonyms to ensure there are no orphans. The template used for naming SnapshotDB is <SourceDB>_Rpt_YYYYMMDD_HHMMSS_ss, where YYYYMMDD_HHMMSS is replaced with an actual timestamp of when the snapshot was created. Creating snapshots require enumerating all the logical data files for SourceDB, declaring a corresponding snapshot physical file for each. The stored procedure creates a snapshot physical file in the same location as the source physical file, named identically except for a suffix with the following format .Rpt_YYYMMDD_HHMMSS_ss (ex; AdventureWorks.mdf.Rpt_20080101_010101_ss), where YYYYMMDD_HHMMSS will be identical to the timestamp within the SnapshotDB name. Lastly, the stored procedure will drop the old SnapshotDB, determining the name of the old snapshot by finding a snapshot database with the same name as the new snapshot, but with the highest (in sort order) YYYYMMDD_HHMMSS. To use the stored procedure, create in either the SourceDB or ReportDB, and call with “EXEC usp_ReportingSynoSnap ‘SourceDbName’, ‘ReportDbName’”; ex: “EXEC usp_ReportingSynoSnap ‘AdventureWorks’, ‘AdventureWorks_Reporting’” (AdventureWorks_Reporting must exist before calling). If you need to keep the old snapshot around for a while, just remove or comment out the DROP DATABASE command near the end of the script.
The HHMMSS suffix in the example above may be misleading. It may take seconds or minutes depending on what is happening in the source database. And the data in the snapshot is not current as of the start or the end of the CREATE DATABASE command. All of this is explained in the whitepaper links at the end.
It should be noted that Windows will report the logical snapshot file size as identical to the corresponding source file size, however as the snapshot file is actually a sparse file, a feature of NTFS, the actual size consumed on disk will initially be zero, and will grow incrementally by 64k page sizes as data is changed in SourceDB, and SQL saves the original (before the changes are applied) SourceDB page to the snapshot (note this only happens the first time data is changed on the source page). This has two consequences, one is that there is a normally a slight write performance hit when using snapshots, and second, that if snapshots are not dropped and lots of new changes happen in the source, you may eventually end up consuming near twice the physical space actually needed for the source. Beware that there are some performance consequences that you need to be aware of when running snapshots. See the whitepapers below for more detail.
Copy the following script and run it in a query window to create the stored procedure in one of your application databases.
SOURCE CODE
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[usp_ReportingSynoSnap]
@sourceDbName NVARCHAR( MAX )
,@reportDbName NVARCHAR( MAX )
,@scriptOnly BIT = 0
AS
BEGIN
DECLARE
@sqlStmt NVARCHAR( MAX )
,@dtStamp NVARCHAR( MAX )
,@prevRptSsName NVARCHAR( MAX )
SET @dtStamp = REPLACE( REPLACE( REPLACE( CONVERT( NVARCHAR, GETDATE(), 120 ), '-', '' ), ' ', '_' ), ':', '' )
SELECT @prevRptSsName = MAX( ssdb.name )
FROM sys.databases ssdb
JOIN sys.databases db
ON db.database_id = ssdb.source_database_id
WHERE db.name = @sourceDbName
AND &