Heap Fragmentation

Current Scenario :  

  • All the tables have only non-clustered indexes ( HEAPS ) .
  • All these non-clustered indexes are heavily fragmented > 98 %.  i.e We have to deal with fragmentation on HEAPS
  • Couple of non-clustered indexes has a forwarding record count which is  not good.
  • These Heap tables , over time become highly fragmented and degrade query performance. 

Maintenance activity planned for today will :

We will defragment a HEAP :

  • Rebuild the non-clustered index on tables which have forwarding record count
  • Rebuild non clustered indexes on rest of tables which will help reduce the fragmentation

This should help in normal scenarios but issue we have is fragmentation on HEAPS , the only other option available is to create a clustered index on the all tables to remove fragmentation.

However by far we should always  to avoid heaps altogether in your database design.


Read Good Blog on heap fragmentation : http://www.patrickkeisler.com/2013/03/dealing-with-fragmented-heap.html


Troubleshooting SSIS OLEDB Connection Error

SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER.  The AcquireConnection method call to the connection manager failed with error code 0xC0202009.  There may be error messages posted before this with more information on why the AcquireConnection method call failed.


In case you face the above error

1. Ensure Delay Validation is set  to True on all DFTs, Execute SQL Task & Package properties.

2. Ensure the project and Package Protection Level is set to the default – EncryptSensitiveWithUserKey

3. Run 64 bit RunTime under Project right click -> Configuration properties is set to False




Back up the transaction log 2000

When you see this error message

The log file for database is full. Back up the transaction log for the database to free up some log space.

backup log DATABASE_NAME with truncate_only
dbcc shrinkfile (DATABASE_NAME_log,0)

For the optimized data experience

Some very good pointers from  : http://www.daveturpin.com/2012/10/10-tips-to-optimize-data-warehouse-reporting/

Few excepts from this post which caught my attention were:

  1. “Use the fewest number of threads possible when loading data.  If you set the MAXDOP (max degree of parallelism) to 1 the incoming data will be written to contiguous chunks of disk space.  Data warehouse reporting is typically grabbing lots of related records, usually over a time range.  This results in pulling records that are all loaded at the same time.  This is known as a RANGE SCAN.  Range scans work best when the data is stored on contiguous blocks of disk.  It’s really that simple…  By using a single thread to lay the data down on the disk, we are sacrificing load performance for improved reporting performance.  That’s OK, you only load the data once and hopefully no people are involved in the load process.  The data will be read many times over by people waiting for reports to come out of the system.  A slower load time is a fair trade for a faster user experience any day of the week.
  2. Maintain the highest quality statistics on your data as possible.  If your tables tend to grow incrementally over time, the auto statistics feature of SQL Server will work fine, for a while.  At some point the low sample rate of the auto stats will result in inaccurate statistics.  For that reason it is absolutely imperative that a rock solid statistics maintenance plan be put in place.  And don’t be fooled into thinking that the stock “maintenance plan” wizard in SQL Server will suffice.  What happens if a statistic becomes severely out of date?  The query optimizer will generate a bad plan for the SQL query and the query will appear “hung”.  The consumer of the data will think the server is busy or that you have a bad design, when in fact you just have outdated statistics.
  3. Give SQL Server as much memory as you can afford.  Whatever you give it, it will use.  If your server is a dedicated database  server, turn on Lock pages in memory.  This will prevent SQL Server from ever giving up memory after it has allocated it.  If your server is used for other applications however, locking memory pages may end up starving those applications of memory, which may end up making the database look slow.
  4. On a related note, separate services when ever possible.  SQL Server Integration Services (SSIS), like the SQL Server engine, will grab as much memory as is available.  If you just gave all of the system memory to SQL Server (minus a few Gig so you don’t starve the OS) what will be left for your ETL/loads?  Put SSIS on a separate server.
  5. If you have tables that are loaded once a month and then used for reporting purposes
  6. Do your housecleaning…  Get rid of indexes that are no longer needed.  Add new indexes sparingly and with great caution.  But do so when it makes sense.  I’ve never seen a perfectly indexed database in the initial design.  Drop “temp”/”test” tables.  Archive old data.  Better yet, deploy an automated archiving process.  There’s no point in keeping 10 years’ worth of data when your end users only need the current year plus last year.
  7. And last but not least, my personal favorite, NEVER SHRINK a database or database file.  If you do, you just fragmented your data and indexes, probably FOREVER.  The only way to get rid of that fragmentation will be to move the data to a new file group that isn’t fragmented.  The catch 22… if you just did a shrink on your database it’s probably because you were running low on disk space.  If you don’t have disk space to auto-grow your existing data files, you probably don’t have the disk space to move the data to a new file group.  The data will remain fragmented, FOREVER.”

From :http://www.daveturpin.com/


Things 2 Do after huge bulk inserts


Always refresh the statistics after a large insert


We could also use DBCC DBREINDEX. DBCC DBREINDEX rebuilds an index for a table or all indexes defined for a table. DBCC DBREINDEX can rebuild all the indexes for a table in one statement.  This is easier than coding multiple DROP INDEX and CREATE INDEX statements.

With large bulk inserts the statistics get all mangled up and need a refreshed afterwards,  it’s also much faster than running a REINDEX or index REORDER.

Another option is to look into padding the index, you likely have no padding fill factor on your indexes meaning that if your index is:

A, B, D, E, F

and you insert a value with a CardNumber of C, then your index will look like:

A, B, D, E, F, C

and hence be ~20% fragmented, if you instead specify a fill factor for your index of say 15% we would see it look like roughly:

A, B, D, _, E, F

Take the index offline before a mass insert and then bring it back online after the mass insertions. It is much faster as compared to re-indexing, or performing a drop and create index. I have had a 3.5 million row insertion process and had a problem with one of my non clustered indexes fragmenting which was causing poor performance. The table had fragmentation up to 97% slowing down performance.



ALTER INDEX index_name ON dbo.table_name DISABLE


After the bulk inserts rebuild indexes back




Query Parallelism

Parallelism is very useful when you have a small number of very long running queries. Each query executes faster in parallel and since there are few queries at any given point in time there is likely little blocking or competition for resources.

When you start mixing the types of queries it is harder for the optimizer to choose the right parallelism levels for the overall workload. One slow query may start at a time when things are relatively idle and it may use many threads. Then several small, fast queries come in while the first is still running and the CXPacket waits (number and duration) start to increase making all of the queries go slower. SQL’s CPU usage may go up significantly.

When a query is compiled it is broken down into steps/operations internally. When the query runs, each step of the query plan that is eligible to go parallel can be optimized as using 1 or more schedulers. So if the plan has 10 steps, maybe step 1 gets 1 scheduler, step 2 gets 5 schedulers, step 3 gets 8 schedulers etc.


Another problem can arise when the statistics are out of date or very skewed. That means that the optimizer may choose a plan based on the expected number of rows (cardinality estimate) that doesn’t quite balance out as expected because the actual number of rows is so far off. In that case updating statistics (perhaps WITH FULLSCAN if the data is skewed) may help.


Do I have a parallelism problem?

Next there is the question of how to know if the current “max degree of parallelism” is contributing substantially to a current performance or resource constraint problem. If your Process.SQLServr.%Processor Time/# of logical processors value is much higher than normal/expected AND wait stats shows more than 5% of your waits are on CXPackets, you may want to test lower (or non-zero) values of “max degree of parallelism”. You find connections that have currently running parallel queries by looking for SPIDs with multiple ECIDs in sys.sysprocesses. Query plans that are eligible for parallelism will have the Parallelism operator. If you decide your current degree of parallelism might be negatively impacting performance, use the information above to estimate what may be a good starting point for your particular instance and test/baseline/tune from there. No restart is required after you change the value; new queries automatically use the new value (unless overridden by a hint in the query)


Originally from this blog here  : http://blogs.msdn.com/b/cindygross/archive/2011/01/28/the-ins-and-outs-of-maxdop.aspx


MTVF’s vs ITVF’s

Here is enough reason why Inline Table Valued function is better than a Multi statement Table Valued function and why MTVFs should never be used.

I was so close to committing the biggest blunder in my SQL life.


SQL Remember Me’s

  • Inbuilt stored procedure to retrieve list of indexes on a particular table

sp_helpindex tablename

  • Inbuilt stored procedure to retrieve list of primary keys on a table

sp_pkeys tablename

  • Inbuilt stored procedure to retrieve list of forgien keys on a table

sp_fkeys tablename

And a few do you knows..??

  • SQL Server 2005 allows 250 indexes per table i.e 1 Clustered index + 249 Non Clustered Index
  • SQL Server 2008 allows 1000 indexes per table i.e 1 Clustered index + 999 Non Clustered Indexes.

Of running 32 bit SSIS Package on 64 bit & Package Configurations

I struggled with finding out resolutions to few issues at work today that I had to blog about this right away if it would help save someone else precious development time.

I was recently working on a package which would connect to an FTP server, get the file listing and then download files from the FTP site into a local folder. This package worked perfectly fine on a WinXP machine but when I deployed it onto a Win 2003 server, and tried to execute the package using SQL Agent Job it would fail at the script task, logging “Unable to connect to FTP Server”. I was at wits end trying to figure out what could be denying the FTP connectivity through SQL Agent Job maybe some firewall settings, inadequate permission to the SQL Agent services account probably. The package would execute through Integration services and BIDS but fail when invoked using SQL Agent Job. Strange.

I tried numerous options like creating credentials and a new proxy login and executing the SQL Agent Job Run as the proxy user, changing the encryption settings of the package to use EncryptSensitiveWithPassword and then provided the password in the SQL Agent Job. But nothing worked.

Finally I enabled the ‘Use 32 bit runtime’’ check box under Execution options tab in both the steps in SQL Agent job and Presto!! the package worked like a charm.

There is also something new in package configurations in SQL 2008 which differs from SQL 2005. In SQL 2008 the package buy default uses the design time configuration settings even if you specify the different configuration files under the configurations tab in SQL Agent Job. So in order for the package to point to different configuration files at run time ensure that you disable package configurations in BIDS, rebuild the solution and redeploy the package on to the server. Now the package on SQL job can be directed to read from different configuration files.

Unzip using Execute Process Task in SSIS

Consider that you have a set of XML based files delivered on an FTP location. You have to download them and then unzip these into a folder before the processing begins
This can be accomplished using Execute Process Task. Though the task was really menial, I had spent a long time surfing to figure out how exactly to go about on this. So here is something to help you out.

I used unzip.exe from here placed them on my folder location. Provided the relevant inputs on as below on SSIS Execute Process Task Editor.
Arguments were given as D:\DataFeeds\Feeds\*.zip -d D:\DataFeeds\Application\
This means all the zip files from above location will be unzipped to destination folder (-d). You can try out the various switches to derive the results.

Unzip using Execute Process Task

unzip using Execute Process Task