Heap Fragmentation

Current scenario:

  • All the tables have only non-clustered indexes, i.e. they are HEAPs.
  • All of these non-clustered indexes are heavily fragmented (> 98%), so we have to deal with fragmentation on heaps.
  • A couple of tables have a high forwarded-record count, which is not good.
  • Over time these heap tables become highly fragmented, and query performance degrades.

The maintenance activity planned for today will defragment the heaps:

  • Rebuild the non-clustered indexes on the tables that have a high forwarded-record count.
  • Rebuild the non-clustered indexes on the rest of the tables, which will help reduce fragmentation.

This should help in normal scenarios, but the issue we have is fragmentation on the heaps themselves; the only other option available is to create a clustered index on each table to remove the fragmentation.

However, by far the best approach is to avoid heaps altogether in your database design.
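
As a rough sketch (dbo.MyHeapTable is just a placeholder name), this is how you can check a heap's fragmentation and forwarded-record count, and, on SQL Server 2008 and later, rebuild the heap directly:

-- Check fragmentation and forwarded records for a heap (index_id = 0).
-- 'DETAILED' mode is needed to populate forwarded_record_count.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.avg_fragmentation_in_percent,
       ips.forwarded_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.MyHeapTable'), 0, NULL, 'DETAILED') AS ips;

-- SQL Server 2008+: rebuild the heap itself. This removes forwarded
-- records and also rebuilds all non-clustered indexes on the table.
ALTER TABLE dbo.MyHeapTable REBUILD;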

 

A good blog post on heap fragmentation: http://www.patrickkeisler.com/2013/03/dealing-with-fragmented-heap.html

 

Troubleshooting SSIS OLEDB Connection Error

SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER.  The AcquireConnection method call to the connection manager failed with error code 0xC0202009.  There may be error messages posted before this with more information on why the AcquireConnection method call failed.

 

If you face the above error:

1. Ensure DelayValidation is set to True on all Data Flow Tasks, Execute SQL Tasks, and the package properties.

2. Ensure the project and package ProtectionLevel is set to the default, EncryptSensitiveWithUserKey.

3. Ensure Run64BitRuntime (right-click the project -> Properties -> Configuration Properties -> Debugging) is set to False, which forces the package to run in 32-bit mode; this matters when only a 32-bit OLE DB provider is installed.

 

Ciao

Annie

Back up the transaction log (SQL Server 2000)

When you see this error message:

The log file for database is full. Back up the transaction log for the database to free up some log space.

BACKUP LOG DATABASE_NAME WITH TRUNCATE_ONLY
GO
DBCC SHRINKFILE (DATABASE_NAME_log, 0)
GO
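
Note that BACKUP LOG ... WITH TRUNCATE_ONLY was removed in SQL Server 2008. On later versions, a rough equivalent (same placeholder names as above) is to switch the database to SIMPLE recovery, shrink the log file, and switch back; take a full backup afterwards to restart the log backup chain:

-- SQL Server 2008+ replacement for WITH TRUNCATE_ONLY.
ALTER DATABASE DATABASE_NAME SET RECOVERY SIMPLE
GO
DBCC SHRINKFILE (DATABASE_NAME_log, 0)
GO
ALTER DATABASE DATABASE_NAME SET RECOVERY FULL
GO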

For the optimized data experience

Some very good pointers from: http://www.daveturpin.com/2012/10/10-tips-to-optimize-data-warehouse-reporting/

A few excerpts from this post that caught my attention:

  1. “Use the fewest number of threads possible when loading data.  If you set the MAXDOP (max degree of parallelism) to 1 the incoming data will be written to contiguous chunks of disk space.  Data warehouse reporting is typically grabbing lots of related records, usually over a time range.  This results in pulling records that are all loaded at the same time.  This is known as a RANGE SCAN.  Range scans work best when the data is stored on contiguous blocks of disk.  It’s really that simple…  By using a single thread to lay the data down on the disk, we are sacrificing load performance for improved reporting performance.  That’s OK, you only load the data once and hopefully no people are involved in the load process.  The data will be read many times over by people waiting for reports to come out of the system.  A slower load time is a fair trade for a faster user experience any day of the week.
  2. Maintain the highest quality statistics on your data as possible.  If your tables tend to grow incrementally over time, the auto statistics feature of SQL Server will work fine, for a while.  At some point the low sample rate of the auto stats will result in inaccurate statistics.  For that reason it is absolutely imperative that a rock solid statistics maintenance plan be put in place.  And don’t be fooled into thinking that the stock “maintenance plan” wizard in SQL Server will suffice.  What happens if a statistic becomes severely out of date?  The query optimizer will generate a bad plan for the SQL query and the query will appear “hung”.  The consumer of the data will think the server is busy or that you have a bad design, when in fact you just have outdated statistics.
  3. Give SQL Server as much memory as you can afford.  Whatever you give it, it will use.  If your server is a dedicated database  server, turn on Lock pages in memory.  This will prevent SQL Server from ever giving up memory after it has allocated it.  If your server is used for other applications however, locking memory pages may end up starving those applications of memory, which may end up making the database look slow.
  4. On a related note, separate services when ever possible.  SQL Server Integration Services (SSIS), like the SQL Server engine, will grab as much memory as is available.  If you just gave all of the system memory to SQL Server (minus a few Gig so you don’t starve the OS) what will be left for your ETL/loads?  Put SSIS on a separate server.
  5. If you have tables that are loaded once a month and then used for reporting purposes…
  6. Do your housecleaning…  Get rid of indexes that are no longer needed.  Add new indexes sparingly and with great caution.  But do so when it makes sense.  I’ve never seen a perfectly indexed database in the initial design.  Drop “temp”/”test” tables.  Archive old data.  Better yet, deploy an automated archiving process.  There’s no point in keeping 10 years’ worth of data when your end users only need the current year plus last year.
  7. And last but not least, my personal favorite, NEVER SHRINK a database or database file.  If you do, you just fragmented your data and indexes, probably FOREVER.  The only way to get rid of that fragmentation will be to move the data to a new file group that isn’t fragmented.  The catch 22… if you just did a shrink on your database it’s probably because you were running low on disk space.  If you don’t have disk space to auto-grow your existing data files, you probably don’t have the disk space to move the data to a new file group.  The data will remain fragmented, FOREVER.”

From: http://www.daveturpin.com/
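
To make points 1 and 2 concrete, here is a small sketch (dbo.FactSales and dbo.StagingSales are made-up names): a serial load so rows land on contiguous pages, and a full-scan statistics update instead of the low default sample rate.

-- Point 1: a single-threaded load keeps the new rows contiguous on disk.
INSERT INTO dbo.FactSales (SaleDate, CustomerId, Amount)
SELECT SaleDate, CustomerId, Amount
FROM dbo.StagingSales
OPTION (MAXDOP 1);

-- Point 2: refresh statistics with a full scan rather than a small sample.
UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;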

 

Things 2 Do after huge bulk inserts

 

Always refresh the statistics after a large insert:

UPDATE STATISTICS my_table

We could also use DBCC DBREINDEX, which rebuilds one index or all of the indexes defined for a table in a single statement; this is easier than coding multiple DROP INDEX and CREATE INDEX statements. Note that DBCC DBREINDEX is deprecated; on SQL Server 2005 and later, ALTER INDEX ... REBUILD is the preferred replacement.
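
For example (dbo.my_table as above), the old and the new syntax side by side:

-- Deprecated syntax: rebuild all indexes on the table with a fill factor of 90.
DBCC DBREINDEX ('dbo.my_table', '', 90)

-- Modern equivalent on SQL Server 2005 and later:
ALTER INDEX ALL ON dbo.my_table REBUILD WITH (FILLFACTOR = 90)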

With large bulk inserts the statistics get badly skewed and need a refresh afterwards; updating statistics is also much faster than running an index REBUILD or REORGANIZE.

Another option is to look into padding the index. You likely have no fill factor set on your indexes, meaning that if your index contains:

A, B, D, E, F

and you insert a row with a CardNumber value of C, then your index will look like:

A, B, D, E, F, C

and hence be ~20% fragmented. If you instead rebuild the index with some free space, say a fill factor of 85 (leaving 15% of each page free), it would look roughly like:

A, B, D, _, E, F
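
Setting the fill factor is just an option on the index rebuild; a small sketch with a made-up index name:

-- Leave ~15% free space on each leaf page so new rows slot in without splits.
ALTER INDEX IX_my_table_CardNumber ON dbo.my_table
REBUILD WITH (FILLFACTOR = 85)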

Another approach is to take the index offline before a mass insert (by disabling it) and bring it back online afterwards by rebuilding it. This is much faster than maintaining the index during the load or dropping and re-creating it. I had a 3.5-million-row insertion process where one of my non-clustered indexes kept fragmenting, reaching up to 97% and causing poor performance.

 

 

ALTER INDEX index_name ON dbo.table_name DISABLE

 

After the bulk inserts, rebuild the indexes:

 

ALTER INDEX ALL ON dbo.table_name REBUILD
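
Putting the whole pattern together, a rough sketch (the file path, batch size, and index name are placeholders; only disable non-clustered indexes, since disabling the clustered index makes the table unreadable):

-- 1. Disable the non-clustered index so the load skips index maintenance.
ALTER INDEX IX_my_table_CardNumber ON dbo.my_table DISABLE
GO

-- 2. Run the bulk load.
BULK INSERT dbo.my_table
FROM 'C:\loads\my_table.dat'
WITH (TABLOCK, BATCHSIZE = 100000)
GO

-- 3. Rebuild; this also re-enables the disabled index and refreshes
--    the statistics on the rebuilt indexes.
ALTER INDEX ALL ON dbo.my_table REBUILD
GO

-- 4. Refresh column statistics as well.
UPDATE STATISTICS dbo.my_table
GO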

 

Query Parallelism

Parallelism is very useful when you have a small number of very long-running queries. Each query executes faster in parallel, and since there are few queries at any given point in time, there is likely little blocking or competition for resources.

When you start mixing the types of queries, it is harder for the optimizer to choose the right parallelism levels for the overall workload. One slow query may start at a time when things are relatively idle and use many threads. Then several small, fast queries come in while the first is still running, and the CXPACKET waits (in both number and duration) start to increase, making all of the queries slower. SQL Server's CPU usage may go up significantly.

When a query is compiled, it is broken down into steps/operations internally. When the query runs, each step of the query plan that is eligible to go parallel can be assigned one or more schedulers. So if the plan has 10 steps, maybe step 1 gets 1 scheduler, step 2 gets 5 schedulers, step 3 gets 8 schedulers, and so on.

 

Another problem can arise when the statistics are out of date or the data is very skewed. The optimizer may then choose a plan based on an expected number of rows (the cardinality estimate) that performs poorly because the actual number of rows is far off. In that case updating statistics (perhaps WITH FULLSCAN if the data is skewed) may help.

 

Do I have a parallelism problem?

Next there is the question of how to know whether the current “max degree of parallelism” is contributing substantially to a performance or resource-constraint problem. If your Process: % Processor Time counter for sqlservr, divided by the number of logical processors, is much higher than normal or expected, AND wait stats show more than 5% of your waits are on CXPACKET, you may want to test lower, non-zero values of “max degree of parallelism”. You can find connections that currently have parallel queries running by looking for SPIDs with multiple ECIDs in sys.sysprocesses. Query plans that are eligible for parallelism will contain the Parallelism operator. If you decide your current degree of parallelism might be negatively impacting performance, use the information above to estimate a good starting point for your particular instance and then test, baseline, and tune from there. No restart is required after you change the value; new queries automatically use the new setting (unless overridden by a hint in the query).
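
As a quick sketch, this is one way to spot parallel queries via sys.sysprocesses and to change the setting (the value 4 is just an example; test and tune for your workload):

-- A SPID that appears with more than one ECID is running a parallel query.
SELECT spid, COUNT(DISTINCT ecid) AS thread_count
FROM sys.sysprocesses
GROUP BY spid
HAVING COUNT(DISTINCT ecid) > 1

-- 'max degree of parallelism' is an advanced option; no restart is required.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max degree of parallelism', 4
RECONFIGURE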

 

Originally from this blog: http://blogs.msdn.com/b/cindygross/archive/2011/01/28/the-ins-and-outs-of-maxdop.aspx

 

MTVFs vs ITVFs

Here is reason enough why an inline table-valued function (ITVF) is better than a multi-statement table-valued function (MTVF), and why MTVFs should never be used.

I was so close to committing the biggest blunder in my SQL life.

http://www.sqlservercentral.com/blogs/discussionofsqlserver/2012/02/15/comparing-inline-and-multistatement-table-valued-functions/
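
For a quick feel of the difference, here is the same lookup written both ways against a made-up dbo.Orders table; the inline version is expanded into the calling query so the optimizer can use real statistics, while the multi-statement version hides its row count behind a fixed guess.

-- Multi-statement TVF: the optimizer cannot see inside the function body.
CREATE FUNCTION dbo.OrdersByCustomer_MTVF (@CustomerId INT)
RETURNS @result TABLE (OrderId INT, OrderDate DATETIME)
AS
BEGIN
    INSERT INTO @result (OrderId, OrderDate)
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
    RETURN;
END
GO

-- Inline TVF: effectively a parameterized view, expanded into the caller.
CREATE FUNCTION dbo.OrdersByCustomer_ITVF (@CustomerId INT)
RETURNS TABLE
AS
RETURN
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
GO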