Tuesday, November 10, 2009

PASS Summit 2009 Pre-Con

This was a very good day. Kimberly has an insane amount of energy and enthusiasm for indexes and database internals. Paul didn't say very much, but when he did respond to questions his knowledge was astounding. This made me happy to be a DBA.

Indexing for Performance

So we started off with a lot of quality time discussing Heaps vs. Clustered tables. There was a clear bias for Clustered tables, but that's ok. One of the reasons I went to PASS was for Opinions not just raw facts.

Heaps are nice because there are no page splits.
Heaps are not nice because they use a forwarding pointer when the row size changes.
This means that a RID lookup may go straight to the row or require an extra jump to follow the forwarding pointer.

Why would they use this goofy forwarding pointer? Well, the RID is not an incrementing value, but is tied to some internal structures. The indexes all point at the RID. If you updated the RID to indicate the new page the data can be found on, then any UPDATE statement would have to wait for the index(es) to be updated as well. By using the pointer, the indexes can stay pointed at the RID. This was a case of "the answer is another layer of abstraction!" Unfortunately, as you accumulate more forwarding pointers it requires more reads to retrieve your data. Over time you will pay a price for this.

To check your forwarding pointer count, look at sys.dm_db_index_physical_stats.
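A minimal sketch of that check (database and object names are left NULL to cover everything; 'DETAILED' mode is needed to populate the forwarded record count):

```sql
-- Find heaps in the current database that have forwarded records.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.forwarded_record_count,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips
WHERE ips.index_type_desc = 'HEAP'
  AND ips.forwarded_record_count > 0;
```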

Clustered Indexes
Updates are faster but may cause page splits. This may not be an issue because the page split happens once and then there is room for a lot of additional data on the page.

The Page Splits can lead to fragmentation. The best way to reduce the page splits is to adjust your fillfactor down (~70%). Sure, your table will get a bit bigger at first but it is space you will likely need going forward regardless.
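Dialing the fill factor down is done as part of an index rebuild; a sketch with a hypothetical index and table name:

```sql
-- Rebuild with a lower fill factor to leave ~30% free space per page
-- for future inserts, reducing page splits.
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders
REBUILD WITH (FILLFACTOR = 70);
```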

Ola Hallengren has some excellent maintenance scripts. Investigate this.

It's important for your clustered index to be minimal because the columns get added to each NonClustered index.

If your clustered index is not unique a "uniquifier" (I'm not kidding) is added to each set of values that exist more than once. This value is a 4 byte int. If you exceed 2.1 billion dupes things may break (but you likely have other problems by then).

Splits do not impact your NonClustered indexes and do not create separate entries in the transaction log because they are done as part of the UPDATE / INSERT transaction that caused them.

Index lookups always reach down to the leaf level; this is required for consistency.

SQL Server uses B+Trees for its indexing method.

COUNT(*) works by doing a leaf-level scan of the smallest (minimum width) non-clustered index.

Trees live in the SQL Server cache and tend to be a large part of the logical reads.

The Optimizer's goal is not to find the best query plan, but to find an acceptable query plan swiftly.

The Optimizer expects the tree (the non-leaf levels) to be in the cache and the leaf pages to be on disk.

When does the optimizer decide to use a NonClustered Seek with bookmark lookup vs. a Clustered Index Scan?
It turns out this is tied to the page width.
When the number of rows returned is <= roughly 1/4 to 1/3 of the total number of pages (the cost of a table scan), the NonClustered index is used.
Otherwise the bookmark lookups are cost prohibitive and it is faster to scan the Clustered index and filter.

When performance drops off suddenly it is nearly always Statistic related.
When performance drops slowly over time, it is nearly always Fragmentation related.

Stats used to auto-update when 20% of the rows on the table had been touched (SQL 2000). The modification count can be found in sysindexes.rowmodctr.
Now stats on a column will only update when 20% of that COLUMN changes. This is irritating.

DBCC SHOW_STATISTICS has some good stuff, including:
  • Last updated
sp_autostats is interesting too.
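A quick sketch of pulling this info (table and index names here are hypothetical):

```sql
-- Header, density vector and histogram for one statistic,
-- including the "Updated" (last updated) column
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerId');

-- Which stats on the table have auto-update enabled
EXEC sp_autostats 'dbo.Orders';

-- Last-updated dates can also be pulled with STATS_DATE()
SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.Orders');
```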

Statistics only record 200 steps in the histogram. They use Step Compression, which lumps similar values together so the outliers stand out more prominently.

The _WA prefix on auto-created stats is used because MS is in WA (Washington).

sp_recompile works at the table level!

OPTION (RECOMPILE) is useful for optimizing part of a query.
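The two recompile levels side by side (object names are placeholders):

```sql
-- Table level: mark every plan that references the table for recompilation
EXEC sp_recompile 'dbo.Orders';

-- Statement level: only this statement gets a fresh plan on each execution,
-- useful when one branch of a proc needs plans built for the actual values
DECLARE @CustomerId int = 42;
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = @CustomerId
OPTION (RECOMPILE);
```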

FULLSCAN will go parallel, SAMPLE will not. A sample of > 25% is normally slower than a full scan.

Filtered stats are pretty awesome.
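Both points sketched out (names are hypothetical; filtered statistics require SQL 2008+):

```sql
-- Full scan can go parallel, unlike the default sample
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerId WITH FULLSCAN;

-- Filtered statistics: a dedicated 200-step histogram for just a hot subset
CREATE STATISTICS st_Orders_Open
ON dbo.Orders (OrderDate)
WHERE Status = 'Open';
```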

check out KB328551

AND is progressively limiting while OR is progressively inclusive.

To improve join performance
1. make sure the FKs are indexes
2. cover the join
3. cover the query

To improve aggregate performance, having the GROUP BY columns in the index is important, and you should INCLUDE any columns you want to aggregate.
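Those points combined in one sketch (table and column names are made up): the FK / GROUP BY column leads the key and the aggregated column rides along in INCLUDE, so the aggregate is answered entirely from the index.

```sql
-- Covers both the join on OrderId and the aggregate over Quantity
CREATE NONCLUSTERED INDEX IX_OrderLines_OrderId
ON dbo.OrderLines (OrderId)
INCLUDE (Quantity);

SELECT OrderId, SUM(Quantity) AS total_qty
FROM dbo.OrderLines
GROUP BY OrderId;
```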

DTA can be useful for straight SQL but it is terrible with procs and views.

Kimberly makes heavy use of Solutions and SQLCMD. Should we consider these for the release folders?

Friday, November 6, 2009

PASS Summit 2009 Day 3

Starting to wear down by the third day. The conference was a bit smaller today, I suspect the local folks only hit the first day or two. That makes sense because MS and the Vendors are only here the first two days as well.

Day 3:
Lab Time
I returned to the lab and did a few more sessions; unfortunately these all had problems. I got frustrated after a bit and checked in with work until the first session.

Analyzing File and Wait Stats (Andrew Kelly)
It's one thing to know that DMVs (Dynamic Management Views) exist, it's entirely something else to interpret them. That's what got covered in this session.

File Stats (sys.dm_io_virtual_file_stats)
These are PHYSICAL IO ONLY. Logical IO is excluded because it never hits the files, only the cache.
Backups will skew your results pretty heavily. If you are monitoring the Reads you will see a big jump during the backup time window. Take stats before and after backups so you can exclude these counts.
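A minimal sketch of reading that DMV (joining to sys.master_files to get the file paths):

```sql
-- Cumulative physical I/O and stall per database file since instance startup
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id
 AND vfs.file_id = mf.file_id
ORDER BY vfs.io_stall_write_ms DESC;
```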

The Writelog waits will let you know if you need to move your Log file off to separate or faster disks.

TempDB will have more IO than you expect. Monitor it to determine if you need to add more files to the file group (wish they mentioned this during the TempDB session).

The two scripts he released for monitoring this performance are:

  • gather_file_stats_2008
  • report_file_stats_2008
The Stall that is reported can be greater than query time because stall is summed across all threads while query time is the overall elapsed time.

When you have High Stall writes here are some things to try:
  • Add Writeback Cache to your RAID controller
  • Rebalance existing caches to 100% write and 0% read. SQL Server does not benefit much from read cache.
  • Add spindles
  • Separate Data and Log
  • Tune your SQL!
Wait Stats
There are over 485 of these buggers! Internal and external ones are exposed in SQL 2008.

There is a new class of waits that starts with Preemptive; these indicate you are waiting for something outside of SQL Server's control (such as an OS or web service call).

Wait Time is the overall time spent waiting, while Signal Time is how long the task waits to get some CPU time after its resource becomes available. The actual Resource Wait Time = Wait Time - Signal Wait Time.

It is generally safe to ignore System and OLEDB waits. The only way you could impact these is by doing linked server calls. Some example waits:
  • CXPACKET: Check MAXDOP, lowering it may help.
  • LCK_M_%: You've got blocks!
  • ASYNC_NETWORK_IO: Almost never the network; this is typically a slow client (I'm looking at you, Data API!)
  • PAGEIOLATCH_%, IO_COMPLETION, WRITELOG: There is a storage issue somewhere.
  • PAGELATCH_%: Internal resource contention, check TempDB.
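The signal-wait arithmetic above can be folded straight into a query (the excluded wait types are just a few common benign ones, not an exhaustive list):

```sql
-- Top waits with the resource portion derived as wait_time - signal_wait
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       signal_wait_time_ms,
       wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;
```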
SQL Automation and some Powershell (Buck Woody)

This must be the funniest man at PASS. The session was great and he covered a lot of good stuff. I am going to watch all of his sessions when my DVDs arrive. I learned a lot and didn't realize it because he made it a lot of fun.

Powershell is now part of the "Common Criteria" at MS. That means every product that is released must be manageable using Powershell commands. What this really means: MS is not dropping Powershell any time soon and it is probably worth learning.

There is a built in "MiniShell" in SQL 2008, you can open it right from object explorer.

Powershell resources to look into:

Advanced Physical Database Design (Boris Baryishnikov)
This was brutal; the guy was nice but his session was completely mislabeled. It was about indexing, the Database Tuning Advisor and sparse columns. There are a lot of physical DB design choices to make when converting from a logical model and he didn't cover any of them. This was probably the most annoying session at PASS because the title was so completely inappropriate for the content.

Thursday, November 5, 2009

PASS Summit 2009 Day 2

Second day was also pretty good. Nothing was going to beat yesterday but this was still pretty great.

Day 2:
Lab Time
I skipped out on the keynote this morning because it was more marketing fluff and PASS specific awards. I followed it on twitter while working through some training at the MS Lab. The lab is a bunch of machines setup with dual monitors, virtual machines and instructions. It's great because it allows you to go at your own pace.

I did labs on SQLCMD and Partitioning. SQLCMD is interesting because it allows you to do things like swap connections mid-script. Handy, but not exactly what I was looking for. The partitioning lab was awkward because the solution scripts were not set up right. I had to set the path for all the scripts that were read in programmatically (surprisingly enough, using SQLCMD).
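A sketch of the sort of SQLCMD-mode script this involves (server names and the path variable are hypothetical):

```sql
-- SQLCMD-mode script: swap connections mid-script and parameterize a path
:SETVAR ScriptPath "C:\labs\partitioning"
:CONNECT LabServer1
SELECT @@SERVERNAME AS connected_to;
GO
:CONNECT LabServer2
:r $(ScriptPath)\CreatePartitionScheme.sql
GO
```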

Developing with SQL Server Spatial, Deep Dive into Spatial Indexing (Michael Rys)
This was fascinating. The fellow who gave this presentation is in charge of the Spatial feature at MS. He covered the indexing method they used, and why they chose it. It turns out to be based on the normal B+Tree model with a grid overlaying the information. Each square on the grid has three states:

  • Definitely Not intersecting
  • Definitely Intersecting
  • Maybe Intersecting, Maybe Not
The Maybe was the interesting one. This exists because floating point math is used to calculate the intersection, and the margin of error for that cell can be above a specific threshold.

The detail of the grid is recorded in internal tables (that can be queried) and clustered by the grid sequence.

He also explained why the Geography data type has fewer features than the planar Geometry data type. The definitions are clearly defined for planar data but they are ambiguous for Geography because of the earth's curvature. Until there is a standard definition (or possibly a huge user outcry?) they will not implement those functions for the Geography data type.

Finally, after the presentation I walked up to the man and thanked him for the incredible work his team has done. I explained about the performance improvement we had over Oracle Spatial and he was surprised. The goal of the SQL Spatial team was apparently to not be more than 3x slower than Oracle! Amazing, it is much faster in my experience.

SQLCAT: Customer Experience with Data Compression (Sanjay Mishra)
There are two types of data compression, Row and Page.

Row compression may take up to 10% CPU overhead and it uses the minimum bytes to store the info being collected regardless of the declared data type. In SQL 2008 R2 it will also support Unicode using the Standard Compression Scheme for Unicode. This is more efficient than UTF-8 because it adapts to the primary character set being used in the field for that record. If you have a mix of English and Japanese it will make the dominant character set require only one byte per char where possible.

Page compression varies a lot in terms of CPU overhead. It does Dictionary and Prefix compression. These effectively mean that the DB will record a string only once and if that string is used again on the same page it will use a pointer instead of repeating itself. This means that each 8k page is compressed independently.

This doesn't improve IO time by itself; it will in fact take longer to do inserts and updates. Select speed may improve and the storage space needed should decrease a fair percentage. The case studies they showed were 30% - 40% savings, which adds up when talking about TBs of SAN storage.
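The usual workflow is to estimate before rebuilding; a sketch with placeholder names:

```sql
-- Estimate the savings before committing to compression
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'Orders',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Then enable it with a rebuild
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);
```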

Dude, Where is my Memory? Understanding SQL Server Memory Management and Usage (Maciej Pilecki)
This was an awesome presentation. There wasn't much new there but he covered a lot of what I had to learn through blood, sweat and tears. He did a great job covering the 32-bit vs 64-bit memory model and some of the pitfalls associated with AWE. He also spent a good amount of time on the Lock Pages in Memory option. Sure, a lot of people posted about it when it came out, but it bears repeating. If your DB is being swapped out to disk for no discernible reason, set this option on! I've had SAN and Network drivers cause this problem for me and this was a lifesaver.

Wednesday, November 4, 2009

PASS Summit 2009 Day 1

This was my first PASS Summit and it has been amazing.

I went to the Pre-Con session on Index Performance and really enjoyed it. I mean, who wouldn't? An entire day analyzing index and statistics internals is just a really good time, right? At least that's what all of us at the session thought! I'll post more about this later.

Day 1:

This was not impressive. Sure all the video and shiny was nice but it was all marketing speak. The two interesting bits were:
  • The 192 CPU box they demonstrated SQL Server 2008 R2 running on. That is AMAZING hardware (from IBM of course) and SQL Server scaling on it was impressive.
  • Hyper-V Migrations. Sure, it was done better and earlier by VMWare years ago but it's still amazing. The demo had them moving a VM from one machine to another while queries were running. No connection drops, at all. This tech is awesome; I can't wait until they are able to get rid of the IO overhead.
Introducing Master Data Services
  • Tooling on this is very nice and complete. Much better than the Purisma stuff I've seen. It seems to piggy-back on Sharepoint.
  • I snagged John McAllister (the program manager at MS) and asked him about our specific issues.
  • They haven't tested with a dataset our size before but would love to try it.
  • Many to Many relationships are a hack in there too.

Ask the Experts
  • I was asking about Entity Attribute Value modeling techniques. Generally it was recommended to go with the XML data type. When I raised scalability concerns with the volume of data we use they agreed it may not work.
  • An interesting idea was to have a CLR trigger analyze the data instead of a set of web services.
  • We also have an issue where we are doing a join outside of DB. There are about 12 queries that hit at once and they don't release query memory until the last row is pulled off the recordset. I need that memory released when the query is done, not when the recordset is delivered. I spoke with the JDBC and Database Engine folks. It was great.
  • It was decided that it was unlikely to be the JDBC driver (though he recommended switching to server-side cursors, which we did, but they add 25% to the processing time).
  • The queries all have an ORDER BY, which means the work should be spilling to tempdb rather than holding the cache. I emailed the MS guy the query plans for further review.
Managing TempDB
  • This was the least impressive. The talk was given by two very smart folks from MS but the presentation wasn't very polished.
  • The most interesting thing was demonstrating how snapshot isolation uses tempdb. I wasn't familiar with the DMVs and performance counters they used for this.
  • The major advice was the old chestnut of "match the number of cores to the number of tempdb files".
  • I also liked how they pointed out the transaction log in tempdb follows different rules. Specifically there is no "REDO" information recorded because tempdb is reset on each startup anyway.
  • There is also something special that happens with object creation in tempdb. Because the same objects are created over and over again (#t for instance) they cache the object definitions to avoid the insert time into the system tables for something so transient. Clever.
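Adding tempdb files (per the cores-to-files advice above) is a one-liner per file; the path, name and sizes below are purely illustrative:

```sql
-- Add another tempdb data file; repeat until files ~ match core count
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\tempdb\tempdev2.ndf',
          SIZE = 4GB,
          FILEGROWTH = 512MB);
```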
Designing and Building Private SQL Clouds (not SQL Azure)
  • Awesome. Ryan Jones and Sarah Barela made a great team.
  • They explained how MaximumASP built its own cloud infrastructure with a combination of Hyper-V, SQL 2008 Enterprise and Powershell.
  • They leverage Hyper-V's migration features to move VMs around seamlessly.
  • The SQL 2008 Enterprise license allows you to use as many SQL Enterprise instances as you want on a virtualized machine.
  • The Resource Governor and Performance Data Collector help a lot with managing VMs.
Wow. That was one amazing day.