So the conference began. Registration was a snap, I got a nifty backpack that is big enough to hold my workstation (an Elitebook 8560w) and was remarkably comfortable. I don't think it will replace my Tom Bihn bag for travel but it may be my new commute backpack.
The session I attended today was SQL Server 2012 in a Highly Available World
and, while it was not what I expected I found a lot to learn. I was expecting a focus on Availability Groups with maybe a section on Failover Cluster Instances (FCI). Instead this was a much broader session. It started with Log Shipping, Database Mirroring (deprecated in SQL 2012), FCI and then finally Availability Groups. While I didn't want the first bits (Log Shipping has been around forever and Mirroring is going away) it was a good refresher and I learned a lot.
The best part of today's session was working through a problem that the person next to me was having during the breaks. There were two real questions
- Can you mirror upstream from SQL 2008R2 to SQL 2012?
Can you combine Failover Clusters with Availability Groups?
- It turns out you can, if the patches / services packs are correct. In fact this will be demonstrated later this week in one of the sessions!
- Yes, but it prevents automatic failover between datacenters, this may be fixed in a service pack.
I now have an intense need to get VMWare going on my laptop and start messing with cluster configurations.
My notes from this session are below.
PASS Pre-Con 1: SQL Server 2012 in a Highly Available World
This was an overview of the High Availability options typically used in SQL 2012. Topics ranged from the venerable Log Shipping to the newer and shinier Availability Groups.
What happens when the database is set to "Bulk Logged" mode and log shipping is enabled? It turns out that the data never makes it over to the target system unless it is updated afterward. This is a good example of why data should only be bulk loaded into staging tables, cleaned and moved to final tables using normal, logged commands.
Test log shipping Disaster Recovery (with an outage window)
- Server 1: "BACKUP LOG ... NORECOVERY"
- Server 2: "RESTORE LOG ... WITH RECOVERY"
- Do stuff (nothing bad) and flip it back
- Server 2: "BACKUP LOG ... NORECOVERY"
- Server 1: "RESTORE LOG ... WITH RECOVERY"
This maintains allows you to maintain the log chain and test DR as long as only legitimate transactions happen as part of the test.
The reason that this still matters is that is was JUST deprecated. That means it will still be there for at least the next two releases (SQL Server 2014 and 2016 presumably). The last version of SQL Server with mirroring will be supported for at least 10 years after release, that means it will be 2026 before it will be gone.
Unsupported DB Objects
The Log Stream endpoint hangs out on port 5022. This endpoint will ONLY talk to other SQL Servers and is always encrypted, typically by a server signed certificate.
If everything is working well, Synchronous replication is expected to only add 5ms to the DML commands at distances < 100 km. If the distance is > 100km then Asynchronous Replication is recommended. The Mirroring overhead time is viewable in the Mirroring Monitor (which presumably fronts a DMV).
The greatest visible lag in mirroring is when the application realizes that it must switch to the secondary node. The application needs to have a connection timeout happen (5 - 10 secs) before flipping to the next server (which will of course work).
Mirroring is not impacted by single user mode (other than nobody doing transactions) because it is using a log reader and is not "in" the database.
Always On - Clustering (Local)
The dreaded Hardware Compartibility List (HCL) is gone! The reason for this is that the storage options for the cluster now include network shares (not just iSCSI). Now who in their right mind thinks that network shares over SMB are a good idea for the DB? Well the enhancements to SMB (starting in 2.2 with Win 2008R2 but much improved in 3.0 with Win2012) are pretty amazing. They have removed the overly chatty aspects of the protocol and made it a viable alternative to SAN storage. There is a lot of value going Win2012 instead of Win2008R2 for the SMB 3.0 enhancements.
Sadly, the validation wizard still needs to execute to make sure that the node can be part of a cluster. The good news is that it is a lot less invasive (doesn't do things like take SAN Luns offline). It is still slow, while a 2 node cluster may take 10 minutes, larger ones can take a weekend or longer.
When setting up the cluster, if you are not a domain admin you will need to either get access to add computers to AD or get the new cluster name added to the directory before installing.
The heartbeat for the cluster is using IPv6, if it is not a dedicated connection between the servers the network must be able to accommodate it (it's 2012, probably time to update to IPv6 anyway).
Windows Core is completely supported for the database engine. An interesting thing about core is that (in case of emergency) you can now activate the full GUI when required and then flip it off again when done. The core instance actually keeps all the DLLs and extra files on disk, but in a packed form that is unpacked (into memory?) when needed.
Best Practice tip: Try to have your resource group name match the network group name for your cluster. It will improve manageability a lot.
Quorum Modes configuration
Node Majority Set should be chosen for large clusters (> 2 nodes) with an odd number of nodes
Node and File Share Quorum should be chosen for large clusters that may not have an odd number of nodes
Quorum Drive can be used for 2 node clusters but remember, this is a single point of failure.
Win2012 improvement: Dead Nodes are removed from voting after they are kicked out of the cluster so the quorum requirement is recalculated.
Database Cluster Configuration
TempDB can finally use direct storage! This allows the use of local SSD or even Fusion-IO cards (gasp!). This can be a lifesaver in systems with heavy TempDB usage.
The number of available SQL Server nodes per cluster is getting absurd in Enterprise Edition (but is still stuck at 2 for Std and the new BI Editions). The cap is somewhere around 64, but is really limited by the number of drive letters (26 - A,B,C,D = 22). Technically, this can probably be worked around with mount points but requires further research.
One thing that seemed to cause confusion was the idea that even named instances can be treated like a default instance if you left it hanging around on port 1433. I've always thought of instance naming as just a human readable shortcut for remembering the port number (I know for other protocols the name is more important but most systems are TCP/IP focused). So, if the instance is locked to port 1433, then when you try to connect without an instance name that is the door it knocks at.
When reviewing the cluster resource groups make sure that the mount points are explicitly listed and set as proper dependencies, this does not happen automatically at this time. This can be complex for mount points because the label can be a bit obscure, use the Disk ID to match the volume in Disk Management to the one in the Cluster Manager.
Slipstream installs are now deprecated. I didn't see that one coming. The reason for it is that every patch that comes out need to get an updated slipstream image generated. Now you can store the patches in a centralized location and point the installer at it. It will pickup and apply the patches automatically. This is a big improvement if you haven't automated your image generation yet, otherwise it's a minor change.
Only the Database Engine, Analysis Services and SQL Agent are cluster aware. The other SQL Server goodies are STILL not cluster aware. This includes the MDM solution (MDS + DQS).
Always On - Geo Distributed Clusters
There have always been a LOT of obstacles when doing Geo-Clustering, primarily around moving the data and managing the network configuration. As far as I know, MS is not touching the data side, they are leaving that to the SAN experts for this option. However, MS now has options for Multi-Subnet Failover where the IP Address can be set dynamically based on the available subnets. This feature may
stop the network admins from wanting to kill you.
Always On - Availability Groups
Different from Mirroring
This is really a fork of the mirroring technology used previously. It is still an encrypted transaction log stream replayed from the primary to the secondaries with the same overhead.
There can only be a total of 4 replicas of any single availability group.
- Synchronous Replicas are limited to 2
- Asynchronous Replicas are limited to 4
This means you can have 4 Async Replics or 1 Sync + 3 Async or 2 Sync + 2 Async but NOT 3 Sync + 1 Async.
- Read-Only replica (independent statistics kept in tempdb)
- Routing based on application intent
- Increased number of worker threads
- Increased cap from 20 to 200 Replicas / instance
- Replication Publishers and Subscribers are supported
- Filestream / Filetable
- Extended events are supported
- Cannot be used across domains
- Does not support the replication distributor database
- No SQL Server Profiler Events are available for Availability Groups
Fancy stuff. You can set your read-intent route to point at a load balancer that will bounce you between several read-only instances if desired.
FCI + AG
The combination of Failover Cluster Instances can be used in conjunction with Availability Groups for a very comprehensive solution (licenses may get pricey though). The sequence for doing this is:
- Install Windows Cluster Services
- Install SQL Server Cluster
- Enable Always On Availability Groups
- Create / Restore Databases
- Configure Availiability Groups
SQLCAT has a whitepaper
on implementing this combined cluster.
- Full and Differential backups shold be taken from the primary
- Transaction Log backups can be taken from the secondary. The info is pushed up to the primary so it knows that the logs have been archived.
- Only maintenance plans (and Ola's script) support backup rules so far.
- An Availability Group Listener is created (more Active Directory fun) and all systems should connect to that name.
- The Availability Group Listener is not supported by the SQL Server Browser and is recommended to live on port 1433 to avoid confusion.
- If there is a read intent connection and no secondaries are available to service the request it will execute on the primary.