Failover Clustering: Correct Quorum but single node failure shuts down cluster

Another post on Windows Failover Clustering, but this one isn’t so pretty by comparison to my last post.

Recently, on a 3-node Failover Cluster, we were presented with a warning that a single node failure would cause the cluster service to stop, and that the cluster configuration should be checked.
In a 3-node cluster the correct Quorum setting is Node Majority, as it is for any odd number of nodes. After double-checking that this was set correctly, confirming that comms between all the nodes were in place and working, and running the Validation tool, we were satisfied that the configuration was indeed correct.
Obviously the last thing you want to do is assume the warning is spurious and then find that the cluster stops when a node fails.
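
To see why a healthy 3-node cluster on Node Majority should survive a single node failure, the vote arithmetic can be sketched in a few lines of Python. This is only an illustration of simple majority voting with one vote per node; the function names are my own and it does not query a real cluster.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def failures_tolerated(total_votes: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return total_votes - votes_needed(total_votes)


if __name__ == "__main__":
    # Assumes one vote per node and no witness (plain Node Majority).
    for nodes in (2, 3, 4, 5):
        print(f"{nodes} nodes: need {votes_needed(nodes)} votes, "
              f"tolerate {failures_tolerated(nodes)} failure(s)")
```

With three votes the cluster needs two to stay up, so losing one node should be fine, which is why the warning looked wrong. It also shows why an even node count gains nothing without a witness: four nodes tolerate no more failures than three.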

Satisfied that the configuration was correct, we presented the issue to Microsoft Support. The suggestion, and indeed the resolution, was as follows:

  1. Add a Witness disk to the cluster
  2. Change the Cluster Quorum setting to Node and Disk Majority
  3. Change the Cluster Quorum setting back to Node Majority
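
If you would rather script steps 2 and 3 than click through the wizard, the sketch below builds the PowerShell commands I would expect to use, based on the `Set-ClusterQuorum` cmdlet from the FailoverClusters module. It only produces the command text and runs nothing; the disk resource name "Cluster Disk 1" is a placeholder, and it assumes the witness disk from step 1 has already been added to the cluster.

```python
from typing import List


def quorum_reset_commands(witness_disk: str) -> List[str]:
    """Return the PowerShell commands for steps 2 and 3, in order.

    witness_disk is the cluster disk resource name (hypothetical here);
    step 1, adding that disk to the cluster, is assumed to be done.
    """
    return [
        # Step 2: switch to Node and Disk Majority using the witness disk.
        f'Set-ClusterQuorum -NodeAndDiskMajority "{witness_disk}"',
        # Step 3: switch straight back to Node Majority.
        "Set-ClusterQuorum -NodeMajority",
    ]


if __name__ == "__main__":
    for command in quorum_reset_commands("Cluster Disk 1"):
        print(command)
```

Run the printed commands from an elevated PowerShell session on one of the cluster nodes, and check the quorum setting afterwards with `Get-ClusterQuorum`.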

We didn’t get a reason for why this happens, other than that sometimes it does, and that switching to another Quorum setting and back again generally resolves the issue.

Hopefully this will save some people some head scratching and the need to contact MS support.

For full details on the correct Quorum configuration for Failover Clusters, refer to the TechNet article: http://technet.microsoft.com/en-gb/library/cc770620%28v=ws.10%29.aspx

James

Windows Server 2012: Cluster Aware Updating

In my continued efforts over the past few weeks putting together a Windows Server 2012 Hyper-V Cluster, I recently discovered a nifty new feature in Windows Server 2012’s Failover Clustering: Cluster Aware Updating.

This feature is going to save a lot of SysAdmin time when it comes to patching your Failover Cluster nodes; the only real interaction required is to set up the schedule. Cluster Aware Updating then fully automates patching your cluster nodes one by one, without impact to your cluster applications or roles.

Initial setup of CAU requires that you select a “Co-ordinator”, and this basically does what it says on the tin. The Co-ordinator manages and monitors the patching tasks across the nodes in the cluster. It can run either on the cluster itself or on a separate machine outside it.

The CAU Co-ordinator performs the following steps:

  • Downloads updates to each node
  • Selects the node with the fewest applications/roles first (although you can specify an order during setup)
  • Initiates a Node Drain, i.e. moves the applications/roles off the node to other nodes in the cluster
  • Puts the node into Maintenance Mode
  • Installs the downloaded updates
  • Restarts the node if required
  • Verifies the installed updates
  • Brings the node out of Maintenance Mode
  • Moves the applications/roles that were previously moved off the node back again
  • Repeats the above steps for the next node in the cluster
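
The ordering, drain and fail-back logic in the steps above can be sketched as a small Python simulation. This is not how CAU is implemented; it is a toy model in which the node and role names are made up, "patching" is a no-op, and drained roles go to whichever other node currently hosts the fewest roles (my assumption, not documented CAU behaviour).

```python
from typing import Dict, List, Tuple


def rolling_update(
    roles_by_node: Dict[str, List[str]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Simulate one updating run; return (patch order, final placement)."""
    cluster = {node: list(roles) for node, roles in roles_by_node.items()}
    if len(cluster) < 2:
        raise ValueError("need at least two nodes to drain roles")

    # Patch the node with the fewest roles first.
    order = sorted(cluster, key=lambda node: len(cluster[node]))

    for node in order:
        # Drain: move every role to the least-loaded other node.
        moved = {}  # role -> temporary host
        for role in list(cluster[node]):
            target = min(
                (other for other in cluster if other != node),
                key=lambda other: len(cluster[other]),
            )
            cluster[node].remove(role)
            cluster[target].append(role)
            moved[role] = target

        # The node is now empty: install updates and restart here (no-op).

        # Fail back: return exactly the roles that were drained.
        for role, host in moved.items():
            cluster[host].remove(role)
            cluster[node].append(role)

    return order, cluster


if __name__ == "__main__":
    before = {"N1": ["SQL", "File"], "N2": ["VM1"], "N3": []}
    order, after = rolling_update(before)
    print("patch order:", order)
    print("placement unchanged:", after == before)
```

The point of tracking `moved` is the fail-back step: each node gets back the same roles it started with, which is the part that is easy to get wrong when doing this by hand.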

As you can see, performing those steps manually is very time-consuming, especially for large clusters with many applications/roles. The most tedious part is migrating the applications/roles and making sure you move the same roles back again afterwards.

CAU can install updates from a number of sources including:

  • Windows/Microsoft Update
  • Windows Server Update Services (WSUS)
  • Hotfixes or Cumulative Updates not released via Windows/MS Update (set up a file share)
  • 3rd Party Driver and Firmware updates (set up a file share)

So not only does CAU save you time, it also ensures that your cluster nodes are all at the same update level, which of course is desirable at all times.

One thing I did notice was that the SCOM agent on the cluster nodes got stuck in Maintenance Mode. I fixed this by putting the nodes into Maintenance Mode via the SCOM console for 10 minutes, after which the nodes were successfully monitored again.

This is certainly one of my favourite additions to the Server 2012 feature set so far. If you have a Server 2012 cluster then enable this feature!

 

James