Capacity Planning on SQL Server Active-Active Failover Cluster Instances [Step by step

In this series of blogposts, I will describe how to carry out a successful capacity planning project on existing SQL Server failover cluster instances environment with the SQL Governor software.

To spice it up, this step-by-step blog series will address a scenario where the server hardware is planned to be renewed. The method described is applicable to any 2-node Active-Active failover cluster instance environment having one to many SQL Server instances per node.

Let's begin!

Setting the scene

Our environment consists of two-node Active-Active cluster with one SQL Server instance running on each node. Servers are named as SOURCEDB1 and SOURCEDB2. Instances are named as SOURCEDB1\SQLINST1 and SOURCEDB2\SQLINST2.

Hardware specification:

Our old servers are identical running on Dell R610 hardware with 12 core 2.53 GHz Intel Xeon E5649 processors having configured 32 GB RAM each.
The plan is to upgrade the servers into newer Dell R640 hardware with Intel Xeon gold series processors.

SQL Server FCI part 1 Setting the scene

The current FCI environment – the object of our capacity planning project.

Our main goals for the capacity planning project

Replace old hardware with newer, better performing hardware for the next 3 years.
Gain the ability to forecast needed capacity for processing, memory, and storage for both FCI nodes, taking in count possible failover situations.
Reduce the number of CPU cores and save in SQL Server licensing costs without sacrificing performance or availability of the system.

Next, let's take a look on what else we need to know before starting the capacity planning project.

Diagnosing existing FCI nodes and instances

To be able to succeed in capacity planning of the FCI environment, we should have diagnosed the existing environment to be sure that the tempdb settings, user database filegroups, trace flags, instance level configurations, database options, indexing and T-SQL queries as well as database maintenance jobs are set in terms of best practices.

It makes no sense to substitute hardware for any badly configured SQL Server system, unless it is absolutely compulsory for one reason or another. The topic is addressed in my comprehensive blog post about SQL Server diagnostics and health checks.

Gathering monitoring data

You will also need a lot of monitoring data on your server nodes, instances and databases. It is a good idea to monitor server and instance level performance counters on at least 2-minute intervals and database information on at least 60-minute intervals. For actual capacity planning purposes, the monitoring data should be preserved most preferably on 1-hour granularity.

Monitoring data should cover at least...

Server level

Peaking and average CPU usage
RAM usage
Processor queue length

Instance level

Peaking and average CPU usage
RAM usage
Buffer cache hit ratio
Page life expectancy
Tempdb allocated and used size
Tempdb read/write iops
Tempdb read/write mb/s
Templog allocated and used size
Templog write iops
Templog write mb/s

Database level

Disk read/write iops
Disk read/write mb/s
Database files allocated and used size
Log files allocated and used size

SQL Governor automates and streamlines the capacity planning of SQL Server failover instances

In addition to the monitoring data, we must understand the ratio in between the computational power of the old and new servers. For example, if the processing power is 500 units on source server and 800 units on target server, we can calculate the relative performance effect in between the servers by extrapolating the CPU usage percent performance counter time series data respectively. SQL Governor uses SPEC int continuous throughput results for the benchmarks.

Also, we need to understand the trend for each performance counter time series. This is because once again, we must be able to extrapolate the time series data based on the trend given. The extrapolation is automated in the SQL Governor software.

It is also crucial to correctly calculate the summed-up instance level workload on both FCI nodes. In the failover situation, a distinct node must handle the workload of both instances, in addition to the OS and service level workload.

There are some other important capacity planning parameters we need to know as well. I will circle back on those in the upcoming parts of the blog series.

To be able to get the seasonal characteristics of the time series data, it is recommended to gather at least 3 months of consecutive monitoring data, covering the busiest season. For some special, season-based businesses, it may be necessary to gather the data from whole calendar year.

SQL Governor process for SQL Server capacity planning

With SQL Governor software, the overall process for the capacity planning is as follows:

Diagnosing existing FCI nodes and instances
Conducting a heatmap analysis of the environment
Conducting best fit analysis of the environment
Conducting a recurring workload analysis of the environment
Conducting FCI failover analysis and selecting the right-sized hardware

In my next blogpost of this series, I am going to go through the step 2: Heatmap analysis of the SQL Server failover cluster instance environment. Read the blog post here!

Jani Savolainen
Founder & CTO
SQL Governor

Capacity Planning on SQL Server Active-Active Failover Cluster Instances [Step by step – Part 1]