In the second part of the blog series, we proceed a Heatmap Analysis of the SQL Server FCI Environment, which is the second step in a successful SQL Server capacity planning project.
Check out the previous blogpost of this series where we set the scene and described how to prepare for the capacity planning project on an Active-Active Failover Cluster Instance environment.
Why you should love heatmaps
Heatmap is a great tool for visualizing the changes in capacity utilization over a certain period.
For DBAs and sysadmins, heatmaps offer a very concrete, easy-to-illustrate way to show the over- and under-utilization of the SQL Servers and instances for your superiors:
- Which servers and instances are too busy?
- Which ones are doing nothing?
For preparing the actual capacity planning, SQL Governor software provides a set of comprehensive heatmaps for all this. You can filter out the service time from all calendar hours to get exact understanding of the average and peaking CPU utilization on different time intervals. It is also possible to drill down from server level into individual instances.
Reviewing monthly CPU utilization
The monthly heatmap below visualizes the average and peaking monthly CPU utilization for each SQL Server and instance.
As can be seen from the heatmap, SOURCEDB2 seems to be too busy for a very business critical server. Average CPU should not exceed 40% utilization and peaking CPU should not exceed 60% level on any given month for a standalone server. In the case of an Active – Active cluster, the critical limits are closer to the 20% – 30% level.
The situation needs closer investigation!
SOURCEDB1 is not that busy in terms of average CPU usage, which is a good thing. Yet still, peaking CPU is relatively high.
In this case, we are not investigating instance level workloads separately, even though that would be possible. This is because the only extra workload on each node comes from OS reserve. Therefore, instead of using Instance Level Consolidation feature of the SQL Governor, we are good to go with Server Level Consolidation feature.
Note! In SQL Governor, the peaking CPU is measured in 2-minute intervals. The peaking values may look busier in the heatmap on an hourly basis, because the busiest 2-minute interval is chosen for each hour. That’s why we should conduct a more detailed analysis for 2-minute intervals to see if the peaking CPU’s are as busy as it seems over consecutive 4-minute intervals, which is more directional for long term CPU utilization.
Reviewing daily and hourly CPU utilization
This heatmap visualizes average and peaking daily CPU utilization for each SQL Server and instance. You are also able to drill down into individual hour series of a desired date.
As can be seen from the daily heatmaps, the workloads have been exceptionally high on SOURCEDB2 from 13th until 22nd of January.
Let's take a closer look.
When inspecting the time series of the busiest day, which seems to be the 15th of January, we can see that the server is especially busy during the late evening to night hours.
Reviewing heatmap distribution
This report shows the distribution of the actual server level CPU usage on discrete scale. We can see that SOURCEDB1 has all the log rows from previous weeks on very low level of CPU utilization, measured by 2-minute intervals: 96% of all the log rows fall into a category of 0-10% CPU utilization only.
On the other hand, SOURCEDB2 has 37% of all the log rows in a category of 41-50% CPU utilization and even 7+2 = 9% in a category of 71-90% CPU utilization. This tends to be a too high CPU workload for such a critical server, especially if the processor queues are high at the same time.
In the next part of this blog series, we will dig deeper into the CPU side of these servers, conduct a best fit analysis of the environment, and right-size the servers.
SQL Governor process for SQL Server capacity planning
With SQL Governor software, the overall process for the capacity planning is as follows:
- Diagnosing existing FCI nodes and instances
- Conducting a heatmap analysis of the environment
- Conducting best fit analysis of the environment
- Conducting a recurring workload analysis of the environment
- Conducting FCI failover analysis and selecting the right-sized hardware
In my next blogpost, I will go through the step 3: Conducting best fit analysis of the environment.
Founder & CTO
Wish to learn more on the topic? We have a webinar coming up on Thursday 27 February with our partner Soaring Eagle Consulting. In the webinar, we go through the SQL Server FCI Active-Active capacity planning process, step by step.