In this online guide, I will help my readers overcome the challenges in Microsoft SQL Server consolidation planning. You can access the main page of "Mastering SQL Server Consolidation Planning" here.
In my previous blog post, I described the phases of a consolidation planning project. This part explains how to monitor the existing environment.
SQL Server performance counters to monitor
SQL Server monitoring is crucial for a successful consolidation project. Without the right performance counters and metrics, it is basically impossible to forecast the future capacity needs of the system. There are important metrics at three levels of the system: server level, instance level and database level.
Good server level performance counters to monitor are:
- Peaking and average CPU usage
- RAM usage
Good instance level performance counters to monitor are:
- Peaking and average CPU usage
- RAM usage
- tempdb allocated and used size
- tempdb read/write IOPS
- tempdb read/write MB/s
- templog allocated and used size
- templog write IOPS
- templog write MB/s
If database-level consolidation is part of the project, good database-level performance counters to monitor are:
- disk read/write IOPS
- disk read/write MB/s
- database files allocated and reserved size
- log files allocated and reserved size
- CPU time (ms)
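To make the "peaking and average" figures in the lists above concrete, here is a minimal Python sketch of how raw counter samples are condensed into the two numbers used in capacity planning. The sample values and the function name are mine, purely for illustration:

```python
from statistics import mean

def summarize_cpu(samples: list[float]) -> dict:
    """Condense raw CPU usage samples (percent) into the peak
    and average figures used in consolidation planning."""
    return {"peak": max(samples), "avg": mean(samples)}

# Hypothetical CPU samples collected at a fixed interval, in percent.
cpu_samples = [12.0, 55.0, 97.5, 40.0, 33.5]
print(summarize_cpu(cpu_samples))  # {'peak': 97.5, 'avg': 47.6}
```

The same peak/average reduction applies to the other counters, such as IOPS and MB/s, per server, instance or database.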
Note! In addition to these performance counters, there are many other useful counters as well. Some of these additional counters describe the health of the system rather than its performance: for example, read latency, buffer pool size, buffer cache hit ratio, processor queue length and deadlock count. It is a good idea to cover these counters in the "SQL Server diagnostics and health check" phase of the project.
Selecting the right time span and grain for SQL Server monitoring
Typically, there should be at least 3 months of continuous monitoring covering the peak months to get a reliable picture of overall resource consumption. The problem with shorter periods is the inability to predict long-term trends and therefore to fit the relevant time series together. In more challenging environments, 6, 12 or even 24 months may be required to forecast performance metrics with a high level of accuracy. Ideally, there is performance data from the whole lifecycle of the environment from which to generate an hour-based time series. That is one good reason why it is so important to have SQL Server monitoring software such as SQL Governor® running in the production environment at all times. Performance data is the most valuable asset of an SQL Server consolidation project.
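An hour-based time series of the kind mentioned above can be produced by bucketing raw samples into hourly averages. A minimal sketch with hypothetical 15- and 30-minute samples (the function name and data are mine):

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def to_hourly_series(samples: list[tuple[datetime, float]]) -> dict[datetime, float]:
    """Bucket raw (timestamp, value) samples into hourly averages."""
    buckets: dict[datetime, list[float]] = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}

# Hypothetical CPU samples spanning two hours.
raw = [
    (datetime(2024, 1, 1, 9, 0), 20.0),
    (datetime(2024, 1, 1, 9, 30), 40.0),
    (datetime(2024, 1, 1, 10, 15), 60.0),
]
print(to_hourly_series(raw))
```

The 9:00 bucket averages to 30.0 and the 10:00 bucket to 60.0; the same grain is then used consistently across all monitored objects so their series can be compared and fitted together.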
Missing time series in SQL Server monitoring
Sometimes there is no performance data for a certain time span. This can have many causes, including a server, service, hardware or network failure, or a planned service break. Note that a missing time series may distort the actual capacity calculations. Missing time series data should be investigated and analyzed to decide whether the information is critical for capacity planning. If the answer is yes, it is a good idea to continue the planning work with extra monitoring. Generating an actual time series is preferable to summing up existing data, as this maintains accuracy and comparability over time between the DBMS instances in capacity planning.
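Gaps like these are easy to detect mechanically once the data is on an hourly grain. A minimal sketch, with a hypothetical series that has a one-hour outage:

```python
from datetime import datetime, timedelta

def find_missing_hours(hours: list[datetime]) -> list[datetime]:
    """Return the hourly timestamps missing between the first
    and last observation of an hour-based time series."""
    present = set(hours)
    missing = []
    current, end = min(hours), max(hours)
    while current <= end:
        if current not in present:
            missing.append(current)
        current += timedelta(hours=1)
    return missing

# Hypothetical series with data at 9:00 and 10:00 but not 11:00.
observed = [datetime(2024, 1, 1, h) for h in (9, 10, 12)]
print(find_missing_hours(observed))  # [datetime(2024, 1, 1, 11, 0)]
```

Each reported gap can then be checked against known service breaks before deciding whether extra monitoring is needed.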
Monitoring SQL Server service times
Different servers, instances, databases, customers and weekdays may have different service times. You need a mechanism for keeping track of the service time windows in your existing system to better understand your production environment's capacity needs over time, because you may want to filter out the members of your result set that lie outside the service times. Once again, this makes the actual capacity calculation more precise and practical. Our SQL Governor® software allows you to set service times for each individual monitored performance object (such as a server).
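Conceptually, such a filter only needs a per-weekday window definition. A minimal sketch, assuming a hypothetical schedule where weekdays without an entry have no service window (the window table and function names are mine, not SQL Governor®'s API):

```python
from datetime import datetime, time

# Hypothetical service-time windows per weekday (0 = Monday).
SERVICE_WINDOWS = {
    0: (time(8, 0), time(18, 0)),
    1: (time(8, 0), time(18, 0)),
}

def in_service_time(ts: datetime) -> bool:
    """True if the timestamp falls inside that weekday's window."""
    window = SERVICE_WINDOWS.get(ts.weekday())
    if window is None:
        return False
    start, end = window
    return start <= ts.time() < end

def filter_to_service_times(samples):
    """Keep only the samples recorded during service time."""
    return [(ts, v) for ts, v in samples if in_service_time(ts)]
```

For example, a sample taken on Monday at 09:00 passes the filter, while one taken the same day at 22:00 is excluded from the capacity calculation.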
Adding SLAs in SQL Server capacity planning
A Service Level Agreement (SLA) tells us how well a certain performance counter is behaving over time. For each SLA you need to define a target percentage that must be met or exceeded. One typical example of an SLA is server uptime: if the SLA criterion (in this case "the server is up and running") is met for at least 99.9999% of all members in the time series, the target SLA is met; otherwise it is not. This makes it possible to fine-tune the performance metrics for capacity planning. The SQL Governor® software includes a comprehensive set of monitoring SLAs for this purpose.
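The uptime example boils down to a simple percentage check over the members of the time series. A minimal sketch with hypothetical check data:

```python
def sla_met(members: list[bool], target_pct: float) -> bool:
    """True if the share of members meeting the SLA criterion
    is at least the target percentage."""
    achieved = 100.0 * sum(members) / len(members)
    return achieved >= target_pct

# Hypothetical uptime checks: 9,999 up, 1 down -> 99.99% uptime.
checks = [True] * 9999 + [False]
print(sla_met(checks, 99.9))     # True
print(sla_met(checks, 99.9999))  # False
```

The same pattern works for any counter-based criterion, such as "CPU usage stayed below 80%", by mapping each time series member to True or False first.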
Part 4 of this guide will cover the requirements specification phase called "SQL Server diagnostics and health check".
Jani K. Savolainen
Founder & CTO
DB Pro Oy