Monitoring Mission Critical SQL Servers

VIDEO BLOG SERIES: Monitoring Mission Critical SQL Servers - How to utilize alerting to predict system bottlenecks

In the previous part of this video blog series I talked about proactive monitoring methods and how they can be used to prevent performance problems from occurring. In this final part I will describe how I see the future of monitoring, and summarize the series.

Root cause detection, root cause fixing and root cause prevention are the future of performance monitoring

Now a few words about how I see the future of monitoring. I believe the next level of monitoring will be so-called root cause detection. It is an AI-driven concept: the performance monitoring software tracks all the wait stats, instance-level configurations, database options, T-SQL queries, indexing, performance counters and so on, and by analyzing all of these factors together it can identify a bottleneck. By investigating this information it can then state the possible reasons for the bottleneck and the probability of each.
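To make the idea of "possible reasons and their probabilities" concrete, here is a minimal sketch in Python. It is not how any real monitoring product works: the rule table `CAUSE_RULES`, the signal names, and the weights are all hypothetical values chosen for illustration. Each candidate cause is scored by the signals that support it, and the scores are normalized so that they sum to one.

```python
from typing import Dict, List, Tuple

# Hypothetical rules: each candidate cause lists the signals that support it,
# with an illustrative weight per signal. These values are assumptions made
# for this sketch and do not come from any real product.
CAUSE_RULES: Dict[str, Dict[str, float]] = {
    "missing index": {"high_logical_reads": 2.0, "pageiolatch_waits": 1.0},
    "MAXDOP misconfiguration": {"cxpacket_waits": 2.0, "high_cpu": 1.0},
    "memory pressure": {"low_page_life_expectancy": 2.0, "pageiolatch_waits": 1.0},
}


def rank_root_causes(signals: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each cause by its weighted signals, then normalize to probabilities.

    `signals` maps a signal name to its observed strength (0.0 means absent).
    Returns (cause, probability) pairs, most likely first, or an empty list
    when no signal supports any cause.
    """
    scores = {
        cause: sum(weight * signals.get(name, 0.0) for name, weight in rules.items())
        for cause, rules in CAUSE_RULES.items()
    }
    total = sum(scores.values())
    if total == 0:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(cause, score / total) for cause, score in ranked]


if __name__ == "__main__":
    observed = {"pageiolatch_waits": 1.0, "low_page_life_expectancy": 1.0}
    for cause, probability in rank_root_causes(observed):
        print(f"{cause}: {probability:.0%}")
```

A real implementation would need far more signals and a properly validated model. The point here is only the shape of the output: a ranked list of candidate causes instead of a single verdict, which leaves the final decision with the DBA.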

This type of method could support your decision making and help you reach a solution faster. But it also carries the risk of wrong decisions, and it is a complex mechanism to implement.

The next level after root cause detection would be root cause fixing. That means that in addition to detecting the root cause, the AI-driven software could also fix the problem. For example, it could add or drop an index, or correct the MAXDOP setting in the instance-level configuration, in order to fix a specific problem in the system.

This method would let you spend less time firefighting. It could help you achieve a faster time-to-solution, better SLAs and higher uptime. But this mechanism would also be very complex to implement, and there is still the risk of wrong decisions.

The last level I see in the future of monitoring is root cause prevention. In this method, the algorithm could detect the root cause in advance and fix it. So it would combine these mechanisms, and a lot more that I have gone through, in order to do that magic. It could show that within three hours there is going to be a problem and how it is going to fix it, and then fix it. Then it would inform a DBA that it has fixed a problem that would have occurred in the system if nothing had been done.
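The "within three hours there is going to be a problem" part can be illustrated with a small Python sketch. It assumes evenly spaced samples of a single metric and a simple straight-line trend. The function name `hours_until_threshold` and its parameters are hypothetical, and a real predictive monitor would use a far richer model than a linear fit.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    samples: Sequence[float],
    threshold: float,
    interval_hours: float = 1.0,
) -> Optional[float]:
    """Estimate the hours from the last sample until the trend hits `threshold`.

    Fits a least-squares line through evenly spaced samples (an assumption of
    this sketch). Returns None when there are fewer than two samples or the
    trend is flat or falling, and 0.0 when the fitted line is already at or
    above the threshold.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    slope = sxy / sxx  # change in the metric per sampling interval
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    fitted_last = intercept + slope * (n - 1)
    if fitted_last >= threshold:
        return 0.0
    return (threshold - fitted_last) / slope * interval_hours


if __name__ == "__main__":
    # Hypothetical hourly CPU utilization readings, in percent.
    cpu_percent = [70.0, 75.0, 80.0, 85.0]
    print(hours_until_threshold(cpu_percent, threshold=100.0))
```

With the hypothetical hourly readings above, which rise by five points per hour, the fitted line reaches 100 percent three hours after the last sample. A prevention mechanism would use an estimate like this as the trigger for applying a fix before the problem occurs.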

The pros and cons of this kind of mechanism are similar to the previous one: you can spend less time firefighting, achieve a faster time-to-solution, and improve SLAs and uptime, but it will also be extremely complex to implement, and the risk of wrong decisions still exists.

I also believe that there will always be a need for DBAs who understand all the nuances of the system, the changes in its business logic, and so on. These are the kinds of things that would be very hard to teach to an AI.

Don't rely on just reactive monitoring—the proactive methods are already here

Now that we have gone through the different aspects of SQL Server performance monitoring, I would like to summarize by saying that relying on reactive monitoring alone is not enough anymore. Proactive monitoring is already here today. For example, our SQL Governor software uses mechanisms such as predictive monitoring, pattern-oriented monitoring and anomaly detection.

When talking about performance problems, typically 70% of the issues are caused by bad coding, and 30% by poor capacity planning and infrastructure problems. So consider how you can cover both of these areas when managing your data platform. You should also define your technical service levels, so that you understand your baselines better and can plan capacity with the right amount of resources, neither over-investing nor under-investing in your platform.

This concludes my video blog series on SQL Server performance monitoring. I hope you enjoyed it and learned something new. Please feel free to contact me if you have more questions or want to discuss your thoughts on these monitoring methods.

Jani K. Savolainen
Founder & CTO
DB Pro / SQL Governor
