SQL Server performance insights – Logical hierarchies in composite indexes

This blog post starts my series on Microsoft SQL Server performance optimization. The goal of the series is to be practical but in-depth, so I hope you find the posts interesting even if you have already worked with databases and T-SQL for many years. The impulse for the series came from my desire to share some of the thoughts, ideas, tips, and tricks I have learned during my last 25 years in programming and, in this context, in SQL Server databases. So, what is this first post in the series about? It is about an often-misunderstood concept: composite indexes, also known as thick indexes.


What is a composite index?

In Microsoft SQL Server, a composite index is a B-tree index whose key consists of several concatenated indexed columns. By organizing these columns into a single composite index, we can get better performance for certain T-SQL queries. But the devil is in the details: if you don’t understand how SQL Server builds and uses B-tree-structured composite indexes, you will not always be able to create efficient ones.


What is a B-tree?

Let’s first look at B-trees more closely. A B-tree is a balanced, sorted data structure that allows efficient seek, sequential scan, insertion, deletion, and update operations on data. In a B-tree, each parent node has a certain number of child nodes; this number is called the branching factor, and it affects the depth and efficiency of the tree.

Any B-tree node can contain multiple keys and holds pointers to its child nodes. The keys are sorted within each node, and the structure ensures that all leaf nodes are at the same depth.


How is a composite index built?

In a composite index, each key in the B-tree nodes is composed of two or more values from different table columns, depending on how many columns your composite index contains. For example, if you had columns X, Y, and Z in your composite index, each key would be the combination “(X, Y, Z)”. The keys are sorted in lexicographical order: the first column in the composite index is the primary sorting element, the second one is secondary, and so on. This determines how efficiently queries can utilize the index, particularly when the query conditions involve only a subset of the index columns.

To keep it simple, you can imagine the logical index structuring process as follows:
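
For instance, a composite index on columns (X, Y, Z), filled here with invented example values, would keep its keys sorted like this:

  (1, 'A', 10)
  (1, 'A', 20)
  (1, 'B',  5)
  (2, 'A',  1)
  (2, 'C', 30)

All keys sharing the same X value sit next to each other, all keys sharing the same (X, Y) prefix sit next to each other within that group, and so on. This is exactly why the leading columns determine what the index can seek on.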


Composite index constraints in SQL Server

In Microsoft SQL Server, you can have up to 32 columns in a single composite index key, and all the columns of a composite index key must be from the same table or view. The maximum allowable size of the combined index key values is 900 bytes for a clustered index and 1,700 bytes for a nonclustered index. The older limits, 16 columns and 900 bytes, applied to versions before SQL Server 2016 and Azure SQL Database.


What affects composite index performance in SQL Server?

A composite index is queried in the order in which its columns are organized. So, if the column ordering is wrong, a query can utilize only the first column, or only some of the leading columns, of the whole index.
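
To illustrate this with a hypothetical table and index (the names here are invented for the example):

  CREATE TABLE dbo.PERSON (
      PERSON_ID  INT           NOT NULL PRIMARY KEY,
      LAST_NAME  NVARCHAR(100) NOT NULL,
      FIRST_NAME NVARCHAR(100) NOT NULL
  );

  CREATE NONCLUSTERED INDEX IX_PERSON_NAME
      ON dbo.PERSON (LAST_NAME, FIRST_NAME);

  -- Can seek: the predicate constrains the leading index column.
  SELECT PERSON_ID FROM dbo.PERSON WHERE LAST_NAME = 'Smith';

  -- Cannot seek on the key: the leading column is unconstrained,
  -- so the engine has to scan the index instead.
  SELECT PERSON_ID FROM dbo.PERSON WHERE FIRST_NAME = 'Anna';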

In addition to covering as many logical column combinations as possible, a crucial part of using composite indexes efficiently is column selectivity: columns with higher selectivity (i.e., more unique values) should generally be placed earlier in the index, because high selectivity improves the efficiency of the index. But there is an exception, and it is called a logical hierarchy.


How to build an efficient composite index from a logical hierarchy?

Let’s think about a real-life situation: we have created a fact table (F_DATABASE_STATISTICS) and a referenced dimension table (D_DATABASE) in a star schema to gather telemetry data, at a given date and hour, from all the databases in all the instances in all the servers in all the domains of our hypothetical environment. So, the fact table holds the performance data for each database, such as the CPU usage percentage on each SQL Server instance:
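
A minimal sketch of the two tables follows; only the table names, PK_DATABASE, and the four hierarchy columns come from the scenario itself, while the remaining column names and all data types are assumptions made for illustration:

  CREATE TABLE dbo.D_DATABASE (
      PK_DATABASE   INT IDENTITY(1,1) NOT NULL,
      DOMAIN_NAME   NVARCHAR(128)     NOT NULL,
      SERVER_NAME   NVARCHAR(128)     NOT NULL,
      INSTANCE_NAME NVARCHAR(128)     NOT NULL,
      DATABASE_NAME NVARCHAR(128)     NOT NULL,
      CONSTRAINT PK_D_DATABASE PRIMARY KEY CLUSTERED (PK_DATABASE)
  );

  CREATE TABLE dbo.F_DATABASE_STATISTICS (
      PK_DATABASE       INT           NOT NULL
          REFERENCES dbo.D_DATABASE (PK_DATABASE),
      STATISTICS_DATE   DATE          NOT NULL,
      STATISTICS_HOUR   TINYINT       NOT NULL,
      CPU_USAGE_PERCENT DECIMAL(5, 2) NOT NULL
  );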


Now, for the sake of simplicity, let’s focus on the D_DATABASE table, where we have a clustered primary key column (PK_DATABASE) and the following basic indexes:
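
As an assumption for this walkthrough, think of these basic indexes as plain single-column nonclustered indexes on the hierarchy columns; the index names are hypothetical:

  CREATE NONCLUSTERED INDEX IX_D_DATABASE_DOMAIN_NAME   ON dbo.D_DATABASE (DOMAIN_NAME);
  CREATE NONCLUSTERED INDEX IX_D_DATABASE_SERVER_NAME   ON dbo.D_DATABASE (SERVER_NAME);
  CREATE NONCLUSTERED INDEX IX_D_DATABASE_INSTANCE_NAME ON dbo.D_DATABASE (INSTANCE_NAME);
  CREATE NONCLUSTERED INDEX IX_D_DATABASE_DATABASE_NAME ON dbo.D_DATABASE (DATABASE_NAME);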

First, let’s review the column count and cardinality of this logical hierarchy:
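
A simple way to do this is to count the distinct values on each level of the hierarchy; a minimal sketch, assuming the table layout above:

  SELECT
      COUNT(DISTINCT DOMAIN_NAME)   AS DOMAIN_NAME_COUNT,
      COUNT(DISTINCT SERVER_NAME)   AS SERVER_NAME_COUNT,
      COUNT(DISTINCT INSTANCE_NAME) AS INSTANCE_NAME_COUNT,
      COUNT(DISTINCT DATABASE_NAME) AS DATABASE_NAME_COUNT,
      COUNT(*)                      AS TOTAL_ROWS
  FROM dbo.D_DATABASE;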

We’ll get the following result:

As can be seen, there is some redundancy in the data members of these dimension rows. Normally, if we created a composite index, we would follow the rule of “starting from the most selective column” and would pick DATABASE_NAME as the first column in our thick index key, then INSTANCE_NAME, then SERVER_NAME, and then DOMAIN_NAME. Is this a good idea? Well, it isn’t, for several reasons, and here’s why:

We would need the following query to get the average CPU usage percent for each database in the hierarchy:
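
Here is a sketch of such a query, assuming the table layout above; the predicate values and the use of SET STATISTICS TIME are assumptions for this example:

  -- Purge the buffer pool first to measure cold-cache performance (see the note below).
  DBCC DROPCLEANBUFFERS;

  SET STATISTICS TIME ON;

  SELECT
      d.DOMAIN_NAME,
      d.SERVER_NAME,
      d.INSTANCE_NAME,
      d.DATABASE_NAME,
      AVG(f.CPU_USAGE_PERCENT) AS AVG_CPU_USAGE_PERCENT
  FROM dbo.F_DATABASE_STATISTICS AS f
      INNER JOIN dbo.D_DATABASE AS d
          ON d.PK_DATABASE = f.PK_DATABASE
  WHERE d.DOMAIN_NAME = 'domain01'
      AND d.SERVER_NAME = 'server042'
      AND d.INSTANCE_NAME = 'instance01'
  GROUP BY
      d.DOMAIN_NAME,
      d.SERVER_NAME,
      d.INSTANCE_NAME,
      d.DATABASE_NAME;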

Note! As you can see, in this example I drop the memory buffers on the first line of the code, which purges the data cache in memory of the given SQL Server instance so that we get a realistic picture of the query performance. You should not run this command in a production environment, because it hurts query performance badly until the buffers are repopulated. You can find more information on the command here:

DBCC DROPCLEANBUFFERS (Transact-SQL) - SQL Server | Microsoft Learn

When looking at the search conditions of the query, we can see a perfect spot to implement a composite index.

Now, let’s look at the query performance (best out of 5 attempts) without any composite index:

How about the query plan? It looks like we end up with a clustered index scan on the D_DATABASE table; the details show that this clustered index scan consumes a disastrous 95% of the query resources:

Actual execution plan for the clustered index scan operator:

As we can see from the statistics, because we have 5,030,000 rows in our D_DATABASE table, we need to scan the whole clustered index:

  • Actual number of rows to read is 5,030,000.
  • Estimated I/O cost is 21.1513.
  • Estimated operator cost is 22.0735.
  • Estimated CPU cost is 0.922193.
  • Estimated Number of Rows Per Execution is 743.221.

Now let’s create a thick index with a column ordering such that the most selective column (DATABASE_NAME) comes first in the index key, and so on, as follows:
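
A minimal sketch of such an index; the index name is hypothetical:

  -- Most selective column first, per the common selectivity rule.
  CREATE NONCLUSTERED INDEX IX_D_DATABASE_HIERARCHY
      ON dbo.D_DATABASE (DATABASE_NAME, INSTANCE_NAME, SERVER_NAME, DOMAIN_NAME);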

Let’s run the same query again:

Now the query is much faster and consumes significantly less CPU. As you can see, it now uses a nonclustered index scan on this composite index: the details show that this scan replaces the former clustered index scan and consumes 89% of the query resources. This is better, but not yet optimal, given how a composite index works.

Actual execution plan for the nonclustered index scan operator:

  • Actual number of rows to read is 5,030,000.
  • Estimated I/O cost is 7.30387.
  • Estimated operator cost is 8.22606.
  • Estimated CPU cost is 0.922193.
  • Estimated Number of Rows Per Execution is 743.221.

As can be seen, instead of the nonclustered index seek we would hope for, we still need to scan the whole composite index, even though the I/O cost is now much lower than before. This is because our index key lists the columns in the inverted order relative to the logical hierarchy that our search conditions follow, which prevents index seek operations. That’s why we need to scan the whole index (Database – Instance – Server – Domain): the database is the most selective part of the B-tree, but it still does not efficiently constrain the remaining members searched in the logical hierarchy.

So, let’s try to invert the logical index structure into the opposite order: Domain – Server – Instance – Database.

Now let’s fix the thick index with an inverted column ordering:
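
Again as a sketch, reusing the hypothetical index name from above:

  -- Replace the selectivity-ordered index with one that follows the logical hierarchy.
  DROP INDEX IX_D_DATABASE_HIERARCHY ON dbo.D_DATABASE;

  CREATE NONCLUSTERED INDEX IX_D_DATABASE_HIERARCHY
      ON dbo.D_DATABASE (DOMAIN_NAME, SERVER_NAME, INSTANCE_NAME, DATABASE_NAME);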

Then, let’s run the query once again and look at the improved query performance with an optimal composite index. We can see that both CPU time and elapsed time have improved dramatically compared to the earlier (suboptimal) composite index:

Let’s look at the actual query plan to understand why this happened: now we have an optimal nonclustered index seek into the D_DATABASE table on this new composite index, and yes, that is fast! This explains the blazingly fast query performance.

Actual query plan for the nonclustered index seek:

As we can see from the statistics, we now read only 39 rows instead of 5,030,000, and the estimated CPU cost has also decreased significantly as the index scan turned into an index seek. The Estimated Number of Rows Per Execution has now dropped to only 13.5936. This is due to the optimized B-tree hierarchy, which gives us two benefits:

  • We always have a covering index whenever we query the index hierarchy.
  • The SQL Server engine can seek through the index instead of scanning it.


Lessons learned

Here are the SQL Server performance results for each above-mentioned use case:

With logical hierarchies, deviating from the common principles of building composite indexes in SQL Server pays off: a nonclustered composite index performs significantly faster when it is organized by the rigid, logical hierarchy rather than by the selectivity of the individual columns. So, if you had put the database as the first column in the index key, then the instance, then the server, and finally the domain as the last column, you would not only get a low-performing composite index, but also an index that covers only a handful of efficient use cases: most queries would not hit the index key at all.

A composite index is an excellent vehicle for speeding up multi-column predicate queries against SQL Server database tables. If you want to keep your CPU usage and storage I/O at a low level, I strongly recommend including composite indexes in your toolbox. But remember: logical hierarchies give you an opportunity to optimize composite index performance even further, against the commonly known principles. Don’t put your CPU under pressure with suboptimal indexing!

Interested in SQL Server performance optimization? Read more about SQL Governor software and contact us!

Jani K. Savolainen
CEO & Chairman,
SQL Governor