Indexing in SQL Server

ARSLANOV
8 min readJun 4, 2023

YouTube: Internals of indexing in SQL Server: All you need to know!!! — YouTube

Hello everyone. Welcome to my blog.
In this blog, I am going to talk about indexing in SQL Server. To be more specific, I will share what is index and why we need them with specific examples.
Afterwards, I will talk about index types in sql server including clustered and non-clustered index.
Finally, I will end this video by talking about covering and include indexes.
If you find this video useful, please do not forget to subscribe to follow my future blog series.
Let us get started.

First of all, what is index?
- An index is a “a feature that improves the performance of searches” and in this way the performance of the queries in SQL Server. That is it.
- We can say this is similar to the index of a book where page numbers are mapped to the key words. A book index helps you to quickly go to the page containing the word you are searching for.
If there is no index, you may have to rely on your memory, intuition or you may have to search the word by going through the each page in the book. This is very time consuming.
- The same applies to the tables and indexing in SQL Server. Indexes allow SQL Server to quickly search for data in a table.

Without indexes, SQL Server has to search through all the data, which can be very time-consuming process. Let us see how it happens by looking at internal behavior of SQL Server when there is no index.

Consider the following ‘Employees’ table as an example where there is no index.

Let us run SELECT query to fine an employee with First name “James”.

SELECT * FROM [Person].[Person] WHERE FirstName = ‘James’

SQL Server does not know where this data is located. Therefore, SQL Server has to search for the data by going through each row in the table.

For example, let us illustrate this process.

SQL Server searches a data with first name until row 25. Found one value matching the query. Then the search will continue until row 42. Then, again with the same process until the end of the table. This process happens with every table without an index and it is very time consuming.

Let us see what happens if we do have an index.

If we have an index for the FirstName column, SQL Server does not need to search for the whole table but directly find the records.

This is very efficient, right?

Internal architecture of the databases

Ok, we understood the importance of the indexing in SQL Server queries.
Now, let us deep dive and understand how indexes are built by looking at internal structures of the index.
But beforehand, to understand indexing, it is important to understand the internal structure of the database. Therefore, Let’s start by understanding the internal structure of the database before moving to indexes.
Each database in SQL Server contains at least one data file and transaction log file. Each datafile contains 8KB sized data pages.

SQL Server extracts data from disk at page level. Each page contains the actual data and the group of pages are located in heap. In other words, heap is a group of unclustered pages.
I want to also mention that a collection of 8 pages is called an extent. When data is added, one extent is allocated, subsequently added data is stored on eight consecutive pages.
This ensures that the data for the same object is stored on eight consecutive pages. Storing data of the same object helps to improve reading and searching performance.

Internal structure of the index: non-clustered index

Now, we understand the structure of database. Let us switch to the structure of indexes.
There are two types of indexes in SQL Server: clustered and non-clustered index. I will start with non-clustered index.
First, index structure is architecture is based on the tree structure.
First level is called root node and contains one root page.
After root node, here comes middle nodes. Middle node may consist of several levels based on the data. In our case it consists of only one level since we have a small data.
After middle node, lastly comes leaf node where index architecture ends.

As you can see, in each level data is sorted based on the first column which first name. The next column is lower page ids in root node and middle nodes.
In the leaf node however, it becomes ROW id: first number is file id, second one is heap page id and third one is row number in page.

Let us see how the flow of our query in this architecture.

We run this query.

SELECT *
FROM [Person].[Person]
WHERE FirstName = ‘James'

First, the flow starts with root page and SQL Server defines JACK and Murphy rows there. SQL Server understands that James is between these rows. Therefore, it selects page number 33 and then it goes to page number 33.

There, SQL Server finds Jack and Laren. It understands that James should be in 169 since next row is already starting with letter “L”. As a result, page number 169 is selected, and James's row is found there.

Here we find that James is located in File numbered 1, Page numbered 402 and in the first row of heap or heap table we can say. Then the result is returned.

These all the operations we just saw is called non-clustered index seek with row id lookup in SQL Server.

Cluster existence non-working cases

Sometimes, even if you create non-clustered index, you might notice that this index is not being used. Instead, Table scan is being used.

When you face this issue:

  1. When indexed column is not used in query’s WHERE clause. You should always use indexed column in WHERE clause if you want SQL Server to implement indexes in the query.
  2. When SQL server finds that Table scan or non-clustered index scan is faster than non-clustered index seek with RID lookup. We already know what table scan is. Table scan is searching for the record by going through all the records in the table while non-clustered index scan is scanning the whole leaf node.

Indexes are most effective when retrieving one to a few hundred data items from a large amount of data. If there is a large amount of relevant data, the RID Lookup might be costly, and less efficient than a table scan.

Clustered index

OK, so far, we have discussed ‘non-clustered indexes’, and now we will discuss another type of index, clustered indexes.

The difference between a non-clustered index and a clustered index is that in a non-clustered index, as you might remember, a pointer to the actual data (row identifier: RID) is stored at leaf level.

But in clustered index the actual data itself is stored. For example, we created a clustered index based on the FirstName column. As you can see, the leaf node includes the actual data pages which is sorted based on the column “FirstName”.

This shows there are some important points regarding clustered index. Firstly, we cannot create more than one clustered index for one table. Furthermore, when we create clustered index, the actual data is sorted in pages in leaf node. Therefore, creating clustered index takes more time than non-clustered index. Moreover, When sorting is happening, we should store sorting data temporarily in temp table. Therefore, we should have enough storage for creating clustered index.

OK, so far, we discussed about heap, clustered index and non-clustered index.

In the case of heap table structure, we have table scan only. In the case of indexing, we have clustered index scan and non-clustered index scan where the whole leaf node is scanned. The most efficient way we found is clustered index seek and non-clustered index seek where only required part of the table is scanned.

Automatic creation of the indexes

Until here we came by creating indexes manually. However, clustered index can also be created automatically. When?

  1. When table is created with primary key
  2. When Table is created with Unique column

As you can see, I created tables with primary and unique keys and tables have been created with clustered indexes.

Non-clustered + clustered index case structure

Ok. Let us discuss about a case where we have both clustered and non-clustered indexes created. We will end up with the following flow.

In this case, as you can see, in the leaf node of the non-clustered index, BusinessEntityId has been set instead of Row ids.

Thank you for your time!!!!!

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

ARSLANOV
ARSLANOV

Written by ARSLANOV

SQL Server DBA at Microsoft

No responses yet

Write a response