Hello guys, a good understanding of how the index works and how to use them to improve your SQL query performance is very important while working in a database and SQL and that's why you will find many questions based upon indexes on Programming Job interviews. One of such frequently asked SQL questions is the real difference between table scan, index scan, and index seek? which one is faster and why? How does the database chooses which scan or seek to use? and How you can optimize the performance of your SQL SELECT queries by using this knowledge. In general, there are only two ways in which your query engine retrieves the data, using a table scan or by using an index.
Which method is used for your particular query depends upon what indexes are available in that table, what columns you are requesting in your query, the kind of joins you are doing, and the size of your tables.
If you have a clear understanding of how the index works and how the SQL engine retrieves data from the disk then you can quickly identify performance problems and solve them. That's where most of the SQL developers, especially the Java application developer who write queries and design databases lack.
Btw, if you are not familiar with what is an index and how to create and drop an index, then I suggest you first go through these free SQL and Database courses to learn those basics.
Difference between table scan, index scan, and index seek in Database
In this article, we'll go through each three i.e. table scan, index scan, and index seek, and try to understand how databases process a particular query hence a basic understanding of database, SQL and index is required.1. What is a Table Scan in a database?
A table scan is a pretty straightforward process. When your query engine performs a table scan it starts from the physical beginning of the table and goes through every row in the table. If a row matches the criterion then it includes that into the result set.You might have heard nasty things about table scans but in truth, it's the fastest way to retrieve data especially if your table is quite small. It starts being bad when your table starts growing. You can imagine doing a full table scan in a table with 4 million rows and a full table scan in a table with just 100 rows.
In a small table, a query engine can load all data in just one shot but in a large table, it's not possible, which means more IO and more time to process those data.
Normally, a full table scan is used when your query doesn't have a WHERE clause, I mean, you want more or less every record from a table like the following query will use a full table scan:
SELECT * from Employee;
Btw, If your query is taking too long in a large table then most likely it using either table scan or index scan. You can see that by enabling an execution plan like by doing Ctrl + A in Microsoft SQL Server Management Studio. If you are new to SQL Server, I highly recommend you to join these Microsoft SQL Courses to learn T-SQL and SQL Server Management Studio better.
2. What is the Index Scan in a database?
If your table has a clustered index and you are firing a query that needs all or most of the rows i.e. query without WHERE or HAVING clause, then it uses an index scan. It works similar to the table scan, during the query optimization process, the query optimizer takes a look at the available index and chooses the best one, based on information provided in your joins and where clause, along with the statistical information database keeps.Once the right index is chosen, the SQL Query processor or engine navigates the tree structure to the point of data that matches your criteria and again extracts only the records it needs. See SQL Performance Explained by Markus Winand to learn more about how indexes work in different databases.
The main difference between a full table scan and an index scan is that because data is sorted in the index tree, the query engine knows when it has reached the end of the current it is looking for. It can then send the query, or move on to the next range of data as necessary.
For example, the following query, same as above will use Index scan if you have a clustered index in your table:
SELECT * From Employee;
This is slightly faster than the table scan but considerably slower than an index seek which we'll see in the next section.
3. What is Index Seek in SQL?
When your search criterion matches an index well enough that the index can navigate directly to a particular point in your data, that's called an index seek. It is the fastest way to retrieve data in a database. The index seeks are also a great sign that your indexes are being properly used.This happens when you specify a condition in WHERE clause like searching an employee by id or name if you have a respective index.
For example, the following query will use an index seek, you can also confirm that by checking the execution plan of this query when you run this on SQL server:
SELECT * from Employee where EmployeeId=3;
In this case, the Query Optimizer can use an index to directly go to the third employee and retrieve the data. If you look at the execution plan shown below, you can see that it uses an index seek using the index created on EmployeeId.
If you want to learn more about how the SQL engine process requests I suggest you go through the SQL Tuning course by Amarnath Reddy from Udemy/ No you will not only learn more about SQL performance tuning but also learn SQL performance tuning tools that can help you with the task. I highly recommend this course to anyone who wants to improve their SQL and Database performance skills
Difference between table scan, index scan, and index seek in SQL
Based upon our understanding of indexes, you can now deduce the following points to summarize the difference between table scan, index scan, and index seek in a database:1) A table scan and an index scan are used when you need to retrieve all data like 90% to 100% while index seek is used when you need to retrieve data based upon some conditions like 10% of data.
2) If your query doesn't have a WHERE clause and your table doesn't have a clustered index then a full table scan is used, if it does have a clustered index then an index scan is used.
3) index scan is faster than a table scan because they look at sorted data and query optimizers know when to stop and look for another range.
4) index seek is the fastest way to retrieve data and it comes into the picture when your search criterion is very specific. Normally, when you have a WHERE clause in your query and you are using a column that also has an index, then index seek is used to retrieve data as shown in the following query:
select * from Employee where Id= 3;
You can confirm that by actually looking at the execution plan for your query.
In MSSQL management studio, you can see the execution plan by clicking Ctrl + A and then running your query. If you want to learn more SQL Server Performance Tuning Part 1 by Raphel Ashgar is a good starting point.
That's all about the difference between table scan, index scan and index seek in a database. As I told you, there are only two ways to retrieve data in a database either by using a table scan or by using an index. The latter is faster in the case of large tables. The choice of the index depends upon multiple things like the WHERE clause and joins in your table, the columns you are requesting, the size of tables, etc.
If you feel that your query is slow, you must check the execution plan to confirm whether it's using index seeks or index scan, or table scan. Then you can optimize your query by introducing the right index or tuning your query.
Other Useful Programming Resources You May Like:
- Top 5 Courses to learn SQL and Database Fundamentals
- The Java Developer RoadMap
- Top 5 Courses to learn MySQL Database
- Top 5 Books for Database Design and Modeling
- A Better way to write SQL queries
- 10 Things to Remember While running SQL queries on Production
- A Developer's list of must-read SQL books
- Top 4 Books to learn Oracle and PL/SQL
- 10 Things Every Programmer Should Learn
- Top 5 Advanced SQL Books for Programmers
- 5 Free T-SQL and SQL Server Courses for Programmers and DBAs
- 11 Things Java Developer Should learn
- Top 5 Courses to learn PostgreSQL
Thanks for reading this article so far. If you like this article then please share it with your friends and colleagues. If you have any questions or feedback then please drop a comment.
P. S. - If you are new to SQL and Database in general and want to learn DB fundamentals and SQL basics before deep dive on SQL Performance Tuning then you can also check out this list of free Websites to learn SQL and Database for beginners.
No comments:
Post a Comment