Tuesday, September 3, 2024

Difference between UNION and UNION ALL in SQL Server?

Hello guys, "What is the difference between UNION and UNION ALL?" is one of the most popular SQL interview questions and is often asked of programmers during the telephonic round of interviews. Though both UNION and UNION ALL are used to combine the results of two SELECT queries, the main difference between them is that UNION removes duplicate rows but UNION ALL keeps them. Another difference is that UNION ALL is usually faster than UNION, but it may appear slower because it can return more data, which takes more time to travel over the network. The difference between UNION and UNION ALL can be a tricky SQL question, especially for developers who have never used these keywords.


Since the UNION clause is not as common as the SELECT clause in SQL, this question is usually asked in the telephonic round of programming interviews to check whether the candidate is comfortable with SQL. It's in the same league as questions like clustered vs non-clustered index or primary vs unique key. UNION is different from most other SQL commands because it operates on whole rows of data rather than on columns. It stacks the rows of two result sets and matches their columns by position, not by name.

Anyway, the answer to this question is simple. Both UNION and UNION ALL are used to combine the results of two separate SQL queries on the same or different tables. However, UNION does not keep duplicate records, while UNION ALL does. A row is considered a duplicate only if the values of all its columns are the same.
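The sketch below shows the "all columns must match" rule in action. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server. The tables a and b and their rows are hypothetical. The row (1, 'Ken') appears in both inputs on every column. The row (1, 'Kenneth') shares only the id, so UNION treats it as a different row and keeps it.

```python
import sqlite3

def combine(op: str) -> list:
    """Combine two small hypothetical tables with UNION or UNION ALL in SQLite."""
    if op not in ("UNION", "UNION ALL"):
        raise ValueError("op must be 'UNION' or 'UNION ALL'")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE b (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "Ken"), (2, "Bob")])
    # (1, 'Ken') duplicates a row of table a on every column;
    # (1, 'Kenneth') matches on id only, so it is not a duplicate.
    conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "Ken"), (1, "Kenneth")])
    # ORDER BY applies to the whole combined result.
    sql = f"SELECT id, name FROM a {op} SELECT id, name FROM b ORDER BY id, name"
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(combine("UNION"))      # [(1, 'Ken'), (1, 'Kenneth'), (2, 'Bob')]
    print(combine("UNION ALL"))  # [(1, 'Ken'), (1, 'Ken'), (1, 'Kenneth'), (2, 'Bob')]
```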



Since you mostly don't want duplicate rows, UNION is preferred over UNION ALL in reporting and application development. By the way, keep in mind that UNION ALL performs better than UNION because it doesn't have to remove duplicates, so there is no extra work.

These keywords are well supported by all major databases like Oracle, Microsoft SQL Server, MySQL, and PostgreSQL. Another thing to keep in mind is the amount of data returned by UNION ALL. If your database server is far away and you have limited bandwidth, UNION ALL may appear slower than UNION because of the duplicate rows it returns.

The cost of transferring duplicate rows can sometimes exceed the savings in query execution. We will see a couple of examples of UNION and UNION ALL in SQL, but before that, here are a few things to keep in mind. In order to combine the results of two queries, they must contain the same number of columns.

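For example, if one query returns 3 columns and the other returns 4 columns, then you cannot use UNION or UNION ALL. This is because the rows of both queries are stacked into a single result set, with columns matched by position rather than by name. A row is considered a duplicate only when all of its column values are the same, irrespective of the column names. The corresponding columns must also have compatible data types.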





UNION and UNION ALL Example in Microsoft SQL Server

Let's see a simple example of UNION and UNION ALL. It will show you not only how they work but also where you can use them. This example is from my sample database, and the following screenshot is from SQL Server Management Studio.

We have two tables, Employee and Customer. In order to demonstrate UNION and UNION ALL, I have added the same people as both employees and customers. You will therefore see the same ids in emp_id and customer_id, and the same names as well.

If you look at the results of the first two SELECT queries, you will see that the first query returns two rows and the second returns three rows. Two of those three rows have exactly the same data as the rows of the first query.

A key thing to note is that the column names differ between the two result sets. The first has emp_id and emp_name, while the second has customer_id and customer_name. Most importantly, both result sets have exactly two columns, which is a must in order to combine them using the UNION and UNION ALL keywords.

The third query is an example of how to use the UNION clause in SQL. The combined result has just three rows, all unique, because the duplicate rows from the second result set were not included. This is similar to a union in set theory, where the final result contains the elements of both sets, each appearing only once.

The fourth query shows how to use UNION ALL. It returns five rows, two from the first query and three from the second. It has not removed the duplicate rows from the second query, which is why you see Ken and Bob appear twice.

This example teaches us the core concept that UNION depends on the data, not on the column names. You can combine the results of as many queries as you like, as long as all of them have the same number of columns and the data comes from the same set.
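The screenshot's data is not reproduced here, so the sketch below recreates the example under assumptions. It uses SQLite through Python's sqlite3 module rather than SQL Server. The ids 1 and 2 for Ken and Bob are hypothetical, and "Alice" is a made-up third customer. The sketch also shows that the combined result takes its column names from the first query, which is the behavior in both SQLite and SQL Server.

```python
import sqlite3

def run_example() -> dict:
    """Recreate the Employee/Customer example; return (column_names, rows) per query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: Ken and Bob are both employees and customers,
    # Alice is an invented third customer.
    conn.executescript("""
        CREATE TABLE Employee (emp_id INTEGER, emp_name TEXT);
        CREATE TABLE Customer (customer_id INTEGER, customer_name TEXT);
        INSERT INTO Employee VALUES (1, 'Ken'), (2, 'Bob');
        INSERT INTO Customer VALUES (1, 'Ken'), (2, 'Bob'), (3, 'Alice');
    """)
    queries = {
        "employees": "SELECT emp_id, emp_name FROM Employee",
        "customers": "SELECT customer_id, customer_name FROM Customer",
        # Column names differ, but both sides return two columns,
        # so they can be combined; columns are matched by position.
        "union": "SELECT emp_id, emp_name FROM Employee "
                 "UNION SELECT customer_id, customer_name FROM Customer",
        "union_all": "SELECT emp_id, emp_name FROM Employee "
                     "UNION ALL SELECT customer_id, customer_name FROM Customer",
    }
    results = {}
    for label, sql in queries.items():
        cursor = conn.execute(sql)
        names = [col[0] for col in cursor.description]
        # Sort in Python because row order is not guaranteed without ORDER BY.
        results[label] = (names, sorted(cursor.fetchall()))
    conn.close()
    return results

if __name__ == "__main__":
    for label, (names, rows) in run_example().items():
        print(label, names, rows)
```

With this data, the UNION query returns three rows and the UNION ALL query returns five, matching the counts described above.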

To compare performance, run UNION and UNION ALL against a large database containing millions of rows. There you can measure how much time each one takes and compare them.

Theoretically, UNION ALL should take less time to execute but more time to transfer data to the client. 
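Below is a rough sketch of such a measurement. It uses an in-memory SQLite database, so it says nothing about SQL Server timings and involves no network at all. The table t and its generated data are hypothetical, built with many repeated values so that UNION has duplicates to remove. Because fetchall() is inside the timed region, the cost of handing every returned row to the client is part of each measurement. That is a local stand-in for the transfer cost discussed above.

```python
import sqlite3
import time

def compare(n_rows: int = 200_000, distinct_values: int = 1_000) -> dict:
    """Time UNION vs UNION ALL of a table with itself; returns {op: (row_count, seconds)}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    # Hypothetical data: n_rows values cycling through distinct_values numbers.
    conn.executemany(
        "INSERT INTO t VALUES (?)",
        ((i % distinct_values,) for i in range(n_rows)),
    )
    results = {}
    for op in ("UNION", "UNION ALL"):
        start = time.perf_counter()
        # Execution plus fetching all rows into Python is timed together.
        rows = conn.execute(f"SELECT x FROM t {op} SELECT x FROM t").fetchall()
        results[op] = (len(rows), time.perf_counter() - start)
    conn.close()
    return results

if __name__ == "__main__":
    for op, (count, seconds) in compare().items():
        print(f"{op:9} rows={count:>7} time={seconds:.3f}s")
```

The row counts are deterministic: distinct_values for UNION and 2 × n_rows for UNION ALL. The timings will vary by machine, so treat them only as a starting point for your own measurements.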


Difference between UNION ALL and UNION in SQL SERVER



Difference between UNION and UNION ALL command in SQL

Now that you know how UNION and UNION ALL work from the examples above, let's summarize the similarities and differences between them for quick revision:

1. Combining Results

Both UNION and UNION ALL are used to combine the results of two separate SQL queries. The queries can be on the same table or on different tables, but the data should be of the same kind. For example, if product_id is used in two tables like Product and Order, then two SQL queries that pull product_id from these two tables can be combined using UNION or UNION ALL.

2. Duplicates

The key difference between UNION and UNION ALL is that the former removes duplicates while the latter keeps them. In other words, UNION is equivalent to running DISTINCT on the output of UNION ALL. For example, if product_id 10 is returned by both SQL queries, it will appear only once if you use UNION and twice if you use UNION ALL.
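The sketch below checks the DISTINCT equivalence in SQLite via Python's sqlite3 module. The Product and Order tables follow the article's example, but their columns and rows are hypothetical. Here product 10 appears once in Product and twice in Order. UNION ALL therefore returns it three times, while UNION returns it once. This shows that UNION removes duplicates across the whole combined result, including duplicates within a single query's output.

```python
import sqlite3

def product_ids() -> dict:
    """Compare UNION, UNION ALL, and DISTINCT over UNION ALL on hypothetical data."""
    conn = sqlite3.connect(":memory:")
    # "Order" is a reserved word in SQL, so the table name is quoted.
    conn.executescript("""
        CREATE TABLE Product (product_id INTEGER);
        CREATE TABLE "Order" (order_id INTEGER, product_id INTEGER);
        INSERT INTO Product VALUES (10), (20);
        INSERT INTO "Order" VALUES (1, 10), (2, 10), (3, 30);
    """)
    union_all = 'SELECT product_id FROM Product UNION ALL SELECT product_id FROM "Order"'
    queries = {
        "union": 'SELECT product_id FROM Product UNION SELECT product_id FROM "Order"',
        "union_all": union_all,
        # DISTINCT applied to the UNION ALL output through a derived table.
        "distinct_union_all": f"SELECT DISTINCT product_id FROM ({union_all}) AS combined",
    }
    # Sort in Python because row order is not guaranteed without ORDER BY.
    results = {label: sorted(row[0] for row in conn.execute(sql))
               for label, sql in queries.items()}
    conn.close()
    return results

if __name__ == "__main__":
    for label, ids in product_ids().items():
        print(label, ids)
```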

3. Execution time

Due to the above difference, the query execution time of UNION ALL is usually smaller than that of UNION, which means the former runs faster than the latter. So if you want faster output and don't care about duplicates, use UNION ALL.

This is something you can deduce from your existing SQL knowledge and that's where working on fundamentals pays off. 

4. Speed and Bandwidth Usage

You should keep in mind that the benefits gained by not removing duplicates can easily be wiped out by transferring more data over a low-bandwidth network connection.

That's why, in practice, UNION ALL sometimes appears slower than UNION. It returns a lot of data with duplicates, which takes more time to travel from the database server to the client machine.

So evaluate the performance of UNION and UNION ALL case by case.




5. Number of Columns on ResultSet

Another thing worth noting while using UNION and UNION ALL is that all queries combined using a UNION, INTERSECT, or EXCEPT operator must have an equal number of expressions in their target lists.

For example, if the result of query 1 has three columns and the result of query 2 has two columns then you cannot combine them using the UNION command.
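As a quick check, the sketch below tries to combine a three-column query with a two-column query under each set operator. It uses SQLite through Python's sqlite3 module, where the mismatch is rejected with an OperationalError before any rows are produced. SQL Server rejects it too, but with its own error text, so the messages printed here are SQLite's, not SQL Server's.

```python
import sqlite3

def set_operation_error(op: str) -> str:
    """Combine a 3-column query with a 2-column query; return SQLite's error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"SELECT 1, 2, 3 {op} SELECT 1, 2")
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return ""  # empty string means the statement was unexpectedly accepted

if __name__ == "__main__":
    for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT"):
        print(op, "->", set_operation_error(op))
```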


That's all on the difference between the UNION and UNION ALL commands in SQL. They are useful for combining the results of two SELECT queries when those queries return the same kind of data.

There are many practical scenarios where UNION is very useful, for example when you need to create a list out of different tables containing data from the same set. 

The main difference between UNION and UNION ALL is about duplicates: the former removes them while the latter keeps them. The other differences, in performance and network bandwidth usage, can easily be derived from this one.

Also keep in mind that these commands are well supported by the big three databases: MySQL, Oracle, and SQL Server. Let us know if you have been asked this question in your SQL interview.
