This is also a follow-up SQL interview question from, how to find duplicates in a given table in SQL which we have seen earlier.
1) copy unique rows into temp table
2) delete or truncate the table with duplicate rows
3) copy unique rows from temp to original table
Here is the SQL query to remove duplicates from a table in Microsoft SQL SERVER :
You can also use truncate in place of delete, because we are actually clearing the table.
Here we are using window function ROW_NUMBER(), to assign row number to each record. Since we are using PARTITION BY letter, position the row number will reset as soon as any of them will change.
Even though, we will use Microsoft SQL Server database in our example, most of the solution will run just fine on other database like Oracle and MySQL. Some might require little bit of tweaking but until you know the steps, you can rewrite the SQL query for your database.
In this article, we will see 2 ways to remove duplicate rows from a given table in SQL
In this article, we will see 2 ways to remove duplicate rows from a given table in SQL
1. By using temp table and SELECT INTO command
2. By using ROW_NUMBER clause
Removing duplicate rows in SQL Server using temp table
This is the easiest solution for small tables but also works fine for large tables. This solution has three steps :1) copy unique rows into temp table
2) delete or truncate the table with duplicate rows
3) copy unique rows from temp to original table
Here is the SQL query to remove duplicates from a table in Microsoft SQL SERVER :
select distinct * into #Uniques from #Letters
delete #Letters
insert into #Letters select * from #Uniques
You can also use truncate in place of delete, because we are actually clearing the table.
Remember, we are not changing the structure of data it just data, which we are copying back and forth. I have also tested this code in Microsoft SQL Server using SQL Server Management Studio, my favorite SQL IDE, and it works fine, but given its not using any special SQL Sever commands or features, I think it will also work in Oracle, MySQL and PostgreSQL.
If you want, you can also test this in online SQL practice sites like sqlfiddle.com, in fact I will show you how to test that soon as we have complete SQL script with us which you can copy and paste in your database to setup the table with data and start testing yourself.
Here is the complete example to demonstrate this problem and solution :
Now, let's just copy and paste this code in Sqlfiddle.com and find out whether it work or not in both SQL Server and other databases. If you look closely the #Letters is a temp table I have created for testing and it contains three duplicates, three rows with letter A and two rows with letter B.
If you want, you can also test this in online SQL practice sites like sqlfiddle.com, in fact I will show you how to test that soon as we have complete SQL script with us which you can copy and paste in your database to setup the table with data and start testing yourself.
Here is the complete example to demonstrate this problem and solution :
IF OBJECT_ID( 'tempdb..#Letters' ) IS NOT NULL DROP TABLE #Letters; CREATE TABLE #Letters (letter varchar(1), position int); INSERT INTO #Letters VALUES ('A', 1); INSERT INTO #Letters VALUES ('A', 1); -- duplicate INSERT INTO #Letters VALUES ('A', 1); -- duplicate INSERT INTO #Letters VALUES ('A', 2); INSERT INTO #Letters VALUES ('B', 1); INSERT INTO #Letters VALUES ('B', 2); INSERT INTO #Letters VALUES ('B', 2); -- duplicate INSERT INTO #Letters VALUES ('C', 1); INSERT INTO #Letters VALUES ('C', 2); SELECT * FROM #Letters ORDER BY letter, position; -- remove duplicate rows if all fields are identical select distinct * into #Uniques from #Letters delete #Letters insert into #Letters select * from #Uniques SELECT * FROM #Letters ORDER BY letter, position;
Now, let's just copy and paste this code in Sqlfiddle.com and find out whether it work or not in both SQL Server and other databases. If you look closely the #Letters is a temp table I have created for testing and it contains three duplicates, three rows with letter A and two rows with letter B.
We are copy and pasting unique rows into another temp table #Uniques and then deleting all records from #Letters and then copying the unique row back to #Letters.
Which means if we run the SELECT command again on #Letters, we should not see any duplicates:
Here is the output of above script when I ran it on SQL server database on SQLFiddle.com
letter position
------ -----------
A 1
A 1
A 1
A 2
B 1
B 2
B 2
C 1
C 2
letter position
------ -----------
A 1
A 2
B 1
B 2
C 1
C 2
You can clearly see that first output has duplicate rows but second output doesn't have any duplicate rows which means we have cleaned them up.
Here is the screenshot which shows that we are using SQL Server database:
Now, let's run the same SQL command into MySQL database to see if it can remove duplicates on a MySQL table or not:
It seems the code doesn't work in MySQL as we are getting below error:
ERROR 1064 (42000) at
line 1: You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near 'IF OBJECT_ID( 'tempdb..#Letters' ) IS NOT NULL
DROP TABLE
CREATE TABLE
INS' at line 1
It's because we have used T-SQL code which is not supported in MySQL, let's try without that as that code is only checking for duplicate table.
Now we are getting another error:
ERROR 1064 (42000) at line 1: You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version for the
right syntax to use near 'INSERT INTO
INSERT INTO
INSERT INTO
INSERT INTO
INSERT INTO
INSERT INT' at line 3
It seems MySQL doesn't support SELECT INTO command and its giving error on following code
select distinct * into #Uniques from #Letter
So, its not working, no problem, we will solve that into another post, let's see another way to remove duplicate rows into SQL Server database
Solution 2 - removing duplicates using ROW_NUMBER() and common table expression
You can also remove duplicates from a table by using following SQL query which uses common table expression :;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY letter, position
ORDER BY ( SELECT 0)) RN
FROM #Letters)
DELETE FROM cte
WHERE RN > 1
Here we are using window function ROW_NUMBER(), to assign row number to each record. Since we are using PARTITION BY letter, position the row number will reset as soon as any of them will change.
For example, in our table we have these three records which are exactly same, only one of them will be chosen and will be assigned RN = 1 , as shown below :
It's mandatory to use ORDER BY clause with a ranking function like ROW_NUMBER() in SQL Server .
SELECT *, ROW_NUMBER() OVER (PARTITION BY letter, position
ORDER BY ( SELECT 0)) RN
FROM #Letters
letter position RN
A 1 1
A 2 1
B 1 1
B 2 1
C 1 1
C 2 1
It's mandatory to use ORDER BY clause with a ranking function like ROW_NUMBER() in SQL Server .
Here I am using ORDER BY (SELECT 0) as you can preserver any row in the event of a tie. If you want to preserve latest one then you can also do something like ORDER BY position desc.
The beauty of this solution is that it will work just fine for large tables as well, but this will not work in MySQL, Oracle or PostgreSQL database. Sybase may be but I haven't tested on it.
That's all about how to remove duplicate rows from a table in SQL SERVER. As I told you, the SQL query is tested on SQL Server but most likely work in Oracle and MySQL as well. If it doesn't just let me know and I'll work with you to fix that.
The beauty of this solution is that it will work just fine for large tables as well, but this will not work in MySQL, Oracle or PostgreSQL database. Sybase may be but I haven't tested on it.
That's all about how to remove duplicate rows from a table in SQL SERVER. As I told you, the SQL query is tested on SQL Server but most likely work in Oracle and MySQL as well. If it doesn't just let me know and I'll work with you to fix that.
Just remember the steps, copy the unique data to temp table, truncate original table and then copy the unique data back to the original table.
Don't forget to drop the temp table, or if you want to keep until next run, just leave it.
Other SQL Tutorials and Articles you may like
- How to find top 10 Records in SQL
- 6 Examples of Correlated subqueries for beginners
- How to use WHERE and HAVING clause in SQL
- How to convert String to Date in SQL?
- How to use Stored Procedure in SQL?
- Does order of columns matter in Composite index?
- Difference between Self and Equi join in SQL
- 10 Example of SELECT queries in SQL
- How to create and rollback transaction in database?
- 4 Examples of CASE expression in SQL Server
- MySQL vs NoSQL comparison
- Difference between UNION and UNION ALL in SQL
- 10 Examples of ALTER command in SQL
If you want to practice the SQL, you can also use this link on SQLFiddle where my queries are saved https://sqlfiddle.com/mysql/online-compiler?id=d33af95f-4ae9-423d-8164-c8a8ceaad045
ReplyDelete