Removing duplicate records from a table where the rows are similar but still distinct.

We had an issue with a supplier table which contained records that were quite similar but still distinct.

For example this:

Supplier ID                 Supplier Number         Company ID      Name                                        Address

5                                          3                                      COMP1                BLACK AND DECKER          UK, London

6                                          3                                      COMP1                BLACK AND DECKER          United Kingdom, London

7                                          3                                      COMP2               BLACK AND DECKER          UK, London

8                                          3                                      COMP2               BLACK AND DECKER           United Kingdom, London

 

We want to remove records 6 and 8 but keep 5 and 7.

To do this, we wrote the following query:

DELETE from tblSuppliers
where exists (select ‘x’ from tblSuppliers X
where tblSuppliers.Number = X.Number and tblSuppliers.CompanyID = x.CompanyID
and tblSuppliers.SupplierID > X.SupplierID);

Done!

 

Other Methods

Choose one of the following queries to identify or remove duplicate rows from a table leaving only unique records in the table:

Method 1:

SQL> DELETE FROM table_name A WHERE ROWID > (
  2    SELECT min(rowid) FROM table_name B
  3    WHERE A.key_values = B.key_values);

Delete all rowids that is BIGGER than the SMALLEST rowid value (for a given key).

Method 2:

SQL> create table table_name2 as select distinct * from table_name1;
SQL> drop table table_name1;
SQL> rename table_name2 to table_name1;

This method is usually faster. However, remember to recreate all indexes, constraints, triggers, etc. on the table when done.

Method 3 (similar to the one we used):

SQL> delete from my_table t1
SQL> where  exists (select 'x' from my_table t2
SQL>                 where t2.key_value1 = t1.key_value1
SQL>                   and t2.key_value2 = t1.key_value2
SQL>                   and t2.rowid      > t1.rowid);

Note: One can eliminate N^2 unnecessary operations by creating an index on the joined fields in the inner loop (no need to loop through the entire table on each pass by a record). This will speed-up the deletion process.

Note 2: If you are comparing NOT-NULL columns, use the ISNULL function. Remember that NULL is not equal to NULL. This should not be a problem as all key columns should be NOT NULL by definition.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s