We had an issue with a supplier table which contained records that were quite similar but still distinct.
For example this:
Supplier ID Supplier Number Company ID Name Address
5 3 COMP1 BLACK AND DECKER UK, London
6 3 COMP1 BLACK AND DECKER United Kingdom, London
7 3 COMP2 BLACK AND DECKER UK, London
8 3 COMP2 BLACK AND DECKER United Kingdom, London
We want to remove records 6 and 8 but keep 5 and 7.
To do this, we wrote the following query:
DELETE from tblSuppliers
where exists (select ‘x’ from tblSuppliers X
where tblSuppliers.Number = X.Number and tblSuppliers.CompanyID = x.CompanyID
and tblSuppliers.SupplierID > X.SupplierID);
Choose one of the following queries to identify or remove duplicate rows from a table leaving only unique records in the table:
SQL> DELETE FROM table_name A WHERE ROWID > ( 2 SELECT min(rowid) FROM table_name B 3 WHERE A.key_values = B.key_values);
Delete all rowids that is BIGGER than the SMALLEST rowid value (for a given key).
SQL> create table table_name2 as select distinct * from table_name1; SQL> drop table table_name1; SQL> rename table_name2 to table_name1;
This method is usually faster. However, remember to recreate all indexes, constraints, triggers, etc. on the table when done.
SQL> delete from my_table t1 SQL> where exists (select 'x' from my_table t2 SQL> where t2.key_value1 = t1.key_value1 SQL> and t2.key_value2 = t1.key_value2 SQL> and t2.rowid > t1.rowid);
Note: One can eliminate N^2 unnecessary operations by creating an index on the joined fields in the inner loop (no need to loop through the entire table on each pass by a record). This will speed-up the deletion process.
Note 2: If you are comparing NOT-NULL columns, use the ISNULL function. Remember that NULL is not equal to NULL. This should not be a problem as all key columns should be NOT NULL by definition.