How to Delete Duplicate Rows in SQL
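In this section, we look at different ways to delete duplicate rows in MySQL and Oracle. When a table contains duplicate rows, we usually need to remove them so that each record appears only once.
Preparing the sample data
The following script creates a table named contacts.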
DROP TABLE IF EXISTS contacts;
CREATE TABLE contacts (
    id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(30) NOT NULL,
    last_name VARCHAR(25) NOT NULL,
    email VARCHAR(210) NOT NULL,
    age VARCHAR(22) NOT NULL
);
We then insert the following rows into the table.
INSERT INTO contacts (first_name,last_name,email,age)
VALUES ('Kavin','Peterson','[email protected]','21'),
       ('Nick','Jonas','[email protected]','18'),
       ('Peter','Heaven','[email protected]','23'),
       ('Michal','Jackson','[email protected]','22'),
       ('Sean','Bean','[email protected]','23'),
       ('Tom ','Baker','[email protected]','20'),
       ('Ben','Barnes','[email protected]','17'),
       ('Mischa ','Barton','[email protected]','18'),
       ('Sean','Bean','[email protected]','16'),
       ('Eliza','Bennett','[email protected]','25'),
       ('Michal','Krane','[email protected]','25'),
       ('Peter','Heaven','[email protected]','20'),
       ('Brian','Blessed','[email protected]','20'),
       ('Kavin','Peterson','[email protected]','30');
After each DELETE statement, we re-run this script to recreate the test data.
The following query returns the data from the contacts table:
SELECT * FROM contacts ORDER BY email;
id | first_name | last_name | email | age |
7 | Ben | Barnes | [email protected] | 17 |
13 | Brian | Blessed | [email protected] | 20 |
10 | Eliza | Bennett | [email protected] | 25 |
1 | Kavin | Peterson | [email protected] | 21 |
14 | Kavin | Peterson | [email protected] | 30 |
8 | Mischa | Barton | [email protected] | 18 |
11 | Michal | Krane | [email protected] | 25 |
4 | Michal | Jackson | [email protected] | 22 |
2 | Nick | Jonas | [email protected] | 18 |
3 | Peter | Heaven | [email protected] | 23 |
12 | Peter | Heaven | [email protected] | 20 |
5 | Sean | Bean | [email protected] | 23 |
9 | Sean | Bean | [email protected] | 16 |
6 | Tom | Baker | [email protected] | 20 |
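The following SQL query returns the emails that occur more than once in the contacts table:
SELECT email, COUNT(email) FROM contacts GROUP BY email HAVING COUNT(email) > 1;
email | COUNT(email) |
[email protected] | 2 |
[email protected] | 2 |
[email protected] | 2 |
Three emails are duplicated, each appearing in two rows.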
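The GROUP BY ... HAVING check can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the reduced table layout and the example.com addresses are assumptions made for illustration, not data from this article.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

# Hypothetical emails: 'a' appears twice, 'c' three times, 'b' once.
emails = ["a@example.com", "b@example.com", "a@example.com",
          "c@example.com", "c@example.com", "c@example.com"]
conn.executemany("INSERT INTO contacts (email) VALUES (?)", [(e,) for e in emails])

# Same query shape as in the text: keep only the groups with more than one row.
duplicates = conn.execute(
    "SELECT email, COUNT(email) FROM contacts "
    "GROUP BY email HAVING COUNT(email) > 1 ORDER BY email"
).fetchall()

print(duplicates)  # [('a@example.com', 2), ('c@example.com', 3)]
```

An empty result from this query means that no email is duplicated, which is how the DELETE statements later in the article are verified.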
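(A) Delete duplicate rows using the DELETE JOIN statement
DELETE t1 FROM contacts t1
INNER JOIN contacts t2
WHERE
t1.id < t2.id AND
t1.email = t2.email;
Output:
Query OK, 3 rows affected (0.10 sec)
Three rows have been deleted. We run the query below again to look for duplicate emails in the table.
SELECT email, COUNT(email) FROM contacts GROUP BY email HAVING COUNT(email) > 1;
The query returns an empty set. To verify the data in the contacts table, execute the following SQL query: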
SELECT * FROM contacts;
id | first_name | last_name | email | age |
7 | Ben | Barnes | [email protected] | 17 |
13 | Brian | Blessed | [email protected] | 20 |
10 | Eliza | Bennett | [email protected] | 25 |
14 | Kavin | Peterson | [email protected] | 30 |
8 | Mischa | Barton | [email protected] | 18 |
11 | Michal | Krane | [email protected] | 25 |
4 | Michal | Jackson | [email protected] | 22 |
2 | Nick | Jonas | [email protected] | 18 |
12 | Peter | Heaven | [email protected] | 20 |
9 | Sean | Bean | [email protected] | 16 |
6 | Tom | Baker | [email protected] | 20 |
The rows with IDs 1, 3, and 5 have been deleted. Because the join condition is t1.id < t2.id, the statement removes every row for which another row with the same email and a higher id exists, so the row with the highest id in each group is kept.
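The following statement deletes the duplicate rows and keeps the row with the lowest id in each group. Re-run the script that creates the contacts table first, then execute:
DELETE c1 FROM contacts c1
INNER JOIN contacts c2
WHERE
c1.id > c2.id AND
c1.email = c2.email;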
After this statement, the contacts table contains the following rows (IDs 9, 12, and 14 have been deleted):
id | first_name | last_name | email | age |
1 | Kavin | Peterson | [email protected] | 21 |
2 | Nick | Jonas | [email protected] | 18 |
3 | Peter | Heaven | [email protected] | 23 |
4 | Michal | Jackson | [email protected] | 22 |
5 | Sean | Bean | [email protected] | 23 |
6 | Tom | Baker | [email protected] | 20 |
7 | Ben | Barnes | [email protected] | 17 |
8 | Mischa | Barton | [email protected] | 18 |
10 | Eliza | Bennett | [email protected] | 25 |
11 | Michal | Krane | [email protected] | 25 |
13 | Brian | Blessed | [email protected] | 20 |
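The effect of the comparison operator in the self-join can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module rather than MySQL. SQLite has no multi-table DELETE, so the self-join is rewritten as a correlated EXISTS subquery, which is a swapped-in technique and not the DELETE JOIN syntax itself. The helper remaining_ids and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data (first_name, email); the emails are made up.
ROWS = [
    ("Kavin", "kavin@example.com"),  # id 1
    ("Nick", "nick@example.com"),    # id 2
    ("Peter", "peter@example.com"),  # id 3
    ("Kavin", "kavin@example.com"),  # id 4, duplicate of id 1
    ("Peter", "peter@example.com"),  # id 5, duplicate of id 3
]


def remaining_ids(keep):
    """Delete duplicate emails and return the surviving ids.

    keep="highest" mirrors `t1.id < t2.id`; keep="lowest" mirrors `c1.id > c2.id`.
    """
    if keep not in ("highest", "lowest"):
        raise ValueError("keep must be 'highest' or 'lowest'")
    # '<' deletes each row that has a same-email partner with a larger id.
    op = "<" if keep == "highest" else ">"
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "first_name TEXT NOT NULL, email TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO contacts (first_name, email) VALUES (?, ?)", ROWS)
    # Correlated EXISTS stands in for the MySQL self-join (assumption, see lead-in).
    conn.execute(
        "DELETE FROM contacts WHERE EXISTS ("
        "SELECT 1 FROM contacts t2 "
        f"WHERE contacts.id {op} t2.id AND contacts.email = t2.email)"
    )
    ids = [row[0] for row in conn.execute("SELECT id FROM contacts ORDER BY id")]
    conn.close()
    return ids


print(remaining_ids("highest"))  # [2, 4, 5]
print(remaining_ids("lowest"))   # [1, 2, 3]
```

With the < comparison the later copies (ids 4 and 5) survive, and with > the earlier copies (ids 1 and 3) survive, which matches the two MySQL statements in this section.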
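(B) Delete duplicate rows using an intermediate table
To delete duplicate rows using an intermediate table, follow these steps:
Step 1. Create a new table with the same structure as the original table:
CREATE TABLE source_copy LIKE source;
Step 2. Insert the distinct rows from the original table into the new table:
INSERT INTO source_copy SELECT * FROM source GROUP BY col;
Step 3. Drop the original table and rename the intermediate table to the original name:
DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;
For example, the following statements delete the rows with duplicate emails from the contacts table. Note that when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5), SELECT * combined with GROUP BY email is rejected, so this example assumes that mode is disabled.
-- step 1
CREATE TABLE contacts_temp LIKE contacts;
-- step 2
INSERT INTO contacts_temp SELECT * FROM contacts GROUP BY email;
-- step 3
DROP TABLE contacts;
ALTER TABLE contacts_temp RENAME TO contacts;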
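The three steps can be sketched end to end with Python's built-in sqlite3 module, again as a stand-in for MySQL. SQLite has no CREATE TABLE ... LIKE, so step 1 repeats the column list, and step 2 picks the lowest id per email explicitly instead of relying on SELECT * with GROUP BY. Both choices, along with the sample rows, are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, email TEXT);
    INSERT INTO contacts VALUES
        (1, 'Kavin', 'kavin@example.com'),
        (2, 'Nick',  'nick@example.com'),
        (3, 'Kavin', 'kavin@example.com');

    -- step 1: create an empty table with the same columns
    CREATE TABLE contacts_temp (id INTEGER PRIMARY KEY, first_name TEXT, email TEXT);

    -- step 2: copy one row per email, explicitly choosing the lowest id
    INSERT INTO contacts_temp
    SELECT * FROM contacts
    WHERE id IN (SELECT MIN(id) FROM contacts GROUP BY email);

    -- step 3: drop the original table and rename the intermediate table
    DROP TABLE contacts;
    ALTER TABLE contacts_temp RENAME TO contacts;
""")

deduplicated = conn.execute(
    "SELECT id, first_name, email FROM contacts ORDER BY id"
).fetchall()
print(deduplicated)
# [(1, 'Kavin', 'kavin@example.com'), (2, 'Nick', 'nick@example.com')]
```

Choosing the surviving row with MIN(id) makes the result deterministic; the same WHERE id IN (...) filter should also avoid the ONLY_FULL_GROUP_BY restriction in MySQL, since the outer SELECT no longer uses GROUP BY.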
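(C) Delete duplicate rows using the ROW_NUMBER() function
Note: MySQL has supported the ROW_NUMBER() function since version 8.0.2, so check the MySQL version before using it.
The following statement uses ROW_NUMBER() to assign a sequential integer to each row within a group of identical emails. Ordering by id inside each group makes the numbering deterministic. If an email is duplicated, the row number of every extra row is greater than one.
SELECT id, email, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS row_num FROM contacts;
The following SQL query returns the list of IDs of the duplicate rows:
SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS row_num FROM contacts) t WHERE row_num > 1;
Output: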
id |
9 |
12 |
14 |
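The list of duplicate IDs can then feed a DELETE ... WHERE id IN (...) statement. The sketch below shows that last step with Python's built-in sqlite3 module, assuming an SQLite build of version 3.25 or later (the first with window functions). The sample emails are made up. In MySQL, a DELETE whose IN subquery wraps the window query in a derived table, as done here, is expected to behave the same way, but that is not verified in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    INSERT INTO contacts (id, email) VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'a@example.com'),
        (4, 'c@example.com'), (5, 'b@example.com'), (6, 'a@example.com');
""")

# Rows numbered 2, 3, ... within each email group are the extra copies.
DUPLICATE_IDS = """
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS row_num
        FROM contacts
    ) t
    WHERE row_num > 1
"""

duplicate_ids = sorted(row[0] for row in conn.execute(DUPLICATE_IDS))

# Delete every row whose id appears in the duplicate list.
conn.execute(f"DELETE FROM contacts WHERE id IN ({DUPLICATE_IDS})")
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM contacts ORDER BY id")]

print(duplicate_ids)  # [3, 5, 6]
print(remaining_ids)  # [1, 2, 4]
```

Because the rows are ordered by id inside each partition, the lowest id of every email group gets row number 1 and is the one that survives.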
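Deleting duplicate records in Oracle
When we find duplicate records in a table, we have to delete the unwanted copies to keep the data clean and unique. If a table contains duplicate rows, we can remove them with the DELETE statement.
In the first case, the table has a column that is not part of the group of columns used to identify duplicate records.
Consider the table below: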
VEGETABLE_ID | VEGETABLE_NAME | COLOR |
01 | Potato | Brown |
02 | Potato | Brown |
03 | Onion | Red |
04 | Onion | Red |
05 | Onion | Red |
06 | Pumpkin | Green |
07 | Pumpkin | Yellow |
-- create the vegetable table
CREATE TABLE vegetables (
    VEGETABLE_ID NUMBER GENERATED BY DEFAULT AS IDENTITY,
    VEGETABLE_NAME VARCHAR2(100),
    color VARCHAR2(20),
    PRIMARY KEY (VEGETABLE_ID)
);
-- insert sample rows
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Green');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Yellow');
-- query data from the vegetable table
SELECT * FROM vegetables;
Suppose we want to keep the row with the highest VEGETABLE_ID in each group of duplicates and delete all other copies. The following query returns the IDs of the rows to keep:
SELECT MAX(VEGETABLE_ID) FROM vegetables GROUP BY VEGETABLE_NAME, color ORDER BY MAX(VEGETABLE_ID);
MAX(VEGETABLE_ID) |
2 |
5 |
6 |
7 |
We use the DELETE statement to remove the rows whose VEGETABLE_ID is not the highest in its group.
DELETE FROM vegetables WHERE VEGETABLE_ID NOT IN ( SELECT MAX(VEGETABLE_ID) FROM vegetables GROUP BY VEGETABLE_NAME, color );
Three rows have been deleted.
SELECT * FROM vegetables;
VEGETABLE_ID | VEGETABLE_NAME | COLOR |
02 | Potato | Brown |
05 | Onion | Red |
06 | Pumpkin | Green |
07 | Pumpkin | Yellow |
To keep the row with the lowest ID instead, use the MIN() function in place of the MAX() function.
DELETE FROM vegetables WHERE VEGETABLE_ID NOT IN ( SELECT MIN(VEGETABLE_ID) FROM vegetables GROUP BY VEGETABLE_NAME, color );
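The difference between the MAX() and MIN() variants can be compared side by side. The following sketch runs both against the same seven sample rows, using Python's built-in sqlite3 module as a stand-in for Oracle. The helper name dedupe is hypothetical, and an INTEGER PRIMARY KEY column replaces the Oracle identity column.

```python
import sqlite3

# The seven sample rows from the text, in insertion order (ids 1..7).
VEGETABLES = [
    ("Potato", "Brown"), ("Potato", "Brown"),
    ("Onion", "Red"), ("Onion", "Red"), ("Onion", "Red"),
    ("Pumpkin", "Green"), ("Pumpkin", "Yellow"),
]


def dedupe(aggregate):
    """Keep one row per (name, color) group; return (deleted_count, surviving_ids)."""
    if aggregate not in ("MIN", "MAX"):
        raise ValueError("aggregate must be 'MIN' or 'MAX'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vegetables ("
        "vegetable_id INTEGER PRIMARY KEY, vegetable_name TEXT, color TEXT)"
    )
    conn.executemany(
        "INSERT INTO vegetables (vegetable_name, color) VALUES (?, ?)", VEGETABLES
    )
    # Delete every row whose id is not the chosen representative of its group.
    cursor = conn.execute(
        "DELETE FROM vegetables WHERE vegetable_id NOT IN ("
        f"SELECT {aggregate}(vegetable_id) FROM vegetables "
        "GROUP BY vegetable_name, color)"
    )
    deleted = cursor.rowcount
    ids = [r[0] for r in conn.execute("SELECT vegetable_id FROM vegetables ORDER BY vegetable_id")]
    conn.close()
    return deleted, ids


print(dedupe("MAX"))  # (3, [2, 5, 6, 7])
print(dedupe("MIN"))  # (3, [1, 3, 6, 7])
```

Both variants delete three rows; only the choice of which copy survives in the Potato and Onion groups differs.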
The approach above works when the table has a column that is not part of the group used to identify duplicates. If the values in every column are duplicated, we can no longer use the VEGETABLE_ID column to tell the copies apart.
Let's drop the vegetables table and re-create it with a new structure.
DROP TABLE vegetables;
CREATE TABLE vegetables (
    VEGETABLE_ID NUMBER,
    VEGETABLE_NAME VARCHAR2(100),
    color VARCHAR2(20)
);
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1,'Potato','Brown');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1,'Potato','Brown');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(3,'Pumpkin','Green');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(4,'Pumpkin','Yellow');
SELECT * FROM vegetables;
VEGETABLE_ID | VEGETABLE_NAME | COLOR |
01 | Potato | Brown |
01 | Potato | Brown |
02 | Onion | Red |
02 | Onion | Red |
02 | Onion | Red |
03 | Pumpkin | Green |
04 | Pumpkin | Yellow |
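In the vegetables table, the values in all columns (VEGETABLE_ID, VEGETABLE_NAME, and color) are duplicated.
We can use rowid, a locator that specifies where Oracle stores the row. Because rowid is unique for each row, we can use it to delete the duplicate rows.
DELETE FROM vegetables WHERE rowid NOT IN ( SELECT MIN(rowid) FROM vegetables GROUP BY VEGETABLE_ID, VEGETABLE_NAME, color );
The following query verifies the delete operation: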
SELECT * FROM vegetables;
VEGETABLE_ID | VEGETABLE_NAME | COLOR |
01 | Potato | Brown |
02 | Onion | Red |
03 | Pumpkin | Green |
04 | Pumpkin | Yellow |
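SQLite also exposes a hidden rowid on ordinary tables, so the same idea can be tried locally. The following sketch uses Python's built-in sqlite3 module. SQLite's rowid is an integer key rather than an Oracle physical row locator, so this is an analogy under that assumption and not Oracle's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key here: every column, including vegetable_id, can repeat.
conn.execute(
    "CREATE TABLE vegetables (vegetable_id INTEGER, vegetable_name TEXT, color TEXT)"
)
conn.executemany(
    "INSERT INTO vegetables VALUES (?, ?, ?)",
    [
        (1, "Potato", "Brown"), (1, "Potato", "Brown"),
        (2, "Onion", "Red"), (2, "Onion", "Red"), (2, "Onion", "Red"),
        (3, "Pumpkin", "Green"), (4, "Pumpkin", "Yellow"),
    ],
)

# Keep the row with the smallest rowid in each group of fully identical rows.
conn.execute(
    "DELETE FROM vegetables WHERE rowid NOT IN ("
    "SELECT MIN(rowid) FROM vegetables "
    "GROUP BY vegetable_id, vegetable_name, color)"
)

rows = conn.execute(
    "SELECT vegetable_id, vegetable_name, color FROM vegetables ORDER BY vegetable_id"
).fetchall()
print(rows)
# [(1, 'Potato', 'Brown'), (2, 'Onion', 'Red'), (3, 'Pumpkin', 'Green'), (4, 'Pumpkin', 'Yellow')]
```

The result has one row per distinct combination of values, matching the table above.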