I have a column A has lots of different number , I need to find out which one is duplicate , highlight it then delete it according my choose , here is some methods I am testing :
Method A : Check if a cell appear in last all cells above .
1. Add a B column and place this formula in B2 : =countif(A$1:A1;A2)>0 , Copy the formula for all B cells that are adjacent to A cells containing data. A TRUE value in the B column tells you that the value in the A column is a duplicate.
That means : if A10 found A1 – A9 has duplicate , it count + 1 , if the value > 0 , the system tell you True , Or False .
2. Use condition format , if TRUE , highlight the background . You can see some color are there , those are duplicate items in Column A.
3. You have to use Ctrl+F , to copy and find where is the duplicate item appears , then delete it .
My comments :
This is a good method to solve small quantity data , but I have 10 thousands line to check , and each line has to check above again . you can imagine , in Line 9999 , he must check countif() in 1 – 9998 , it might be slow because there are too much repeat calculation to evaluate if it is repeat .
So I think this is a slow method to solve big quantity column data .
Method B: Sort the column from A – Z , then check last cell’s if duplicate .
1. Sort column A from A – Z,
2. In column B2 , input : =IF(A2=A1, “Duplicate”, “”)
3. All duplicate cell in Column A , display text “Duplicate” in Column B , and they list side by side .
4. Find duplicate easier then delete according your choose .
My comments :
This is a good method to solve big quantity data duplicate issue , because each data just compare the one above it , but another issue is : It might be long time to sort column A from A to Z , especially when you are sharing this sheet to other partners who is inputting data ! ( Unlucky that I am in this situation .)
Of course I know the UNIQUE formula to solve automatically , but it won’t let me choose delete which row . See below Gif :
Click an empty cell, type =UNIQUE(, select the range of cells you want to filter and thenclose the parenthesis. Simple.
Step 1: Assuming that our original data is in columns A, B & C, go to cell D1 and write a formula to concatenate the data in three columns. We’ll use a pipe separator to distinguish between a row like “a1, b, c” and “a, 1b, c”.
=CONCATENATE(A1, “|”, B1, “|”, C1) – drag the cell handle to fill the formula in other cells.
Step 2: Sort the D column by clicking the header as shown in the screencast.
Step 3: We now need a function to compare two adjacent values in column D. If the values are same, one of them is a duplicate for sure. Type this in E2:
=IF(D2=D1, “Duplicate”, “”) – drag to fill all cells until E5.
That’s it. All rows in column E that have value “Duplicate” are duplicate rows.
My comments :
Can not choose which one is good to delete , not what I want .
Finally , A fantasy Script comes !!!!
Very easy , just in Column B1, write down : =arrayformula(countif(if(A:A<>””,A:A),A:A)>1) ———– note : be ware of space in the front and bottom , it might cause not work .
Then use conditional formatting , if True highlight as yellow color , see attachment ,
Now you can consider sort column A or not .
My comments :
Very easy to use , fast , accurate , optional of future control , no need to sort it before finding , able to sort after checking , you can choose if need delete . Because data list beside .
Another thing is , once you wrote this formula and highlight formatting , it is 100% automatically check, whether you insert further data inside or not .
More !!!
Another guy write a scripts here to solve , but it seems hard to understand and not easy to manual delete .
https://developers.google.com/apps-script/articles/removing_duplicates
Thanks for reading my long article so patiently !
OMG ! I simply found a error within your site! Check if your plugins is configured correctly!
The next time I read a blog, I hope that it doesn’t disappoint me as much as this one. I mean, I know it was my choice to read, but I actually thought you have something interesting to say.
Hi,
I use the google script for remove the duplicate row but I make some change.
1. I created a Menu
2- You can run the Script from the MENU, the same appear in the sheet.
3. This script delete only the row if the 1 & 2 column are the same. For example:
You have a sheet with 5 column and in the first and the second column have the same data but the the other column not. And you want to delete that rows.
Go To > Menu > Remover Duplicados >. Done THe script will run and delete the data for the dat row,
function onOpen() {
// get active spreadsheet
var ss = SpreadsheetApp.getActiveSpreadsheet();
// create menu
var menu = [{name: “Remover Duplicados”, functionName: “removeDuplicates”}];
// add to menu
ss.addMenu(“Menu”, menu);
// execute function
removeDuplicates();
}
function removeDuplicates() {
var sheet = SpreadsheetApp.getActiveSheet();
var data = sheet.getDataRange().getValues();
var newData = new Array();
for(i in data){
var row = data[i];
var duplicate = false;
for(j in newData){
if(row[0] == newData[j][0] && row[1] == newData[j][1]){
duplicate = true;
}
}
if(!duplicate){
newData.push(row);
}
}
sheet.clearContents();
sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}
Thanks your comment 🙂
I have to give up google docs , because I have more and more data now
In order to solve any issue . have to use PHP + MYSQL , I think that is better ,
Too many data , the google docs become very slow .
Thanks,.. but PHO + MySQL is supported in google Docs? I have around 8 sheet with 200 rows and 10 column each one… I modify every one every day… for now I don’t have issue.. The problem with google docs is when you have an script that take more than 5 sec… or when you have a lot of funtion OnOpen or On Edit…
I appreciate if you have give more data about How to start whit that and how to create a DB with my sheen and can be update everytime during 24hrs..
Hi , I mean I have to forget Google Sheet , using PHP + MYSQL now …
200 rows + 10 column each one is small data , I had 10000+ row for 20 column , ( 1 column I hope to make it none repeat ) , besides this , there is another relate table ( 4 column with 20000+ rows ) link with this main table . each time refresh need hours … so , 5 sec., seems a little only . You can stay with google docs I think .
PHP is a web build language , MYSQL is a database run on web server , it can handle it well .
At present , I have 46,292 row for main table , 84,009 row for 1 relate table , almost instant refresh after I upload data . ( For course remove lots of functions like format supporting ….)
I think if your project data is small , but growing , you have to consider learning PHP + MYSQL and start to write codes to upgrade your project .
Hi Winsion Liang,
Thanks for sharing this detailed info. I like your website very much.
Check out this script that I have coded to check duplicate entries in any range in Google Docs:
http://igoogledrive.blogspot.com/2013/07/how-to-check-duplicates-in-any-range.html
and for entire sheet:
http://igoogledrive.blogspot.com/2012/10/google-spreadsheet-script-to-check.html
Thanks,
Kishan.
Nice to know you 🙂
You code is good , I appreciated . Thanks
Winsion Liang:
Thank you for your detailed blog post. Through your trial and error, I found some methods to help me with my sheets. I look forward to seeing more of your posts.
Thanks,
Claire