I have a column A has lots of different number , I need to find out which one is duplicate , highlight it then delete it according my choose , here is some methods I am testing :
Method A : Check if a cell appear in last all cells above .
1. Add a B column and place this formula in B2 : =countif(A$1:A1;A2)>0 , Copy the formula for all B cells that are adjacent to A cells containing data. A TRUE value in the B column tells you that the value in the A column is a duplicate.
That means : if A10 found A1 – A9 has duplicate , it count + 1 , if the value > 0 , the system tell you True , Or False .
2. Use condition format , if TRUE , highlight the background . You can see some color are there , those are duplicate items in Column A.
3. You have to use Ctrl+F , to copy and find where is the duplicate item appears , then delete it .
My comments :
This is a good method to solve small quantity data , but I have 10 thousands line to check , and each line has to check above again . you can imagine , in Line 9999 , he must check countif() in 1 – 9998 , it might be slow because there are too much repeat calculation to evaluate if it is repeat .
So I think this is a slow method to solve big quantity column data .
Method B: Sort the column from A – Z , then check last cell’s if duplicate .
1. Sort column A from A – Z,
2. In column B2 , input : =IF(A2=A1, “Duplicate”, “”)
3. All duplicate cell in Column A , display text “Duplicate” in Column B , and they list side by side .
4. Find duplicate easier then delete according your choose .
My comments :
This is a good method to solve big quantity data duplicate issue , because each data just compare the one above it , but another issue is : It might be long time to sort column A from A to Z , especially when you are sharing this sheet to other partners who is inputting data ! ( Unlucky that I am in this situation .)
Of course I know the UNIQUE formula to solve automatically , but it won’t let me choose delete which row . See below Gif :
Click an empty cell, type =UNIQUE(, select the range of cells you want to filter and thenclose the parenthesis. Simple.
Step 1: Assuming that our original data is in columns A, B & C, go to cell D1 and write a formula to concatenate the data in three columns. We’ll use a pipe separator to distinguish between a row like “a1, b, c” and “a, 1b, c”.
=CONCATENATE(A1, “|”, B1, “|”, C1) – drag the cell handle to fill the formula in other cells.
Step 2: Sort the D column by clicking the header as shown in the screencast.
Step 3: We now need a function to compare two adjacent values in column D. If the values are same, one of them is a duplicate for sure. Type this in E2:
=IF(D2=D1, “Duplicate”, “”) – drag to fill all cells until E5.
That’s it. All rows in column E that have value “Duplicate” are duplicate rows.
My comments :
Can not choose which one is good to delete , not what I want .
Finally , A fantasy Script comes !!!!
Very easy , just in Column B1, write down : =arrayformula(countif(if(A:A<>””,A:A),A:A)>1) ———– note : be ware of space in the front and bottom , it might cause not work .
Then use conditional formatting , if True highlight as yellow color , see attachment ,
Now you can consider sort column A or not .
My comments :
Very easy to use , fast , accurate , optional of future control , no need to sort it before finding , able to sort after checking , you can choose if need delete . Because data list beside .
Another thing is , once you wrote this formula and highlight formatting , it is 100% automatically check, whether you insert further data inside or not .
More !!!
Another guy write a scripts here to solve , but it seems hard to understand and not easy to manual delete .
https://developers.google.com/apps-script/articles/removing_duplicates
Thanks for reading my long article so patiently !