How to delete duplicate rows in collection

#1
Hi,

I am having a collection with 7 Columns and around 1000 rows. In that 1000 rows, some rows are duplicate.
How can I delete those duplicate rows from collection.

Thanks in advance.
 

VJR

Well-Known Member
Staff member
#2
Hi Naveen,

How does your collection look like?
i. Is there an unique field or a column to identify each row?
ii. When is a row considered as duplicate? Is it when the data of all the 7 columns are same? Or only when specific columns are same eg; Column1 & Column 2?
 
#3
@ VJR,
Thanks for your reply.
1) No unique field or a column to identify each row.
2) In my Collection, I need to check with all 7 columns data and delete if all column values are same.
PFA, for my collection sample data.

Thanks in Advance!!!
 

Attachments

#4
Hi Naveen,

Are you wanting to put the collection into a Work Queue after you have removed the duplicates?

There are a number of ways to do this but if I know what you wish to do with the collection data after, i'll be able to give you the best solution :)

Thanks,

Max
 

VJR

Well-Known Member
Staff member
#6
Hi Naveen,

There are multiple options to do this, even other than the below two options but most of them would be a tedious approach.

Option #1
I have responded to your other post here:
http://www.rpaforum.net/threads/how-to-get-unique-values-from-a-collection.757/post-1872

You can use the same logic to check each of the columns and copy to the output collection only if values of all the 7 columns are matching. This could be time taking particularly if there are large number of rows because we have to check one row at a time, but if you want to use the same logic then you could try and see how long it goes.

Option #2
- Place an excel file with a macro in it which removes duplicates from the sheet data. This macro is a short one and going to be of 3 lines of code including the function start and end. (I can provide you the code if needed)
- Copy the Original collection to an Excel in one shot
- Run the macro on that sheet using the 'Run macro' action of Blue Prism
- The resultant data will have all the dupes removed
- Get the sheet into an Output collection in one shot.
- You can ignore saving the sheet so that each time the file is blank

If you choose to start any of the above options let me know and I can help if you are facing any difficulties.
 
#8
@VJR

In my case too, i need to remove rows if all fields matches in a row. Could you provide a Macro code/any code to remove duplicate rows from collection and i need all fields again not only few fields. I may get any number of fields in input. Please help me with this

Thanks in Advance.
 
#10
Hi VJR,

Please could you provide the VB.NET code to remove duplicates from Excel?

Hi Naveen,

There are multiple options to do this, even other than the below two options but most of them would be a tedious approach.

Option #1
I have responded to your other post here:
http://www.rpaforum.net/threads/how-to-get-unique-values-from-a-collection.757/post-1872

You can use the same logic to check each of the columns and copy to the output collection only if values of all the 7 columns are matching. This could be time taking particularly if there are large number of rows because we have to check one row at a time, but if you want to use the same logic then you could try and see how long it goes.

Option #2
- Place an excel file with a macro in it which removes duplicates from the sheet data. This macro is a short one and going to be of 3 lines of code including the function start and end. (I can provide you the code if needed)
- Copy the Original collection to an Excel in one shot
- Run the macro on that sheet using the 'Run macro' action of Blue Prism
- The resultant data will have all the dupes removed
- Get the sheet into an Output collection in one shot.
- You can ignore saving the sheet so that each time the file is blank

If you choose to start any of the above options let me know and I can help if you are facing any difficulties.
 
#11
VJR has already provided 2 solutions, but if you want you can try also:
Loop through your collection - Orginal_Collection
use Utility - Collection Manipulation - Filter Collection
Collection in [Orginal_Collection]
Filter - example "Value ='"&[Orginal_Collection.Value]&"' AND Value2 ='"&[Orginal_Collection.Value2]&"'" etc.
output: FilteredCollection
then use Collection - Count Rows and count rows in Filtered Collection
if number of rows is greater than 1 then Collection Remove Row in Orginal Colection

the great DISADVANTAGE of this solution is that after removing row from collection loop is starting from start.
that means if you 999th row is duplicate then it will remove row 999th and will start from 1st row
This is why I recommend to use VJR's sollutions
 

Attachments

#13
Hi All,

when i had similar use case i used code stage to get distinct rows in a collection. Just try this one line c# code in code stage
OutColl = SourceColl.DefaultView.ToTable(true);

This is one way of doing it. Hope it helps :)

Thanks,
Rs
 
Top