How to remove duplicates from a excel column

AmyT

New Member
I have an excel spreadsheet with a number of columns,(userid, timestamp, ip address etc). I would like to extract only the unique userids to a collection. What is the best way to do so?
 

VJR

Well-Known Member
Hi AmyT,

Since you are already aware of the other thread for the similar topic I will use the same to provide details to you.

Option #1:
http://www.rpaforum.net/threads/how-to-get-unique-values-from-a-collection.757/post-1872

You can use the same logic mentioned in above link to compare the "key" column of your Excel sheet (ie; collection).
If say you do not have a key column and you know that the rows are duplicate only when every single
column value of one row is same as every column value of the other row, then you will have to compare each of the column values on that row. Then copy to the output collection only if values of all the columns are matching.
This option could be time taking particularly if there are large number of rows because we have to check one row at a time, but if you want to use the same logic that you have already done then you could try and see how "lengthy" it goes.

Option #2:
- Place an excel file with a macro in it which removes duplicates from the sheet data. This macro is a short one and going to be of 3 lines of code including the function start and end.
- Copy the Original collection to an Excel in one shot
- Run the macro on that sheet using the 'Run macro' action of Blue Prism
- The resultant data will have all the dupes removed
- Get the sheet into an Output collection in one shot
- While closing you can ignore saving the macro file so that each time the sheet is blank
Under general conditions errors do not arise when the macro runs, but it gets cumbersome to handle if they do because you are calling the macro from BP.

Option #3:
Write a new Action page in the exiting MS Excel VBO that will remove duplicates from a given data on a sheet.
This will involve writing specific code in the Code stage for this new Action page.
Error handling will have to be done so that this object as well as the calling process/object does not fail.
Once the Action is ready it can be called and reused from any other Process or Object.
 
Last edited:
Top