Remove duplicates in Excel

#1
Hi all,

Just want to clarify if it is possible to remove duplicates in Excel using the Excel VBO?
If so, what is the approach I can take? Thanks.
 

VJR

Well-Known Member
Staff member
#6
Hi junwei,

Below is how the duplicates can be removed from Excel.

Few key things to note are-
The object is designed is such a way that:
- Column numbers that you want the duplicates to be removed from have to be passed as a string separated by commas (eg; "1, 3, 5"). This is the inputColumns parameter shown in the below screenshots.
This is because the RemoveDuplicates method takes column index numbers as a parameter to remove the duplicates from.
If you have Column alphabets within your process then you need to convert them to a comma separated string of column numbers and then pass it to this action.
This can be done either by using a code similar to this or by inserting a formula using SetCellValue in excel as shown below to get the column number of Column AL, for example, then forming a comma separated string and then clear the formula from the cell.
=COLUMN(INDIRECT("AL1"))

- You would need to modify the code to accept ALL columns as input to remove the duplicates from ALL columns if that is your case.

- Ignore the wordings 'Interop' in any of the screenshots as that was for a different purpose.

- Reference of RemoveDuplicates method
https://docs.microsoft.com/en-us/office/vba/api/excel.range.removeduplicates

- The code works fine under general conditions and exception handling and validation or any other modification needs to be done wherever necessary as per your own requirement.

--------Object Action starts here-------

- Create a custom action tab for the Remove Duplicates action in the Excel VBO, which is generally done by duplicating any of the other Excel tabs/actions. Note that this will add a new tab to the in-built Excel VBO. If you want to keep the original VBO provided by Blue Prism as it is, then you need to make a duplicate of the entire Excel VBO and then start creating new custom functions like this one over there.

- Object Diagram look as below:
View attachment 1544077720869.png

- Start stage Input parameters:
View attachment 1544077766940.png

- Code stage Input parameters:
View attachment 1544077802659.png

- Code stage Output parameters:
View attachment 1544077831637.png

- Code in the Code stage:
Code:
Dim sw, dw As Object
Dim ss, ds As Object
Dim excel, sheet As Object
Dim i as integer
Dim str() As String = inputColumns.Split(",")
Dim result(str.Length - 1) As Object 'Integer


Try

sw = GetWorkbook(Handle, Source_Workbook)
ss = GetWorksheet(Handle, Source_Workbook, Source_Worksheet)

sw.Activate()
ss.Activate()   'Whichever sheet name was passed as a parameter to this code stage that becomes the active sheet
excel = ss.Application
sheet = excel.ActiveSheet

'create an array of the column numbers based on the inputColumns parameter so as to pass it to the RemoveDuplicates method
For i = 0 To str.Length - 1
    result(i) = str(i)
Next

sheet.Range(cellRef).RemoveDuplicates(Columns:=(result), Header:=HeaderYesNoGuess)

'Numeric constant values for Header parameter
'https://docs.microsoft.com/en-us/dotnet/api/microsoft.office.interop.excel.xlyesnoguess?view=excel-pia

'Name      | Value    | Description
'xlGuess  | 0        | Excel determines whether there is a header, and where it is, if there is one.
'xlNo      | 2        | Default. The entire range should be sorted.
'xlYes      | 1        | The entire range should not be sorted.


Success = True

Catch e As Exception
    Success = False
    Message = e.Message
Finally
    sw = Nothing
    ss = Nothing
    dw = Nothing
    ds = Nothing
    excel = Nothing
    sheet = Nothing   
    
End Try

- The Excel sheet and the Process Diagram before running the new action:
Notice the duplicates in Columns A and B that are marked with the same color.
View attachment 1544078192260.png


- Parameters passed from the new action:
Information on inputColumns is given in the top section of this post and HeaderYesNoGuess is explained within comments in the Code.
View attachment 1544078437685.png

- Duplicates based on Column A and B removed after running the new action:
View attachment 1544078294159.png
 
Top