Manipulating large volume of Excel data in Blue Prism

#1
Hi All,

I came across this process where there are more than 250,000 excel records, which should be manipulated through vlookups and formulas. As I see this will be a time consuming process. Any advice on what's the best possible way to automate this process using Blue Prism?
 

VJR

Well-Known Member
Staff member
#2
Hi Hasipri,

There are multiple options to do this, but to provide you with a suitable solution can you provide furthermore details of what kind of manipulation is required using vlookups and formula on these 250,000 records in excel.

Below are some sample questions and you can add more details as possible-
An Excel sheet can accommodate 10,48,576 records.
- So how are these (more than) 2,50,000 records arranged on this file?
- How many separate sheets and how many records per sheet.
- How many columns have the vlookups
- How many columns are going to have the other formulas
- Does the file already contain some macro code functionality
- And what exactly do you need to do on these records
 
#3
Hi All,

I came across this process where there are more than 250,000 excel records, which should be manipulated through vlookups and formulas. As I see this will be a time consuming process. Any advice on what's the best possible way to automate this process using Blue Prism?

I would suggest the easiest method would be to create a macro to do this task i would not even use Blue Prism.
 
#4
Hi Hasipri,

There are multiple options to do this, but to provide you with a suitable solution can you provide furthermore details of what kind of manipulation is required using vlookups and formula on these 250,000 records in excel.

Below are some sample questions and you can add more details as possible-
An Excel sheet can accommodate 10,48,576 records.
- So how are these (more than) 2,50,000 records arranged on this file?
- How many separate sheets and how many records per sheet.
- How many columns have the vlookups
- How many columns are going to have the other formulas
- Does the file already contain some macro code functionality
- And what exactly do you need to do on these records
So how are these (more than) 2,50,000 records arranged on this file?
In a tabular format in excel
How many separate sheets and how many records per sheet?
One main sheet with 250,000 records. This main Workbook refer different workbooks for vlookups.
How many columns have the vlookups ?
Roughly about 10. But these vlookups are carried outside the main sheet in other workbooks and only result of vlookup is copied back to the main sheet.
How many columns are going to have the other formulas?
Roughly about 7-10.
Does the file already contain some macro code functionality?
No, Macros are not used due to security issues.

If we automate the vlookups through blueprism what is the best possible way? Do we have to copy the vloopup formulae to each and every cell? Is there any other way?

Since there are a large number of records, is using excel and Blue Prism the best way to tackle this? Is it better to use a sql and Blue Prism based solution. Thanks in advance.
 

VJR

Well-Known Member
Staff member
#5
Hi Hasipri,

"If we automate the vlookups through blueprism what is the best possible way? Do we have to copy the vloopup formulae to each and every cell? Is there any other way?"
There are two ways to do this and they are explained as below-

Approach #1: Using predefined actions provided by Blue Prism:
- Refer InputFiles.jpg. Refer their names in the title bar.
Your scenario of the files may or may not be the same as explained below, but you can get an idea of what can be done.
First file is your Main file where only Name column is filled initially and after running the process the Age and the City columns are populated by looking up from the external xlsx file (the second one).

- Refer 1.jpg: This is how the Process diagram looks when the predefined actions of BP are used
- At first you are simply setting the vlookup formula only in the first cell of Age column (Screenshot 2.jpg)
- And then you are simply pasting this formula till the last row of that column (3.jpg)
- There is an action available to get the number of rows in a sheet. But note that it will give the value of the column that has the maximum number of rows so assuming you have such a primary column in your sheet (eg Sr.No or Name in this case).
- You do the same for the City column.


Approach #2: Using a Code stage
- Screenshot 4.jpg is the process diagram for this approach
- This involves creating a new object and using the code stage. I created a copy of an existing action available in the MS Excel VBO, made the necessary removals and added a new action page. This new page looks as 5.jpg
- I am passing the last row from the calling Process to this action and using it in the code as below

Rich (BB code):
Dim sw, dw As Object
Dim ss, ds As Object
Dim excel, sheet As Object


Try

sw = GetWorkbook(Handle, Source_Workbook)
ss = GetWorksheet(Handle, Source_Workbook, Source_Worksheet)


sw.Activate()
ss.Activate()
excel = ss.Application
sheet = excel.ActiveSheet


sheet.Range("B2:B" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,2,FALSE)"
sheet.Range("C2:C" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,3,FALSE)"



Success = True

Catch e As Exception
    Success = False
    Message = e.Message
Finally
    sw = Nothing
    ss = Nothing
    dw = Nothing
    ds = Nothing
    excel = Nothing
    sheet = Nothing   
End Try

You simply need to add the formula for your required columns in the below lines:
Code:
sheet.Range("B2:B" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,2,FALSE)"
sheet.Range("C2:C" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,3,FALSE)"



Comparison:
Approach #1 did not use any action other than what was already available in BP.
So if you do not have much awareness of Visual Basic coding or macros you can use this.
It took the process 2 seconds and 78 milliseconds for 1000 rows of data for 2 columns (Age and city)

Approach #2 uses the Visual Basic coding semantics which is the language used by the other actions present in MS Excel VBO.
It took 2 seconds and 70 milliseconds for 1000 rows of data for 2 columns (Age and city)

These were run from within the diagram (and not from Control Room) with Debug Speed set to the Fastest.
I am using a Virtual Machine so things would be faster if you are using a local machine.

Approach #3: Using a connection string
You can connect to Excel using a Connection String and access the data on the sheet using SQL queries. I won't recommend this given the complexities that may involve while applying vlookups and formulas

Approach #4: Using SQL and Blue Prism approach - the one that you are thinking of
This will involve designing the database according to the way you need it. Then use SQL queries on that data.
The actions to connect to a database and fire queries are very much available in Blue Prism, but I leave this approach for you to explore. While designing approaches #1 and #2 I didn't see a reason to discard excel and start using sql, unless you will be finding the excel file with the existing data of 2,50,000 records to be very heavy and slow in response.


Approach #5: Macros
This was one of the approaches but cannot be utilized due to security reasons so I won't explain how that can be used here.


Based on the above shown approaches you will need to gauge how complex or simple your sheet is, how the data is arranged and if the formulas can be applied in a straightforward way as shown above and decide upon the best suitable option.
 

Attachments

#6
Hi Hasipri,

"If we automate the vlookups through blueprism what is the best possible way? Do we have to copy the vloopup formulae to each and every cell? Is there any other way?"
There are two ways to do this and they are explained as below-

Approach #1: Using predefined actions provided by Blue Prism:
- Refer InputFiles.jpg. Refer their names in the title bar.
Your scenario of the files may or may not be the same as explained below, but you can get an idea of what can be done.
First file is your Main file where only Name column is filled initially and after running the process the Age and the City columns are populated by looking up from the external xlsx file (the second one).

- Refer 1.jpg: This is how the Process diagram looks when the predefined actions of BP are used
- At first you are simply setting the vlookup formula only in the first cell of Age column (Screenshot 2.jpg)
- And then you are simply pasting this formula till the last row of that column (3.jpg)
- There is an action available to get the number of rows in a sheet. But note that it will give the value of the column that has the maximum number of rows so assuming you have such a primary column in your sheet (eg Sr.No or Name in this case).
- You do the same for the City column.


Approach #2: Using a Code stage
- Screenshot 4.jpg is the process diagram for this approach
- This involves creating a new object and using the code stage. I created a copy of an existing action available in the MS Excel VBO, made the necessary removals and added a new action page. This new page looks as 5.jpg
- I am passing the last row from the calling Process to this action and using it in the code as below

Rich (BB code):
Dim sw, dw As Object
Dim ss, ds As Object
Dim excel, sheet As Object


Try

sw = GetWorkbook(Handle, Source_Workbook)
ss = GetWorksheet(Handle, Source_Workbook, Source_Worksheet)


sw.Activate()
ss.Activate()
excel = ss.Application
sheet = excel.ActiveSheet


sheet.Range("B2:B" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,2,FALSE)"
sheet.Range("C2:C" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,3,FALSE)"



Success = True

Catch e As Exception
    Success = False
    Message = e.Message
Finally
    sw = Nothing
    ss = Nothing
    dw = Nothing
    ds = Nothing
    excel = Nothing
    sheet = Nothing 
End Try

You simply need to add the formula for your required columns in the below lines:
Code:
sheet.Range("B2:B" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,2,FALSE)"
sheet.Range("C2:C" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,3,FALSE)"



Comparison:
Approach #1 did not use any action other than what was already available in BP.
So if you do not have much awareness of Visual Basic coding or macros you can use this.
It took the process 2 seconds and 78 milliseconds for 1000 rows of data for 2 columns (Age and city)

Approach #2 uses the Visual Basic coding semantics which is the language used by the other actions present in MS Excel VBO.
It took 2 seconds and 70 milliseconds for 1000 rows of data for 2 columns (Age and city)

These were run from within the diagram (and not from Control Room) with Debug Speed set to the Fastest.
I am using a Virtual Machine so things would be faster if you are using a local machine.

Approach #3: Using a connection string
You can connect to Excel using a Connection String and access the data on the sheet using SQL queries. I won't recommend this given the complexities that may involve while applying vlookups and formulas

Approach #4: Using SQL and Blue Prism approach - the one that you are thinking of
This will involve designing the database according to the way you need it. Then use SQL queries on that data.
The actions to connect to a database and fire queries are very much available in Blue Prism, but I leave this approach for you to explore. While designing approaches #1 and #2 I didn't see a reason to discard excel and start using sql, unless you will be finding the excel file with the existing data of 2,50,000 records to be very heavy and slow in response.


Approach #5: Macros
This was one of the approaches but cannot be utilized due to security reasons so I won't explain how that can be used here.


Based on the above shown approaches you will need to gauge how complex or simple your sheet is, how the data is arranged and if the formulas can be applied in a straightforward way as shown above and decide upon the best suitable option.
Hi VJR
Thank you so much for your response. This pretty much cleared all my doubts. :D
 
#7
I am trying to use the 1st approach to perform vlookup between two workbooks,but when the set cell stage comes it is opening a popup to browse for the file. I am not understanding the need of browsing the file, when I am already passing it in the value field in the set cell value.Please find attached the screenshot for the set cell value stage.
 

Attachments

  • 27.4 KB Views: 59

VJR

Well-Known Member
Staff member
#8
I am trying to use the 1st approach to perform vlookup between two workbooks,but when the set cell stage comes it is opening a popup to browse for the file. I am not understanding the need of browsing the file, when I am already passing it in the value field in the set cell value.Please find attached the screenshot for the set cell value stage.
Hi pdbinni, I haven't taken a deeper look but it looks like only the name of the xlsx file is given in the Value and not its path so Excel is not able to understand where to pick that file from and hence showing the File Open dialog. Can you check if that is the case. Post back if you are having any issues.
 
#9
Hi Hasipri,

"If we automate the vlookups through blueprism what is the best possible way? Do we have to copy the vloopup formulae to each and every cell? Is there any other way?"
There are two ways to do this and they are explained as below-

Approach #1: Using predefined actions provided by Blue Prism:
- Refer InputFiles.jpg. Refer their names in the title bar.
Your scenario of the files may or may not be the same as explained below, but you can get an idea of what can be done.
First file is your Main file where only Name column is filled initially and after running the process the Age and the City columns are populated by looking up from the external xlsx file (the second one).

- Refer 1.jpg: This is how the Process diagram looks when the predefined actions of BP are used
- At first you are simply setting the vlookup formula only in the first cell of Age column (Screenshot 2.jpg)
- And then you are simply pasting this formula till the last row of that column (3.jpg)
- There is an action available to get the number of rows in a sheet. But note that it will give the value of the column that has the maximum number of rows so assuming you have such a primary column in your sheet (eg Sr.No or Name in this case).
- You do the same for the City column.


Approach #2: Using a Code stage
- Screenshot 4.jpg is the process diagram for this approach
- This involves creating a new object and using the code stage. I created a copy of an existing action available in the MS Excel VBO, made the necessary removals and added a new action page. This new page looks as 5.jpg
- I am passing the last row from the calling Process to this action and using it in the code as below

Rich (BB code):
Dim sw, dw As Object
Dim ss, ds As Object
Dim excel, sheet As Object


Try

sw = GetWorkbook(Handle, Source_Workbook)
ss = GetWorksheet(Handle, Source_Workbook, Source_Worksheet)


sw.Activate()
ss.Activate()
excel = ss.Application
sheet = excel.ActiveSheet


sheet.Range("B2:B" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,2,FALSE)"
sheet.Range("C2:C" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,3,FALSE)"



Success = True

Catch e As Exception
    Success = False
    Message = e.Message
Finally
    sw = Nothing
    ss = Nothing
    dw = Nothing
    ds = Nothing
    excel = Nothing
    sheet = Nothing  
End Try

You simply need to add the formula for your required columns in the below lines:
Code:
sheet.Range("B2:B" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,2,FALSE)"
sheet.Range("C2:C" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,3,FALSE)"



Comparison:
Approach #1 did not use any action other than what was already available in BP.
So if you do not have much awareness of Visual Basic coding or macros you can use this.
It took the process 2 seconds and 78 milliseconds for 1000 rows of data for 2 columns (Age and city)

Approach #2 uses the Visual Basic coding semantics which is the language used by the other actions present in MS Excel VBO.
It took 2 seconds and 70 milliseconds for 1000 rows of data for 2 columns (Age and city)

These were run from within the diagram (and not from Control Room) with Debug Speed set to the Fastest.
I am using a Virtual Machine so things would be faster if you are using a local machine.

Approach #3: Using a connection string
You can connect to Excel using a Connection String and access the data on the sheet using SQL queries. I won't recommend this given the complexities that may involve while applying vlookups and formulas

Approach #4: Using SQL and Blue Prism approach - the one that you are thinking of
This will involve designing the database according to the way you need it. Then use SQL queries on that data.
The actions to connect to a database and fire queries are very much available in Blue Prism, but I leave this approach for you to explore. While designing approaches #1 and #2 I didn't see a reason to discard excel and start using sql, unless you will be finding the excel file with the existing data of 2,50,000 records to be very heavy and slow in response.


Approach #5: Macros
This was one of the approaches but cannot be utilized due to security reasons so I won't explain how that can be used here.


Based on the above shown approaches you will need to gauge how complex or simple your sheet is, how the data is arranged and if the formulas can be applied in a straightforward way as shown above and decide upon the best suitable option.
Hi ,
In the fourth attached image there is one action stage called "Drag formula for all columns" can you explain me what is there inside this action is this default function available within BP?
 

VJR

Well-Known Member
Staff member
#10
Hi ,
In the fourth attached image there is one action stage called "Drag formula for all columns" can you explain me what is there inside this action is this default function available within BP?
Hi syed,

I have mentioned about that action in "Approach #2: Using a Code stage".
 
#11
Hi Hasipri,

"If we automate the vlookups through blueprism what is the best possible way? Do we have to copy the vloopup formulae to each and every cell? Is there any other way?"
There are two ways to do this and they are explained as below-

Approach #1: Using predefined actions provided by Blue Prism:
- Refer InputFiles.jpg. Refer their names in the title bar.
Your scenario of the files may or may not be the same as explained below, but you can get an idea of what can be done.
First file is your Main file where only Name column is filled initially and after running the process the Age and the City columns are populated by looking up from the external xlsx file (the second one).

- Refer 1.jpg: This is how the Process diagram looks when the predefined actions of BP are used
- At first you are simply setting the vlookup formula only in the first cell of Age column (Screenshot 2.jpg)
- And then you are simply pasting this formula till the last row of that column (3.jpg)
- There is an action available to get the number of rows in a sheet. But note that it will give the value of the column that has the maximum number of rows so assuming you have such a primary column in your sheet (eg Sr.No or Name in this case).
- You do the same for the City column.


Approach #2: Using a Code stage
- Screenshot 4.jpg is the process diagram for this approach
- This involves creating a new object and using the code stage. I created a copy of an existing action available in the MS Excel VBO, made the necessary removals and added a new action page. This new page looks as 5.jpg
- I am passing the last row from the calling Process to this action and using it in the code as below

Rich (BB code):
Dim sw, dw As Object
Dim ss, ds As Object
Dim excel, sheet As Object


Try

sw = GetWorkbook(Handle, Source_Workbook)
ss = GetWorksheet(Handle, Source_Workbook, Source_Worksheet)


sw.Activate()
ss.Activate()
excel = ss.Application
sheet = excel.ActiveSheet


sheet.Range("B2:B" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,2,FALSE)"
sheet.Range("C2:C" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,3,FALSE)"



Success = True

Catch e As Exception
    Success = False
    Message = e.Message
Finally
    sw = Nothing
    ss = Nothing
    dw = Nothing
    ds = Nothing
    excel = Nothing
    sheet = Nothing  
End Try

You simply need to add the formula for your required columns in the below lines:
Code:
sheet.Range("B2:B" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,2,FALSE)"
sheet.Range("C2:C" & LastRow) = "=VLOOKUP(A2,'C:\Users\Administrator\Desktop\[ExternalWkbk.xlsx]Sheet1'!$A$2:$C$4,3,FALSE)"



Comparison:
Approach #1 did not use any action other than what was already available in BP.
So if you do not have much awareness of Visual Basic coding or macros you can use this.
It took the process 2 seconds and 78 milliseconds for 1000 rows of data for 2 columns (Age and city)

Approach #2 uses the Visual Basic coding semantics which is the language used by the other actions present in MS Excel VBO.
It took 2 seconds and 70 milliseconds for 1000 rows of data for 2 columns (Age and city)

These were run from within the diagram (and not from Control Room) with Debug Speed set to the Fastest.
I am using a Virtual Machine so things would be faster if you are using a local machine.

Approach #3: Using a connection string
You can connect to Excel using a Connection String and access the data on the sheet using SQL queries. I won't recommend this given the complexities that may involve while applying vlookups and formulas

Approach #4: Using SQL and Blue Prism approach - the one that you are thinking of
This will involve designing the database according to the way you need it. Then use SQL queries on that data.
The actions to connect to a database and fire queries are very much available in Blue Prism, but I leave this approach for you to explore. While designing approaches #1 and #2 I didn't see a reason to discard excel and start using sql, unless you will be finding the excel file with the existing data of 2,50,000 records to be very heavy and slow in response.


Approach #5: Macros
This was one of the approaches but cannot be utilized due to security reasons so I won't explain how that can be used here.


Based on the above shown approaches you will need to gauge how complex or simple your sheet is, how the data is arranged and if the formulas can be applied in a straightforward way as shown above and decide upon the best suitable option.
Hi,

I've done everything according to Approach #2 and your algorithm works. However, for vlookup i'm getting an exception with message: "0x800A03EC". I tried to use different formulas. For instance, I used SUM function and it worked well, but for vlookup i'm getting an error. Don't you know why this happening?

Additionally, I tried to use Approach #1 by using MS Excel VBO "Set Cell Value" function and here the same story. I'm getting the same exception with message: "0x800A03EC".
 

VJR

Well-Known Member
Staff member
#12
Hi,

I've done everything according to Approach #2 and your algorithm works. However, for vlookup i'm getting an exception with message: "0x800A03EC". I tried to use different formulas. For instance, I used SUM function and it worked well, but for vlookup i'm getting an error. Don't you know why this happening?

Additionally, I tried to use Approach #1 by using MS Excel VBO "Set Cell Value" function and here the same story. I'm getting the same exception with message: "0x800A03EC".
What is the complete error message?
 
#15
I am suspecting that the vlookup formula may not have been supplied correctly. Have you tried with a simple one like "=A1" considering approach 1 first?
I tried to use for simple formulas, for instance to insert to cell C2 value "=A2=B2" and it worked. Also, I tried "=Sheet1!A2", and it worked too. But for lookup it didn't. I'm assuming that these is because of the difference in number rows. But I don't know exactly the root of the problem
 
#16
I am suspecting that the vlookup formula may not have been supplied correctly. Have you tried with a simple one like "=A1" considering approach 1 first?
Actually, you were right. I've gone through my code and I noticed that my vlookup consisted of semicolons ";" because I just copied formula from excel and there it was written through semicolon. But in Code it is important to use comma ","
Anyway, thanks for your contribution!
 
Top