Solved Searching values from big Collections or Excel Worksheets

Status
Not open for further replies.
Hello,
Im trying to create an automation that reads each element from one column on a file, and search it in another file in another column if that value exists. The idea is to find the elements from one column that are not in the other column.
The problem im having is that this files are huge, around 6000 rows on each. And the process i created to do this is taking over 15 mins to do it.
First i tried reading each value from one file then search that value on the the other file but that took too long. Then i used collections, so i transfer one file to a collection and then i read each element on the Excel file and use the function from the BPA Collection Manipulation "Collection Contains Value" and the process is a lot faster, but its still too long to use.
Do you guys have any suggestions?
As i said, im trying to find the rows in file A that are not in file B. File = Excel.
Thanks.
 

RDawson9

Geek
Staff member
Hi @arodriguezm

Your second solution sounds like the best way to go. Read both files into collections and then perform the comparison in Blue Prism. Maybe create a third collection that can store the values which don't exist.

If you want to post any screenshots of the process you have so far, that might help us spot any potential areas for improvement.

Rob :)
 
ilv.png

Hello! This is the process im working on, the important thing here is that I have two files, i get the file 2 as a collection and i read each row in file 1 using a loop to check if that element exist in the collection made of file 2. This process works, but it takes too much time because the file is very big. This can be done in excel in no time using some native excel functions, so i dont know if i can call those functions from BP or something.
Thanks for your help.
 

RDawson9

Geek
Staff member
Hi @arodriguezm

Rather than reading the values from Excel one by one ("Get Cell Value User from file 1") and then checking them against Coll2, could you not get Coll1 as a collection and then use a Loop Stage to iterate through Coll1 and check to see if the value exists in Coll2?

This should definitely speed up the overall work time.

Rob
 
Hi Rob, thanks for your suggestion, and yes, i did just what you said yesterday, the time overall was improved, but it is yet too much time. The problem im seeing here, is that the Excel has way too much rows, i tried using a simple counter iterating though all the collection created from the original excel file and that simple process took basically the same time. So if there is a way to do this comparison like in excel, im thinking, using the Excel functions in BP and write in a column a formula for example vlookup, do you know if that may work?
 

anisjolly

Administrator
Hi @arodriguezm Have you tried creating a Macro in Excel and use Blue Prism to trigger the Macro? Have a look at the following thread as there may be some information you could use to develop a Macro related solution: http://www.rpaforum.net/threads/macro-help.518/#post-821

You can use Blue Prism to work alongside solutions outside of the Blue Prism ecosystem - sometimes that can work better, however this means that your process then has a dependency outside of BP's unique ecosystem - something you need to ensure that has a good lifespan and also the skillset to manage and maintain the solution outside of BP.
 

tgundhus

Member
Hello.

I guess a macro or a vlookup formula in excel is the easiest. But the fastest would be to create an object with that joins two datatables.
Sad to say, but both excel and the looping function in BP kind of struggle with it comes to performance. So if performance is a factor, I suggest you create your own object to handle huge data masses.
 
Status
Not open for further replies.
Top