Work Queues, Collections - Limitations - design question

Hi,

I have a requirement to process about 400K records from an Excel file. Of the 400K records, about 75K (based on some eligibility criteria) require me to go to a web portal, obtain the required information, and update the details in the portal. For the rest of the records, which do not satisfy the eligibility criteria, no updates are necessary. There are two ways I can accomplish this task.

1) Using Work Queues
Approach:
a) Read all 400K records into a collection and load them into a work queue.
b) Use a multi-bot architecture to process the records and update the details in Excel.

2) Using Single bot:
Approach:
a) Read each record from the Excel file and identify whether the record needs to be updated.
b) If yes, get the details from the web portal and update them in the Excel file.
c) Repeat the above two steps until the end of the Excel file.

I will almost certainly develop the bot using the first design approach (multi-bot architecture) unless there is something that cannot be achieved with it. In this regard, below are the questions I have:

a) Is there any limitation on the size of a collection in Blue Prism? Would Blue Prism be able to read 400K records into a collection?
b) I know work queues can handle large amounts of data, but would 400K records be a challenge for a work queue to handle?
c) Are there any pitfalls I need to look out for in this design?

Any suggestions/comments/feedback on the approaches mentioned above are welcome. If you have any thoughts on another approach, please let me know. Your input would be invaluable in designing the bot.
 
I got this useful solution from another forum and am sharing it here for others.

Using a work queue should always be the right way, even for a single case; it lets you know which cases were completed and which were not. That way you won't have to reprocess all the data if something goes wrong at runtime.
I haven't had to read that many rows into a collection, but if memory is an issue, a possible solution could be:
  1. Open the workbook and get number of rows
  2. Call the "Get Worksheet Range As Collection" to read the data by chunks, for example row 1 to 10,000 - on the next loop get 10,001 to 20,000 and so on. This way your collection will never be larger than 10,000 rows.
  3. Create one Queue item per row in that collection
  4. Repeat steps 2 and 3 until you process the 400k records
Make sure to disable the logging, otherwise you might end up with a massive log file and potentially crash Blue Prism.
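Blue Prism stages aren't written as code, but the chunked-load pattern in steps 1-4 can be sketched in plain Python. The `read_range` and `add_to_queue` callables below are hypothetical stand-ins for the "Get Worksheet Range As Collection" action and the queue's "Add To Queue" stage:

```python
# Sketch of the chunked-load pattern: read the sheet in fixed-size windows
# so the in-memory collection never exceeds chunk_size rows.

CHUNK_SIZE = 10_000

def load_in_chunks(total_rows, read_range, add_to_queue, chunk_size=CHUNK_SIZE):
    """Read rows 1..total_rows in chunks and create one queue item per row."""
    start = 1
    while start <= total_rows:
        end = min(start + chunk_size - 1, total_rows)
        chunk = read_range(start, end)   # stand-in for Get Worksheet Range As Collection
        for row in chunk:
            add_to_queue(row)            # stand-in for Add To Queue (one item per row)
        start = end + 1

# Demo with in-memory stand-ins for the Excel sheet and the work queue:
rows = [{"id": i} for i in range(1, 401)]          # pretend 400 rows instead of 400K
queue = []
read = lambda s, e: rows[s - 1:e]                  # 1-based inclusive range, like Excel
load_in_chunks(len(rows), read, queue.append, chunk_size=100)
print(len(queue))  # → 400
```

The key property is that only one chunk is resident at a time; the queue (a database table in Blue Prism) absorbs the full 400K.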
 

Tolani

I know this is an old thread, but one thing I consider while developing is time and effort. In your case, only 75K records need processing on the portal and the rest need none, so it may not make sense to load all 400K; you might get through 325K items before your process starts hitting the relevant records. Also, if you only have to report on the ones that went into the portal, you can design your process to filter the Excel data on the eligibility criteria using the filter option, read the resulting values into the queue (75K), and process only those. This will save you time and effort.
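The filter-first idea can be sketched the same way: apply the eligibility test before loading the queue, so only the relevant rows become queue items. The `eligible` flag here is a hypothetical stand-in for whatever your real criteria are:

```python
# Sketch of filter-first loading: only rows passing the eligibility
# criteria are turned into queue items, instead of all 400K.

rows = [{"id": i, "eligible": (i % 5 == 0)} for i in range(1, 401)]  # dummy data

eligible_rows = [r for r in rows if r["eligible"]]   # Excel filter equivalent
queue = list(eligible_rows)                          # load only these to the queue

print(f"{len(queue)} of {len(rows)} rows queued")    # → 80 of 400 rows queued
```

Whether this beats loading everything depends on whether the filter itself is cheap (an Excel autofilter or a collection filter is; a portal lookup per row is not).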
 