Solved Extract Data From Excel File

rpaccount

Member
I use Excel VBO to create excel instance > open that particular workbook and show it. Now, i need to open folders that consist of multiple workbooks and read in the data inside the workbooks to consolidate them.

Example of such situation
Folder 1 - File name : 20180101
| Mails | Count |
------------------
| A1 | 0 |
| B1 | 1 |
| C1 | 2 |
| D1 | 0 |

Folder 2 - File name : 20180102
| Mails | Count |
------------------
| A1 | 0 |
| E1 | 0 |
| F1 | 2 |
| G1 | 0 |

Expected results: My new excel file with:
| Mails | Num of times 0 has appeared |
---------------------------------------
| A1 | 2 |
| B1 | 0 |
| C1 | 0 |
| D1 | 1 |
| E1 | 0 |
| F1 | 1 |
| G1 | 0 |

PS: It’s a confidential thing so i can’t really screenshot the process for you to see. What i can do is to try my best to describe it.

Summary:
  1. Read multiple workbooks in different folders (file name: dates) — COMPLETED (use collections and loop to do so)
Now:
  1. Extract count from each mails (E.g. A1,B1,C1...)
  2. Put them in new file and show number of times 0 has appeared.
 

VJR

Well-Known Member
Hi rpaccount,

If I understood it correctly then it is,
| E1 | 1
| F1 | 0
| G1 | 1

You can achieve this by doing the below using minimal OR absolutely no looping involved.
i. If you haven't already done, get all the data of all the files into 1 single collection appending one after the other.

ii. Dump this collection to your new Excel file using a single action available.
Now your new Excel file looks like this-
1528518991017.png

Instead of steps i. and ii. you can also use 'Copy and Paste Worksheet Range' action to append directly from each of those Excels to your Destination File.

iii. Add a column Header to the C column in the new file.
Insert a formula from C2 to its LastRow as =COUNTIFS($B$2:$B$9,0,$A$2:$A$9,$A2)
This involves a Code stage and you can do this as shown in below link-
http://www.rpaforum.net/threads/manipulating-large-volume-of-excel-data-in-blue-prism.944/post-2699
For calculating LastRow there is a ready-made function available in Ms Excel VBO called as 'Get Number Of Rows'.

Now your new Excel file looks like the below
1528519457419.png

iv. This now shows all the correct counts and your job now is to only delete the duplicates of Column A along with its entire row.

v. Deleting duplicates can be done in two ways.
Option 1. Remove duplicates directly from Excel.
- You cannot directly delete rows in the current situation because if the rows are deleted the Formulas will show incorrect results.
- So do Copy and Paste of only values from Column C to Column D

1528519609744.png

- Delete Column C which has the Formulas.

- Column D will now shift to become your Column C
1528519808284.png

Then write a code stage to call the Remove Duplicates function of Excel for Column A.

After removing duplicates the data looks as below
1528519897591.png

-This is your final expected result.
-No looping is involved.

Option 2. Remove duplicates from the Collection.
- Get the excel data from Step iv. into a collection
- Remove the duplicates from Column A of the collection.
If needed you can take a look at the below post-
http://www.rpaforum.net/threads/how-to-get-unique-values-from-a-collection.757/post-1872

-The basic idea is to loop through the above Collection (lets say SourceCollection). Have another new empty collection in your diagram as DestnCollection.
-Read Column A of SourceCollection. Check if it is present in DestnCollection (single action shown in above link).
If present, do nothing. If not present, copy that row to DestnCollection.

When the loop is over the DestnCollection is the one which is your expected requirement.
Looping is involved.

If you choose any of these approaches let me know if you are facing any issues.
 

rpaccount

Member
Hi VJ,

Thank you for your answer!
I have some questions:

Adding 0 appeared column
  • Add column header (0 appeared), then get last row of count (using get number of rows) and save it as a data item
Question: After that...
  • I’m not sure how to implement the COUNTIFS. Can I just use Set Cell Value like your previous example from here? : http://www.rpaforum.net/threads/man...-data-in-blue-prism.944/#lg=post-2699&slide=1
  • Cell reference “B2” but idk what to put for value.
  • =COUNTIFS($B$2:$B$9,0,$A$2:$A$9,$A2) --- Is this formula here counting the number of times 0 appear? Seems like we need a =0 or something right? My Excel is quite rusty.
Get unique values from column

http://www.rpaforum.net/threads/how-to-get-unique-values-from-a-collection.757/#lg=post-1872&slide=3
  • From this link, i know how to get the unique column, but how about the entire row?
  • Because the example you gave only shows the unique column, not the row of the unique column.
 

VJR

Well-Known Member
Hi rpacount,

Can I just use Set Cell Value like your previous example from here
You can skip the Set Cell Value since the formula you will be entering in the code stage will be starting from C2.

This is the line of Code.
sheet.Range("C2:C" & LastRow) = "=COUNTIFS($B$2:$B$" & LastRow & ",0,$A$2:$A$" & LastRow & ",$A2)"

Notice the bold comma ',' in the above formula. There are 4 parameters and what those parameters are, will be shown if you type =COUNTIFS in excel or by selecting this function from the Name box in the top left corner

1528458771036.png

Those 4 parameters are as explained in the above 4 boxes.
Assuming your LastRow is 10, this formula goes into B2 to B10 and checks for 0 ($B$2:$B$" & LastRow & ",0)
Second criteria is within A2 to A10 check for A2 because you need to count A2 ($A$2:$A$" & LastRow & ",$A2)


Is this formula here counting the number of times 0 appear?
Yes, it is and that is why immediately after entering the formulas the correct counts are displayed in Column C.

From this link, i know how to get the unique column, but how about the entire row?
The links are just references to get you started and they won't contain full exact details for what you are looking since it directly answers the question to that post.

Using 'Contains Value' you are only going to check Column A for the duplicates.
If not found in the output (destination) collection then Write to output collection.
'Write to output collection' uses only 1 column in the above link inthe Calc stage. Likewise for all columns on that row you could use a Multi Calc stage, thus copying the entire row. Here you are checking only one column for duplicates but copying the entire row since every row already has the correct counts in Column C. In the end Delete Column B from collection using 'Delete Column' action.
 

rpaccount

Member
Hi VJ,

Once again thank you for your answer!
What if i want to avoid code stage? Is there a way to do it using the process studio only? With action stages and etc.
 

VJR

Well-Known Member
Hi VJ,

Once again thank you for your answer!
What if i want to avoid code stage? Is there a way to do it using the process studio only? With action stages and etc.
Hi rpaccount,

Not sure if you fully read my above post.
I know some folks are still learning the Code stage and tend to avoid it :).
And that is why I have given you two options in Step v.

v. Deleting duplicates can be done in two ways.
Option 1. Remove duplicates directly from Excel.
uses Code stage
Option 2. Remove duplicates from the Collection does not use Code stage.

The choice is yours among the above.
 

VJR

Well-Known Member
Hi rpaccount,

I understand that you may have submitted a few posts and then deleted them. Are you still looking for a solution. Were you able to get the counts without the code and with looping through the collection?
 
Top