Solved: Using Extract Regex values action to get text after pattern

mdhazlee

New Member
Hi,

I've seen the following post about using regex to extract text patterns and store them in a collection (http://www.rpaforum.net/threads/how-to-use-the-extract-regex-values-action-of-utility-strings.681/). Is there a way to expand on this to extract the text that follows a given pattern?

An example would be as follows:

Firstname: Zhi
Middlename: N/A
Lastname: Tan
Gender: male
Title: Mr.
Birthday: 2/19/1972
Street: 10 Toh Guan Road #04-09 TT International Tradepark Singapore 608838. Singapore
City(town): Singapore
State(area): N/A

I would like to extract the Firstname (i.e. the text following "Firstname:") using regex action. I've tried and figured out how to extract the pattern using the following regex:

(?<=\Firstname: ).*

But I couldn't integrate it into the Blue Prism action to save the result to a collection. If anyone has done something similar or knows of a solution using regex, please let me know.

Attached are the screen captures for the regex syntax which I think should work, but apparently it doesn't.

Thanks!
 

Attachments

  • Test Regex object.PNG
    13.2 KB · Views: 208
  • Extract firstname.PNG
    7 KB · Views: 187

imstefano

New Member
Hello mdhazlee,

I've tried using the following regular expression:
Code:
(?<=\:)(?<Groups>.*)

where Groups is the name of the Text in the collection (you can use "Firstname" in your case).

The only thing is that it leaves a space before the name if the input is as in your example, but that can be removed with a Trim in a Calc stage after the Regex.
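
For readers who want to try the idea outside Blue Prism, here is a minimal Python sketch of the same approach: a lookbehind for the colon, a named group for the value, then a trim. It is only an illustration under assumptions: Python's `re` module spells named groups `(?P<name>...)` rather than the .NET-style `(?<name>...)` used above, and the sample text and variable names are made up for the example.

```python
import re

# Sample input, assumed for illustration.
text = "Firstname: Zhi\nMiddlename: N/A\nLastname: Tan"

# Lookbehind for ":" followed by a named group that captures the rest of the line.
pattern = re.compile(r"(?<=:)(?P<Groups>.*)")

match = pattern.search(text)          # first match only
raw_value = match.group("Groups")     # still carries the leading space
trimmed = raw_value.strip()           # plays the role of the Trim in the Calc stage
print(repr(raw_value), repr(trimmed))
```

Because `.` does not match a newline by default, the capture stops at the end of the first line.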

Attached you'll find the flow chart, the regular expression and the trim stage.
 

Attachments

  • flow.PNG
    5.6 KB · Views: 231
  • regex.PNG
    18.3 KB · Views: 233
  • trim.PNG
    9.2 KB · Views: 200

mdhazlee

New Member
Hi @imstefano, thank you for the solution. It works as expected. Building on your syntax, I can refine it to be more specific, so that it matches only the text immediately after "Firstname:".

Code:
(?<=Firstname\:)(?<Firstname>.*)

I tried using 2 actions back to back, each with a different regex, to store different fields in the same collection. However, it seems the first field I wrote (e.g. the birthday) was overwritten when the 2nd field (Firstname) was written using your regex. Is there a better way to combine these so that I can store different fields in the same collection?
 

mdhazlee

New Member
@imstefano, a correction: I've taken a second look at your solution, and it still does not remove the text after the first name. I've managed to come up with a pattern that extracts exactly the text required.

Code:
(?<=Firstname\:\s)(?<Firstname>.*)(?=\nMiddlename)
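
A quick way to check this kind of pattern outside Blue Prism is a small Python sketch like the one below. It assumes Python's `(?P<name>...)` spelling for the named group and a made-up sample input; the lookbehind and lookahead work the same way as in the pattern above.

```python
import re

# Sample input, assumed for illustration.
text = "Firstname: Zhi\nMiddlename: N/A\nLastname: Tan"

# The lookbehind anchors the match right after "Firstname: ";
# the lookahead requires the next line to start with "Middlename".
pattern = re.compile(r"(?<=Firstname:\s)(?P<Firstname>.*)(?=\nMiddlename)")

match = pattern.search(text)
first_name = match.group("Firstname") if match else None
print(first_name)
```

If the input uses Windows line endings (`\r\n`), the captured value may end with a stray `\r`, so a trim after the match is still worth keeping.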

The issue of writing multiple fields onto the same collection still stands as stated above.
 

Attachments

  • regex.PNG
    17.4 KB · Views: 154

mdhazlee

New Member
Hi @imstefano, I've finally managed to solve the issue of writing multiple fields to the same collection. Assuming that the input is as follows,

Firstname: Zhi
Middlename: N/A
Lastname: Tan

the following regex code can be used:

Code:
(?<=Firstname\:\s)(?<Firstname>.*)\nMiddlename\:\s(?<Middlename>.*)\nLastname\:\s(?<Lastname>.*)

This will write the output onto 3 rows within the collection.
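
As a rough sketch of the same multi-field idea in Python (an assumption for illustration, not the Blue Prism action itself), one pattern with three named groups yields all three values from a single match. The named groups use Python's `(?P<name>...)` syntax, and `fields` is a hypothetical variable name.

```python
import re

# Sample input, assumed for illustration.
text = "Firstname: Zhi\nMiddlename: N/A\nLastname: Tan"

# One pattern, three named groups, each capturing the rest of its line.
pattern = re.compile(
    r"(?<=Firstname:\s)(?P<Firstname>.*)"
    r"\nMiddlename:\s(?P<Middlename>.*)"
    r"\nLastname:\s(?P<Lastname>.*)"
)

match = pattern.search(text)
fields = match.groupdict() if match else {}   # group name -> captured value
for name, value in fields.items():
    print(name, "=", value)
```

Note that the pattern depends on the three labels appearing in exactly this order on consecutive lines; if a line is missing or reordered, there is no match at all.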
 

mdhazlee

New Member
No problem @imstefano. I think I'm getting the hang of regex. I've been exploring it and found that it can also extract text from the middle of the input if required. For example, if the input text is as follows:

Firstname: Zhi
Middlename: N/A
Lastname: Tan
Gender: male
Title: Mr.
Birthday: 2/19/1972

And we only want to extract the information from Middlename to Gender, this can be done using the following code:

Code:
(?<=Middlename\:\s)(?<Middlename>.*)\n\w*\:\s(?<Lastname>.*)\n\w*\:\s(?<Gender>.*)(?=\nTitle\:\s)

By using the lookahead (?=\nTitle\:\s) and the lookbehind (?<=Middlename\:\s) to mark the area of text we want to match (without including those markers in the match), we can refine the match so that only the required info is output instead of working through all the text. Once again, thank you for first showing me the way to output to a collection. You've been a great help!
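
The middle-of-text extraction can be tried out with a short Python sketch such as the following. It assumes Python's `(?P<name>...)` named-group syntax and a made-up sample input; the lookbehind and lookahead bound the region without being part of the match.

```python
import re

# Sample input, assumed for illustration.
text = (
    "Firstname: Zhi\n"
    "Middlename: N/A\n"
    "Lastname: Tan\n"
    "Gender: male\n"
    "Title: Mr.\n"
    "Birthday: 2/19/1972"
)

# Start right after "Middlename: " and stop just before the "Title: " line.
# \w*:\s skips over the labels of the two lines in between.
pattern = re.compile(
    r"(?<=Middlename:\s)(?P<Middlename>.*)"
    r"\n\w*:\s(?P<Lastname>.*)"
    r"\n\w*:\s(?P<Gender>.*)"
    r"(?=\nTitle:\s)"
)

match = pattern.search(text)
middle_fields = match.groupdict() if match else {}
print(middle_fields)
```

Since `\w*` matches any label, the group names here rely on the lines appearing in the expected order rather than on the label text itself.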
 

xmishaniagx

New Member
Hi All,

This is very interesting, as I am new to regex. Any help with how I can identify alphanumeric values that might appear anywhere within a sentence? For example, I am trying to extract the value "abc123" from the sentence "currently team abc123 is playing until 19:00". So the goal is to ignore any purely alphabetic or purely numeric value and extract only the value that is a combination of the two (alphanumeric). TIA.
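
One possible approach, sketched in Python under assumptions (the thread itself does not give an answer, and the variable names are hypothetical), is to use two lookaheads at a word boundary: one requiring at least one letter in the token and one requiring at least one digit. The same lookahead idea should carry over to other regex flavours, but that has not been checked against the Blue Prism action.

```python
import re

sentence = "currently team abc123 is playing until 19:00"

# At a word boundary, require that the run of letters/digits contains
# at least one letter AND at least one digit, then consume the whole run.
mixed_token = re.compile(
    r"\b(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*\d)[A-Za-z0-9]+\b"
)

tokens = mixed_token.findall(sentence)
print(tokens)
```

Words such as "currently" fail the digit lookahead, and "19" and "00" fail the letter lookahead, so only mixed tokens are returned.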
 