Surface automation: Blue Prism cannot recognise fonts

#1
Hi all,

I am trying to read text from a region using the "Recognise Text" action. I have turned off "font smoothing", but Blue Prism won't recognise the font. I have tried to identify the font manually, but there are too many size and font combinations. Please suggest a solution. If I do need to identify the font manually, please advise how to shortlist the candidate fonts.
 
#3
No, I am not able to get the text using either "Recognise Text" or "Read Text with OCR". Blue Prism does not even recognise the font. I identified the font manually, but I still cannot read the text.

I am working on a PDF from the Blue Prism Portal. This is what I did:
1) Spied the whole window.
2) Marked the regions.
3) Tried to recognise the font, but it failed. I then identified the font manually but still got no result.
This is how I am doing it:
1) Launch the application (I am opening the PDF in Internet Explorer, as I was not able to spy Acrobat Reader).
2) Activate the application.
3) Read/recognise the text and store it in a variable. (I have also tried defining the background color, but it still does not work.)

The process just ends: no error is thrown and nothing appears in the data item.
 

VJR

Well-Known Member
Staff member
#4
This happened to me several times while working with PDFs: no data in the data item and no errors either. In those cases I never needed any font information; I only used Read Text with OCR. The main thing is to mark the region correctly: the left, right, top and bottom borders around the text must all be marked properly. Sometimes, even if the text is 10 characters long, the region has to be extended to the right to accommodate, say, 15 characters. So you need some back-and-forth trial and error to capture the text correctly.
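To make the "extend the region" advice concrete, here is a rough sketch of the arithmetic in Python. It is not Blue Prism code: the `Region` type and the `pad_region` helper are hypothetical, and the 4-pixel margins are assumptions. The idea is simply to widen a tight box by the average character width times the number of extra characters you want room for, add a small margin on the other sides, and then redraw the region to roughly those bounds.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical screen rectangle in pixels for a spied text region."""
    left: int
    top: int
    width: int
    height: int


def pad_region(tight: Region, expected_chars: int, capacity_chars: int,
               margin: int = 4) -> Region:
    """Widen a tightly drawn region so it could hold `capacity_chars`
    characters instead of `expected_chars`, and add `margin` pixels on the
    left, top and bottom (clamped at the screen edge). The margin value is
    an assumption to tune by trial and error."""
    if expected_chars <= 0 or capacity_chars < expected_chars:
        raise ValueError("need capacity_chars >= expected_chars > 0")
    char_width = tight.width / expected_chars  # average glyph width
    extra_right = round(char_width * (capacity_chars - expected_chars))
    new_left = max(0, tight.left - margin)
    new_top = max(0, tight.top - margin)
    return Region(
        left=new_left,
        top=new_top,
        # keep the original right edge plus the extra room on the right
        width=(tight.left - new_left) + tight.width + extra_right,
        # keep the original bottom edge plus a margin below it
        height=(tight.top - new_top) + tight.height + margin,
    )
```

For a 10-character label in an 80 px wide box, `pad_region(Region(100, 200, 80, 14), 10, 15)` gives `Region(left=96, top=196, width=124, height=22)`, i.e. room for about 15 characters. You would still fine-tune by trial and error as described above.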
 
#7
I tried what you suggested and even got a string once, but it made no sense, just random characters. Is there any other way to do this?
 

VJR

#8
The only approach that has worked for me every single time is the one I suggested above. It looks like your text has some surrounding design or pattern that makes the characters unreadable. Is this webpage or image shareable and available somewhere on the web so I can take a look?
 

VJR

#10
It's a PDF file named "Create quotes" downloadable from, I was trying to read the headlines. I don't think there is any underlying or overlapping element. Here are the snapshots:
Hi growler, is this webpage/PDF shareable and available somewhere on the web so I can take a look?
 
#11
It's called "BP Travel - Create Quotes- Initial process Assessment" and it can be downloaded from Blue Prism's portal. I would have shared the document here, but I don't know whether that is against forum policy. If you message me your email address, I will mail it to you.
 

VJR

#12
Nope, don't share it here or over email. I found it from the details you gave. Are you trying to fetch the headline that says "Initial Process Analysis - Create Quotes"? I haven't tried it with Blue Prism yet, but I can't even select it manually with the mouse.
 
#13
I was trying to read the subheadings, e.g. "Contact Details", "Process Requirements", etc. The issue you mentioned is strange, because it is a readable PDF: I can select and copy the text on my side. In case it helps (I completely forgot to mention this before), I am not using Acrobat Reader; I am opening the PDF in a browser (IE). I hope that doesn't matter here.
 

VJR

#15
Hi growler,

I downloaded the document, and since I do not have Adobe on the machine with Blue Prism, I use Chrome to open PDF documents. You can try the same in IE, but I have always had success with Chrome for PDFs.

As you can see below, the 'Result' data item has successfully read the text "Contact Details".

[Screenshot: the 'Result' data item containing "Contact Details"]

I made no major changes from what I suggested above:
- Read Text with OCR.
- No font information given.
- The only thing I had to do was tick the Ordinal attribute for the Region in Application Modeller, since it was giving the "More than one window matches the query" message.

One thing you may want to do: before the Reader1 stage, add another Reader stage that uses the 'Read Image' action and stores the result in an Image data item. After running the process, open the data item and check whether the image correctly shows the area where "Contact Details" is written. If it doesn't, the spying has not been done correctly, which is why no text is being read.
 
#17
I cannot get the text using "Recognize Text" either. The training document says to use Read Image to check whether you have spied the region you want to extract correctly. The image shows that the coordinates are correct and I can see the entire text. I also changed the property from Image to Coordinates (as per the document), because I wanted to extract the text.

I have no issue using Read Text with OCR, though, as its name suggests, it is meant for PDFs or other image-based content. I am SAT-spying a site with dynamic tabular data, so I thought Recognize Text was a better fit than OCR.

Likewise, I cannot get a single font using Identify System Font; if you are lucky, you get multiple candidates.
 
#18
Did you manage to figure this out?

I'm having trouble getting the information from a text box using surface automation, because the detail in it (an enquiry number) changes every time the window opens. The element spies and highlights correctly when the box is blank or contains the original number, but if the value is even slightly different, it won't select the box to read the text.
 
#19
I tried what you suggested, but the bot reads values differently most of the time. Here are some of the examples we have faced so far.

Example:
  • Instead of " GLDLAZD", the bot read "GLDI_AZD| "
  • Instead of "04CX9.1CT", the bot read "04CX9. 1 CT" (with spaces). Temporary fix: replaced " " (space) with "" (empty string)
  • Instead of "LSVMNWWV", the bot read "LSVM NWWV". Temporary fix: replaced " " (space) with "" (empty string)

Is there any solution for these kinds of issues? We can't tell in advance when it will misread which values.
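Until the reads themselves are reliable, the space-stripping workaround can be extended into a small clean-up step applied to whatever the Reader stage returns. A minimal Python sketch, assuming the values never legitimately contain whitespace, underscores or pipes: the substitution table is hypothetical and built only from the misreads listed above, and `looks_valid` uses an assumed pattern (capital letters, digits and dots) to flag results that still look wrong instead of passing them on.

```python
import re

# Hypothetical table built from the misreads seen in this thread.
# Order matters: dicts keep insertion order, so "I_" is fixed before "|" is dropped.
SUBSTITUTIONS = {
    "I_": "L",  # "GLDLAZD" came back as "GLDI_AZD|"
    "|": "",    # stray pipe picked up at the region edge
}

# Assumed format of the codes being read: capital letters, digits and dots only.
EXPECTED = re.compile(r"[A-Z0-9.]+")


def clean_ocr(raw: str) -> str:
    """Remove all whitespace, then apply the known substitutions in order."""
    text = re.sub(r"\s+", "", raw)
    for wrong, right in SUBSTITUTIONS.items():
        text = text.replace(wrong, right)
    return text


def looks_valid(text: str) -> bool:
    """True only if the cleaned value matches the assumed code format."""
    return EXPECTED.fullmatch(text) is not None


if __name__ == "__main__":
    for raw in ["GLDI_AZD| ", "04CX9. 1 CT", "LSVM NWWV"]:
        cleaned = clean_ocr(raw)
        print(repr(raw), "->", repr(cleaned), "valid" if looks_valid(cleaned) else "CHECK")
```

This only papers over the misreads you have already seen. A new misread that produces an unexpected character is still caught by `looks_valid`, but letter-for-letter swaps (say, O for 0) pass straight through, so the region and font settings remain the real fix.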
 
#20
Hi, were you able to find a solution for this issue?
 