Issues Reading XML

pinkyx

New Member
Hi,

I'm trying to read XML documents from the UK Public Procurement Notices using the Utility - XML object from the Blue Prism website.
Example link to the XML docs: https://data.gov.uk/dataset/63917d4...97ce7a/uk-public-procurement-notices-may-2021

I can get it to read a file into a data object, but from there I'm struggling to use any of the pages to get the headers and data I need.
What I would like is for each document within that notices folder to be written to a collection, or something else I can create a spreadsheet from. Has anyone got any experience doing this, or any guidance they can give me?
 

sahil_raina_91

Active Member
Could you explain the question again?
Are you looking to get a collection of all file names inside the zip?
Are you looking to get a collection of each file's entire contents inside the zip?
Are you looking to get a collection with only specific XML data extracted?
 

pinkyx

New Member
From the website I linked, I can download and extract a zip folder.

Inside the folder there can be any number of files.

I want to loop through the files, picking out specific bits of data.

For example:
<CPV MAIN>12345678</CPV MAIN>
I would want to pull out 12345678

From the XML object I can't seem to get that. The value either comes out with its first characters cut off, or it comes out with the </CPV MAIN> plus other tags.
 

sahil_raina_91

Active Member
I couldn't find a single XML file containing <CPV MAIN>12345678</CPV MAIN>.

So, with limited information, here's what you can try:
Use a regex to extract the specific value: (?<=<CPV MAIN>).*(?=<\/CPV MAIN>), which will return 12345678 for the example above.
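
If it helps to test the pattern outside Blue Prism first, here is a rough Python sketch of the same idea. It is only an illustration under a few assumptions: the tag is written exactly as <CPV MAIN> as in the example (a name with a space is not a well-formed XML element name, so the real files probably use a different tag and the pattern would need adjusting), the files end in .xml, and the helper names extract_cpv_values and extract_from_folder are made up for this post. I also switched .* to the non-greedy .*? so that two tags on one line are not merged into a single match.

```python
import re
from pathlib import Path

# Same lookbehind/lookahead idea as the regex above, but non-greedy (.*?)
# so each <CPV MAIN>...</CPV MAIN> pair is matched separately.
# The tag name is copied from the example in this thread (assumption).
CPV_PATTERN = re.compile(r"(?<=<CPV MAIN>).*?(?=</CPV MAIN>)")


def extract_cpv_values(text):
    """Return every value found between <CPV MAIN> and </CPV MAIN>."""
    return CPV_PATTERN.findall(text)


def extract_from_folder(folder):
    """Map each .xml file name in the folder to the values found in it."""
    results = {}
    for path in sorted(Path(folder).glob("*.xml")):
        # errors="replace" keeps the loop going if a file has odd encoding
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = extract_cpv_values(text)
    return results


sample = "<NOTICE><CPV MAIN>12345678</CPV MAIN></NOTICE>"
print(extract_cpv_values(sample))  # ['12345678']
```

The dictionary returned by extract_from_folder (file name to list of values) is roughly what you would end up writing into a collection or spreadsheet, one row per file.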
 