RPA Automation Failure: Acceptable Levels?

Hello all. I wanted to start a general discussion about what counts as an acceptable failure rate. We have a process that connects to a web site and then runs several high-level operations inside that web site. On occasion, primarily during report generation, the report runs long, so something else happens and that step of the operation fails despite the redundancies built into the process. The process as a whole still completes successfully; we just end up missing a report at times. The user can repeat the failed step and generate the report themselves, but they find this an annoyance. As I measure the failure rate, I am curious what others consider a successful rate for automation runs and what level of failure is to be expected.
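One way to make that measurement concrete is to track run completion and report delivery separately, since a run can finish and still miss a report. Below is a minimal sketch in Python; `RunResult`, `summarize`, and all field names are hypothetical and not tied to any particular RPA tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunResult:
    """Outcome of one automation run (hypothetical record layout)."""
    completed: bool        # the overall process reached its end
    reports_expected: int  # reports the run was supposed to produce
    reports_produced: int  # reports it actually produced


def summarize(runs: List[RunResult]) -> Dict[str, float]:
    """Return three rates: completed runs, clean runs, delivered reports."""
    total = len(runs)
    if total == 0:
        return {"completion_rate": 0.0, "clean_run_rate": 0.0, "report_rate": 0.0}

    completed = sum(1 for r in runs if r.completed)
    # A "clean" run completed AND produced every expected report.
    clean = sum(
        1 for r in runs
        if r.completed and r.reports_produced == r.reports_expected
    )
    expected = sum(r.reports_expected for r in runs)
    produced = sum(r.reports_produced for r in runs)

    return {
        "completion_rate": completed / total,
        "clean_run_rate": clean / total,
        "report_rate": produced / expected if expected else 1.0,
    }
```

Reporting the three numbers side by side shows whether an SLA is being missed because whole runs fail or because individual reports go missing inside otherwise successful runs.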

In general this feels like an uphill battle, as we often don't have the same control we would have in building a C#/.NET application, since we rely on applications like web sites to respond to commands in roughly the same amount of time.

Just looking for ideas as we continue to refine what acceptable SLAs should be.
Can't you program in more robustness? Put in a reasonable wait time for the page to load, then run an if-then-else check for whether the expected results are present on the web page. If they are, continue to the next step; if not, wait again and run the check again in a loop.
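As a rough illustration of that wait-and-check loop, here is a minimal sketch in Python. The helper name `wait_until`, its parameters, and the default timings are assumptions made for the example; most RPA tools and browser drivers offer their own equivalent of an explicit wait.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 120.0,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or timeout_s has elapsed.

    Returns True if the condition was met, False if the wait timed out.
    clock and sleep are injectable so the loop can be tested without delays.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True       # results are present: continue to the next step
        if clock() >= deadline:
            return False      # gave up: let the caller retry or log a failure
        sleep(poll_s)         # wait, then run the check again


# Hypothetical usage, where report_is_ready() inspects the page:
#
#     if wait_until(report_is_ready, timeout_s=600, poll_s=10):
#         download_report()
#     else:
#         log_missing_report()
```

Bounding the loop with a deadline rather than a fixed retry count keeps the worst-case wait predictable even when each check itself is slow.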

It's a lot of extra effort to program in robustness, but it's necessary.