You can't average percentiles - the result isn't mathematically meaningful, so your number will move depending on incidental details such as how requests happen to be spread across servers. E.g., if you changed your load balancing algorithm and then observed the number go down, you can't conclude that latency improved, because you also changed how the requests are grouped before aggregation, and that can push the result in either direction.
(You may get a usable number if all servers are more-or-less equivalent and events are randomly distributed, but then you're basically assuming the thing you want to validate.)
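To make the problem concrete, here's a small sketch with made-up numbers (the two servers, their traffic split, and the latencies are all hypothetical, and `percentile` is just a simple nearest-rank helper, not any particular library's definition). One fast server takes 90% of the traffic and one slow server takes the rest:

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, to avoid floating-point rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in ms.
server_a = [10 + (i % 10) for i in range(900)]        # 900 requests, 10..19 ms
server_b = [200 + (i % 10) * 10 for i in range(100)]  # 100 requests, 200..290 ms

p90_a = percentile(server_a, 90)
p90_b = percentile(server_b, 90)
averaged = (p90_a + p90_b) / 2                   # the "average of percentiles"
true_p90 = percentile(server_a + server_b, 90)   # computed once on all requests

print(p90_a, p90_b, averaged, true_p90)  # 18 280 149.0 19
```

The averaged figure (149 ms) is nowhere near what 90% of requests actually experienced (19 ms or less), and it would change if the same requests were split across the servers differently.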
I'm not an expert, but I know two correct ways: collect the records of all individual requests and compute the 90th percentile just once on the whole set (doable if there aren't too many requests - modern machines are quite powerful), or generate per-server histograms with identical bucket boundaries, which can be merged safely.
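Here's a rough sketch of the histogram approach, under some assumptions of mine: the bucket boundaries in `BOUNDS` are arbitrary, the function names are made up for illustration, and the sample latencies are hypothetical. The point is only that bucket counts add up exactly, so merging loses nothing beyond the bucket resolution you chose up front:

```python
from bisect import bisect_left
from collections import Counter

# Assumed bucket upper bounds in ms; every server must use the same list.
BOUNDS = [10, 20, 50, 100, 200, 300, 500, 1000]

def to_histogram(latencies):
    """Count requests per bucket; index len(BOUNDS) is the overflow bucket."""
    return Counter(bisect_left(BOUNDS, x) for x in latencies)

def merge(histograms):
    """Merging is just adding the per-bucket counts."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return total

def percentile_upper_bound(hist, p):
    """Upper bound of the bucket that holds the p-th percentile (nearest rank)."""
    n = sum(hist.values())
    rank = (p * n + 99) // 100  # integer ceiling of p * n / 100
    seen = 0
    for i in range(len(BOUNDS) + 1):
        seen += hist[i]
        if seen >= rank:
            return BOUNDS[i] if i < len(BOUNDS) else float("inf")

# Hypothetical per-server latencies in ms.
server_a = [10 + (i % 10) for i in range(900)]        # 900 requests, 10..19 ms
server_b = [200 + (i % 10) * 10 for i in range(100)]  # 100 requests, 200..290 ms

merged = merge([to_histogram(server_a), to_histogram(server_b)])
p90_bound = percentile_upper_bound(merged, 90)
print(p90_bound)  # 20
```

The answer is only as precise as the buckets (here, "p90 is at most 20 ms"), but it's the same answer no matter how the requests were distributed across servers.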