I realized the same thing when trying to compute F1-optimized thresholds: recomputing the F1 score from scratch at every candidate threshold is O(n^2), but updating it incrementally while scanning through the sorted array is O(n) after the O(n log n) sort.
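For the curious, here's a minimal sketch of that scan, assuming binary labels and the convention "predict positive at or above the threshold" (the function name and details are illustrative, not from any particular library):

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Find the F1-maximizing threshold in one pass over sorted scores.

    Naively recomputing F1 at every candidate threshold is O(n^2);
    updating TP/FP/FN incrementally makes the scan O(n) after the
    O(n log n) sort. Ties between scores are ignored for simplicity.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)[::-1]       # descending: highest scores first
    scores, labels = scores[order], labels[order]
    total_pos = labels.sum()

    tp = fp = 0
    best_f1, best_thresh = 0.0, np.inf
    for i, y in enumerate(labels):
        # Lowering the threshold to scores[i] flips exactly one sample
        # to "predicted positive", so the counts update in O(1).
        if y == 1:
            tp += 1
        else:
            fp += 1
        fn = total_pos - tp
        f1 = 2 * tp / (2 * tp + fp + fn)   # F1 = 2TP / (2TP + FP + FN)
        if f1 > best_f1:
            best_f1, best_thresh = f1, scores[i]
    return best_thresh, best_f1
```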
Interesting! Thanks for the write-up. In the field of biometrics we often use the library BOB to compute metrics like AUCs and EERs, as well as other measures over things like DET curves.
For some practical work I was doing, BOB (and scikit-learn) proved too slow for my liking.
I can't currently share much more detail, but I used an insight similar to yours (mine was amortizing the sorting cost to get a better average time complexity) in a library I built for calculating empirical, bootstrapped confidence intervals for EERs.
For massive numbers of decision scores, on the order of millions, this method can give you confidence intervals on EERs where no other library currently in use can. Unfortunately, I have not yet found a theoretical basis for the method, and I suspect it might actually break down in some trivial smaller cases. DET curves and their derived measures also remain a tough nut to crack.
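I can't vouch for this matching my library in every detail, but the flavor of the amortization is roughly this: sort the scores once, then express each bootstrap resample as multinomial counts over the already-sorted samples, so no replicate ever has to be re-sorted. A sketch under those assumptions (all names here are illustrative):

```python
import numpy as np

def eer_from_sorted(labels_sorted, weights):
    """Compute the EER in one pass over scores already sorted ascending.

    labels_sorted: 1 = genuine, 0 = impostor, in score order.
    weights: per-sample resample counts (all ones = the original sample).
    """
    genuine = weights * labels_sorted
    impostor = weights * (1 - labels_sorted)
    n_gen, n_imp = genuine.sum(), impostor.sum()

    # Scanning the threshold upward through the sorted scores:
    # FRR = fraction of genuine scores below the threshold (falsely rejected),
    # FAR = fraction of impostor scores at/above it (falsely accepted).
    frr = np.cumsum(genuine) / n_gen
    far = 1.0 - np.cumsum(impostor) / n_imp
    i = np.argmin(np.abs(far - frr))      # point where the two rates cross
    return (far[i] + frr[i]) / 2.0

def bootstrap_eer_ci(scores, labels, n_boot=1000, alpha=0.05, seed=None):
    """Sort once, then reuse that order across all bootstrap replicates."""
    rng = np.random.default_rng(seed)
    order = np.argsort(scores)                        # the only O(n log n) step
    labels_sorted = np.asarray(labels)[order].astype(float)
    n = len(labels_sorted)

    eers = np.empty(n_boot)
    for b in range(n_boot):
        # A bootstrap resample is just a vector of multinomial counts,
        # which leaves the sorted order untouched.
        w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        eers[b] = eer_from_sorted(labels_sorted, w)
    return np.quantile(eers, [alpha / 2, 1 - alpha / 2])
```

The payoff is that each replicate costs O(n) instead of O(n log n), which is what makes millions of scores tractable.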
Considering AUCs and EERs are closely related, I'm thinking this method could also be applied to AUCs.
https://www.moderndescartes.com/essays/auc_intuition/ is my write-up on why the sorting approach is the most economical way to attack this sort of problem.
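Roughly, the punchline in code: AUC is the probability that a random positive outranks a random negative, so one sort plus a rank sum (the Mann-Whitney U statistic) gives it in O(n log n) instead of comparing all positive/negative pairs. A minimal sketch (the function name is mine):

```python
import numpy as np
from scipy.stats import rankdata

def auc_by_sorting(scores, labels):
    """AUC via the Mann-Whitney U statistic: one sort, no pairwise loop.

    AUC = P(score of a random positive > score of a random negative),
    with ties counting as half a win (rankdata averages tied ranks).
    """
    labels = np.asarray(labels)
    ranks = rankdata(scores)                  # O(n log n); ties get averaged ranks
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Rank sum of the positives, minus its minimum possible value,
    # counts how many (positive, negative) pairs the positive wins.
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```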