I realized the same thing when trying to compute F1-optimized thresholds; scanni...

Phemist · on Oct 13, 2020

Interesting! Thanks for the write-up. In the field of Biometrics we often use the library BOB to compute scores like AUC's and EER's, or other measures over stuff like DET curves.

For some practical work I was doing, BOB (and SK learn) proved too slow for my liking. I am currently unable to provide much more detail, but I used a similar insight to yours (mine was the amortization of sorting costs to have a better average time complexity) in a library I built for calculating empirical, bootstrapped, confidence intervals for EER's.

https://github.com/feerci/feerci

For massive amounts of decision scores, like millions, this method can give you confidence intervals on EER's where no other library currently in use can. Unfortunately, I have not yet found a theoretical basis for this method, and suspect it might actually break down under some trivial smaller cases. Also, DET curves and their derived measures remain a tough cookie to crack.

Considering AUC's and EER's are somewhat tightly related, I'm thinking this method could also be applied to AUC's.