People screw up the bcrypt thing all the time. Pick a single-threaded server stack (and run it on one core, because Kubernetes), then configure bcrypt so that brute-forcing 8-character passwords is slow on an A100. Have Kubernetes schedule it on a mid-range CPU because you have no load. Finally, leave your cloud provider's HTTP proxy timeout at its default.
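If the stack is, say, Node with Express and the native bcrypt npm package, the footgun looks roughly like this (the user lookup is a stub, and the handler shape is illustrative):

```typescript
import express from "express";
import bcrypt from "bcrypt";

// Placeholder user lookup; stands in for whatever store the real app uses.
async function findPasswordHash(username: string): Promise<string | null> {
  return null;
}

const app = express();
app.use(express.json());

app.post("/login", async (req, res) => {
  const { username, password } = req.body;
  const hash = await findPasswordHash(username);

  // The footgun: compareSync runs the full bcrypt work factor on the event
  // loop. At a cost high enough to slow down GPU brute force, each verify can
  // take hundreds of milliseconds on a mid-range core, so the single thread
  // serves only a few logins per second and every other request queues behind.
  if (!hash || !bcrypt.compareSync(password, hash)) {
    res.status(401).send("invalid credentials");
    return;
  }
  res.send("ok");
});

app.listen(3000);
```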
The result is that 100% of auth requests time out once the login queue depth gets above a hundred or so. At that point users retry their logins, so you need to scale out fast. If you haven't tested scale-out, it's time to implement a bcrypt thread pool or reimplement your application.
But at least the architecture I described "scales".
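Here's a rough sketch of what the thread-pool remedy looks like in that Node setup, again assuming the native bcrypt package (the in-flight cap is an illustrative number, not a tuned backpressure scheme):

```typescript
import bcrypt from "bcrypt";

// With the native bcrypt package, the async API runs hashing on libuv's
// worker threads (UV_THREADPOOL_SIZE, default 4) instead of the event loop,
// so the server keeps accepting requests while verifications are in flight.
const MAX_IN_FLIGHT = 32; // illustrative cap, not a tuned number
let inFlight = 0;

export async function verifyPassword(password: string, hash: string): Promise<boolean> {
  // Shed load instead of letting the login queue grow past the proxy timeout:
  // a fast 503 from the caller beats a timed-out request that the user retries.
  if (inFlight >= MAX_IN_FLIGHT) {
    throw new Error("auth capacity exceeded");
  }
  inFlight++;
  try {
    return await bcrypt.compare(password, hash); // hashed off the main thread
  } finally {
    inFlight--;
  }
}
```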
Fond memories of a job circa 2013 on a very large Rails app where CI times sped up by a factor of 10 after someone realized bcrypt was misconfigured for the test environment and was slowing things down every time a user was created through a factory.
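A common fix is to drop the cost factor to bcrypt's minimum in the test environment; roughly, in TypeScript terms rather than the Rails original (the env check and round counts are illustrative):

```typescript
import bcrypt from "bcrypt";

// Assumption: the cost factor comes from configuration, not a hard-coded value.
// Production keeps a high cost; tests drop to bcrypt's minimum (4) so that
// every factory-created user doesn't pay for a full-strength hash.
const SALT_ROUNDS = process.env.NODE_ENV === "test" ? 4 : 13;

export async function hashPassword(password: string): Promise<string> {
  return bcrypt.hash(password, SALT_ROUNDS);
}
```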
"because Kubernetes"? Is this assuming that you're running your server inside of a Kubernetes instance (and if so, is Kubernetes going to have problems with more than one thread?), or is there some other reason why it comes into this?