Yes, the root cause analysis feels off. (Though the cause might be as simple as the cost of walking through the message queue.)
Something is definitely blocking or resource-constrained and causing "thrashing": the uncontrolled number of requests allowed to spin up at a time (which creates resource contention), combined with the fan-out (1 incoming request = 100+ S3 requests/callbacks), seems like a likely causal factor. As you said, a worker approach with a limit on the number of concurrent requests is going to end up looking a lot like the golang approach used; a minimal sketch of that limit follows.
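Roughly what I mean by bounding the fan-out, sketched in Go since that's the comparison point. The fetchObject helper, the semaphore size of 10, and the 100-key fan-out are all made up for illustration, not taken from the system being discussed:

    package main

    import (
        "fmt"
        "sync"
    )

    // fetchObject stands in for a single S3 GET (hypothetical).
    func fetchObject(key string) string {
        return "data for " + key
    }

    func main() {
        // The 100+ S3 calls a single incoming request fans out into.
        keys := make([]string, 100)
        for i := range keys {
            keys[i] = fmt.Sprintf("object-%d", i)
        }

        sem := make(chan struct{}, 10) // cap concurrent S3 calls at 10
        var wg sync.WaitGroup

        for _, key := range keys {
            wg.Add(1)
            go func(k string) {
                defer wg.Done()
                sem <- struct{}{}        // acquire a slot; blocks when 10 are in flight
                defer func() { <-sem }() // release the slot
                _ = fetchObject(k)
            }(key)
        }
        wg.Wait()
    }

The point isn't the channel trick specifically; it's that contention is bounded by a number you chose, instead of by whatever the event loop happens to let through.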
The golang approach makes the average execution time of a given request more consistent, but the overall wait time can still increase dramatically if the arrival rate grows too high. Classic queueing problem.
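To put a number on "classic problem": under a simple M/M/1 approximation (my assumption; the real system is messier, and the service rate of 100 req/s is invented), expected time in the system is W = 1/(mu - lambda), which blows up nonlinearly as arrivals approach capacity:

    package main

    import "fmt"

    func main() {
        const mu = 100.0 // service rate: requests/sec the server can complete (assumed)
        // Expected time in system for an M/M/1 queue: W = 1 / (mu - lambda).
        for _, lambda := range []float64{50, 90, 99, 99.9} {
            w := 1.0 / (mu - lambda)
            fmt.Printf("arrival rate %5.1f/s -> avg time in system %.3fs\n", lambda, w)
        }
    }

That prints 0.02s at half load, 1s at 99% load, and 10s at 99.9%: consistent per-request execution time, but wait times that explode anyway.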
Say "easily" fixable by adding servers? Partly true. What happens if the S3 calls slows down dramatically?