You should apply for and use OpenAI on Azure. We've got close to 1M tokens per minute of capacity across 3 instances and the latency is totally fine, around 800ms average (even with big prompts). They've just got the new 0613 models as well (they seem to run about 2 weeks behind OpenAI). We've been in production for about 3 months, have some massive clients with a lot of traffic, and our GPT bill is way under £100 per month. This is all GPT-3.5 Turbo though, not GPT-4 (that's available on application, but we don't need it).