You should apply for and use OpenAI on Azure. We've got close to 1M tokens per minute of capacity across 3 instances and the latency is totally fine, around 800ms average (even with big prompts). They've just got the new 0613 models as well (they seem to run about 2 weeks behind OpenAI). We've been in production for about 3 months, have some massive clients with a lot of traffic, and our GPT bill is way under £100 per month. This is all GPT-3.5 Turbo though, not GPT-4 (that's available on application, but we don't need it).