So three 2nd-generation TPUs ~= one Volta-class GPU ~= $3 per hour on-demand on AWS: https://aws.amazon.com/ec2/pricing/on-demand/ and ~$1 per GPU-hour in spot (75 cents at the moment with p3.8xlarge and its 4 GPUs): https://aws.amazon.com/ec2/spot/pricing/ if you take the time to build a framework robust enough to survive spot interruptions.

And to make things simple, let's do it all in FP16, because INT8 on Volta ~= 1/2 a first-generation TPU, while FP16 on Volta ~= 3 first-generation TPUs at INT8 (sad, right?). That's an accident that occurred because P100 didn't support INT8, even though the consumer variants did.

So 5,064/3 = 1,688 Volta GPUs ~= $5,000 per hour on-demand, probably half that reserved, and a quarter of that in spot.

Say you need a week (168 hours) to train this, so roughly $200K in spot up to $800K on-demand...
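For anyone who wants to poke at the numbers, here is a rough sketch of the cloud-side arithmetic. The constants are just the approximate figures from this comment (3 TPUs per Volta, ~$3 per GPU-hour on-demand, half that reserved, a quarter in spot), not current AWS prices, and the function name is made up for illustration.

```python
# Back-of-envelope cloud cost, using the rough figures quoted above.
TPU_COUNT = 5064               # 2nd-generation TPUs to match
TPUS_PER_VOLTA = 3             # assumed equivalence: 3 TPUs ~= 1 Volta GPU
ON_DEMAND_PER_GPU_HOUR = 3.0   # USD, approximate on-demand price
HOURS_PER_WEEK = 7 * 24        # 168

def training_cost(gpus, price_per_gpu_hour, hours=HOURS_PER_WEEK):
    """Total cost in USD of running `gpus` GPUs for `hours` hours."""
    return gpus * price_per_gpu_hour * hours

gpus = TPU_COUNT // TPUS_PER_VOLTA  # 1688

# Assumed discount factors: reserved ~= half, spot ~= a quarter of on-demand.
for label, factor in [("on-demand", 1.0), ("reserved", 0.5), ("spot", 0.25)]:
    cost = training_cost(gpus, ON_DEMAND_PER_GPU_HOUR * factor)
    print(f"{label:>9}: ${cost:,.0f} per week")
```

With exactly $3 per GPU-hour the range comes out slightly above the rounded $200K-$800K, since the hourly total is $5,064 rather than $5,000.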

You can buy DGX-1Vs off-label for about $75K. Say they cost $20K annually to host. Say you use them for 3 years, so total TCO is ~$135K per box, which comes down to $0.64 per GPU-hour (8 GPUs per box, running around the clock).
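The buy-your-own number works out like this. Again just a sketch: the purchase and hosting prices are the guesses from this comment, the helper name is hypothetical, and it assumes the box is busy 100% of the time for the whole 3 years, which flatters the owned hardware.

```python
# Back-of-envelope TCO for an owned DGX-1V, per GPU-hour.
def tco_per_gpu_hour(purchase_usd, hosting_usd_per_year, years, gpus_per_box=8):
    """Total cost of ownership divided by GPU-hours, at 100% utilization."""
    total_usd = purchase_usd + hosting_usd_per_year * years
    box_hours = years * 365 * 24
    return total_usd / (box_hours * gpus_per_box)

# $75K purchase, $20K/year hosting, 3 years, 8 V100s per DGX-1V.
rate = tco_per_gpu_hour(75_000, 20_000, 3)
print(f"${rate:.2f} per GPU-hour")
```

Lower utilization scales the rate up proportionally, so at 50% utilization the owned GPU-hour costs about twice as much.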

Conclusion: p3.8xl spot instances are currently a steal! But I don't have ~$200K burning a hole in my pocket, so I guess I'm out of luck.