And to make things simple, let's do it all in FP16. INT8 on Volta ~= 1/2 a first-generation TPU, but FP16 on Volta ~= 3 first-generation TPUs running INT8 (sad, right?). That's an accident: P100 didn't support INT8, but its consumer variants did.
So, 5,064/3 = 1,688 Volta GPUs ~= $5000 per hour on demand, probably half that with reserved instances, and a quarter of that on spot.
Say you need a week to train this, so $200K-$800K...
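For anyone who wants to redo the math with their own prices, here's a minimal back-of-envelope sketch. It only uses the numbers above: 5,064 TPUs, ~3 first-gen TPUs per Volta in FP16, ~$5000/hour on demand, and the rough 1/2 (reserved) and 1/4 (spot) multipliers. Those multipliers are guesses, not quoted AWS prices.

```python
# Back-of-envelope weekly cloud cost, using the rough figures from the post.
# Assumptions: on-demand ~= $5000/hour for the whole fleet,
# reserved ~= 1/2 of that, spot ~= 1/4 of that.

TPUS = 5064                 # first-gen TPU count being matched
TPUS_PER_VOLTA_FP16 = 3     # rough equivalence: one V100 in FP16 ~= 3 first-gen TPUs
HOURS_PER_WEEK = 7 * 24     # 168 hours


def volta_gpus_needed(tpus: int = TPUS) -> int:
    # Integer division is exact here: 5064 / 3 = 1688
    return tpus // TPUS_PER_VOLTA_FP16


def weekly_cost(hourly_on_demand: float = 5000.0) -> dict:
    # Each pricing tier is a multiplier on the on-demand hourly rate
    tiers = {"on_demand": 1.0, "reserved": 0.5, "spot": 0.25}
    return {name: hourly_on_demand * mult * HOURS_PER_WEEK
            for name, mult in tiers.items()}


if __name__ == "__main__":
    print(volta_gpus_needed())
    for name, cost in weekly_cost().items():
        print(f"{name}: ${cost:,.0f}")
```

That works out to $840,000 on demand, $420,000 reserved and $210,000 on spot, which is where the $200K-$800K range comes from.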
You can buy DGX-1Vs off-label for about $75K. Say they cost $20K annually to host. Say you use them for 3 years, so total TCO is ~$135K, which comes down to ~$0.64 per GPU-hour (8 V100s per box, running around the clock).
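The per-GPU-hour figure can be checked with a similar sketch. It takes the $75K purchase price, $20K/year hosting and 3-year lifetime from above, and assumes 8 V100s per DGX-1V, 100% utilization, and 365-day years. Real utilization below 100% pushes the effective rate up.

```python
# Owned-hardware cost per GPU-hour, using the post's assumptions.
# Assumes 100% utilization and ignores leap years, power beyond "hosting",
# and resale value.

HOURS_PER_YEAR = 365 * 24   # 8760


def total_tco(purchase: float = 75_000.0,
              hosting_per_year: float = 20_000.0,
              years: int = 3) -> float:
    # Up-front purchase plus hosting over the whole lifetime
    return purchase + hosting_per_year * years


def cost_per_gpu_hour(purchase: float = 75_000.0,
                      hosting_per_year: float = 20_000.0,
                      years: int = 3,
                      gpus: int = 8) -> float:
    # Spread the TCO over every GPU-hour the box can deliver
    gpu_hours = years * HOURS_PER_YEAR * gpus
    return total_tco(purchase, hosting_per_year, years) / gpu_hours


if __name__ == "__main__":
    print(f"TCO: ${total_tco():,.0f}")
    print(f"${cost_per_gpu_hour():.2f} per GPU-hour")
```

With the defaults this gives a $135,000 TCO and about $0.64 per GPU-hour (roughly $5.14 per hour for the whole box).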
Conclusion: p3.8xl spot instances are currently a steal! But I don't have ~$200K burning a hole in my pocket, so I guess I'm out of luck.