Behind in what dimension? The most expensive Nvidia chips are much faster than Google TPUs, but the TPUs are competitive on end-to-end training cost (roughly, you can think of this as FLOPs per dollar).
I use TPUs on Colab all the time and I'm freaking happy. Maybe there's a way to use H100s to do the same thing, but my code is already written to host utterly massive files on GCS buckets as TFRecord files and load them onto TPUs during training. I first started with the free ones and now I rent the newer ones because it's not that expensive. I recommend beginners try it out. I find the other setups more expensive, at least for my use case.