Great questions! A slice is a set of TPU chips that share a fast, private inter-chip interconnect (ICI). Unlike the current generation of GPUs in clouds, TPUs on different machines can communicate over this private network. Multislice means we're using a hierarchical network: ICI within each slice, and ordinary data-center networking between slices.
Also, I should point out that a set of machines hosting TPUs is referred to as a "pod", which is not the same thing as a Kubernetes pod (also referenced in this doc).
Kubernetes chose "pod" to represent a set of co-scheduled containers, like a "pod of whales". Other systems like Mesos and Google's Borg https://storage.googleapis.com/pub-tools-public-publication-... use "task" to refer to a single container, but didn't have a concept for heterogeneous co-scheduled tasks at the time.
Somewhat ironically, this now makes TPUs on GKE confusing: we have TPU hosts organized into "pods", and "pods" for the software using the TPUs.
A Kubernetes pod using a TPU lands on a host which is part of a slice of a TPU pod.
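To make that concrete, here's a minimal sketch of what a GKE pod spec requesting TPU chips can look like. The accelerator type, topology value, and image name are illustrative assumptions, not taken from this thread:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tpu-worker              # the Kubernetes "pod" (co-scheduled containers)
spec:
  nodeSelector:
    # GKE node labels that place this pod on a host belonging to a
    # slice of a TPU pod (values here are illustrative).
    cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
    cloud.google.com/gke-tpu-topology: 2x2x1
  containers:
  - name: trainer
    image: my-training-image    # hypothetical training image
    resources:
      limits:
        google.com/tpu: "4"     # TPU chips attached to this host
```

So both meanings of "pod" show up in one manifest: the Kubernetes pod is the scheduled unit, and the node labels tie it to a slice of a hardware TPU pod.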
True. The pod's monotonic and atomic lifecycle across containers is a significant difference, but you can broadly accomplish similar behaviors with a Borg alloc for sharing resources.