Implementations using 32- or 64-byte (256- or 512-bit) vector extensions would run afoul of 16-byte granularity. While it is not common yet, ARM SVE allows vector sizes larger than 128 bits -- e.g., Graviton3 has 256-bit SVE and the Fujitsu A64FX has 512-bit SVE. (x86 has had 256- and 512-bit vector instructions for some time, but current CHERI development seems to be on ARM.)
I think you might be confusing the tracking of the validity of capabilities themselves (which could indeed be at a 16-byte granularity for an otherwise 64-bit system) with the bounds of a capability, which can be as small as 1 byte.