This is exactly how you'd write the feature, by hand, if you were implementing the language.
That the optimizer doesn't do as much optimization as it theoretically could means the optimizer has more work to do. But that's different from the feature being written in a sub-optimal way.
I don't know if you edited your comment, or if my pre-coffee reading comprehension just missed the part "because the compiler isn’t always removing bounds checks that are provably true."
Or are you saying that the current implementation fails to live up to the "zero cost abstraction" goal?