They do frame selection first, then build the 3D scene from that smaller set of frames. So that light pole might not have shown up in enough of the chosen frames to get a good 3D model of it. And the first stage has a pretty low-res 3D model because the number of points gets out of hand otherwise, so the light pole probably wouldn't have been modeled anyway.
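
Rough sketch of the "not in enough chosen frames" point, with made-up numbers. The fixed stride, the frame range where the pole is visible, and the 3-view threshold are all assumptions for illustration, not anything taken from their actual pipeline:

```python
def select_frames(num_frames, stride):
    """Keep every `stride`-th frame index (a stand-in for the real selection step)."""
    return list(range(0, num_frames, stride))


def views_of_object(selected, visible_frames):
    """Count how many of the selected frames actually see the object."""
    visible = set(visible_frames)
    return sum(1 for frame in selected if frame in visible)


# Hypothetical clip: 300 frames, pole visible only in frames 100..129.
selected = select_frames(300, stride=25)
pole_views = views_of_object(selected, range(100, 130))

MIN_VIEWS = 3  # assumed minimum number of views for a usable reconstruction
print(pole_views, pole_views >= MIN_VIEWS)
```

Here the pole is on screen for 30 frames, but only two of them survive selection, so it falls below the assumed threshold even before the low-res first stage comes into play.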