They don't see flat, static images but a continuous stream of input whose viewing angle changes constantly as they and the object make subtle movements. Moreover, they can interact with things and gather more visual information where needed. (Anything too big to interact with directly is probably too far away for binocular depth perception to be of much use.) See a long list of monocular depth cues here: https://en.wikipedia.org/wiki/Depth_perception