What metric is being used when they give a figure for "reconstruction" time? As far as I understand, the purpose of these nets is to generate novel views, which can take anywhere from dozens of seconds per frame (original NeRF paper) down to real time at 200 FPS (FastNeRF). So where does "half an hour" come in? If that's the training time for each new scene, that's very impressive, but how long does it take to generate each novel view?
I think reconstruction time isn't about rendering a new perspective but about how long it takes to train the whole model on the input images (taken from various viewpoints). As I understand it, this took quite a while in the original NeRF paper (on the order of a day or more per scene, if I recall correctly), but here they seem to have reduced it considerably.
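To make the distinction concrete, here is a toy sketch of the two numbers being discussed. Nothing in it is NeRF-specific: `train_scene` and `render_view` are made-up names, and the "scene" is just a line fitted by gradient descent. It only illustrates that "reconstruction time" would be a one-off per-scene cost, while render time is paid again for every new view.

```python
import time


def train_scene(observations, steps=2000, lr=0.1):
    """Toy stand-in for per-scene optimization: fit y = a*x + b by gradient descent."""
    a, b = 0.0, 0.0
    n = len(observations)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to a and b.
        grad_a = sum(2 * (a * x + b - y) * x for x, y in observations) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in observations) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def render_view(model, x):
    """Toy stand-in for rendering one novel view: a single cheap model query."""
    a, b = model
    return a * x + b


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# "Input images": samples of the hidden function y = 3x + 1 at known positions.
observations = [(x / 10, 3.0 * (x / 10) + 1.0) for x in range(11)]

# Paid once per scene: this is what "reconstruction time" would refer to.
model, reconstruction_seconds = timed(train_scene, observations)

# Paid once per novel view: this is what an FPS figure would refer to.
value, render_seconds = timed(render_view, model, 0.55)

print(f"reconstruction: {reconstruction_seconds:.6f} s, render: {render_seconds:.6f} s")
```

So a paper can report a fast reconstruction time and still be slow per rendered frame, or the other way around; the two figures measure different stages.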