Recent work on 3D reconstruction from posed images has demonstrated that
direct inference of scene-level 3D geometry without test-time optimization is
feasible using deep neural networks, showing remarkable promise and high
efficiency. However, the reconstructed geometry, typically represented as a 3D
truncated signed distance function (TSDF), is often coarse and lacks fine
geometric details. To address this problem, we propose three effective
solutions for improving the fidelity of inference-based 3D reconstructions. We
first present a resolution-agnostic TSDF supervision strategy to provide the
network with a more accurate learning signal during training, avoiding the
pitfalls of TSDF interpolation seen in previous work. We then introduce a depth
guidance strategy using multi-view depth estimates to enhance the scene
representation and recover more accurate surfaces. Finally, we develop a novel
architecture for the final layers of the network, conditioning the output TSDF
prediction on high-resolution image features in addition to coarse voxel
features, enabling sharper reconstruction of fine details. Our method,
FineRecon, produces smooth and highly accurate reconstructions, showing
significant improvements across multiple depth and 3D reconstruction metrics.
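
To make the TSDF interpolation pitfall concrete, the following toy 1D sketch compares a TSDF value linearly interpolated from a coarse grid against one evaluated directly at the query point. It is an illustration under simplifying assumptions (a single planar surface, hypothetical function names `tsdf` and `lerp_tsdf`, arbitrary grid spacing and truncation distance), not the FineRecon implementation.

```python
import math


def tsdf(x, surface=0.0, trunc=0.1):
    """Signed distance to a plane at `surface`, divided by `trunc` and clamped to [-1, 1]."""
    d = (x - surface) / trunc
    return max(-1.0, min(1.0, d))


def lerp_tsdf(x, grid_spacing, surface=0.0, trunc=0.1):
    """Linearly interpolate TSDF values stored at multiples of `grid_spacing`."""
    i = math.floor(x / grid_spacing)
    x0 = i * grid_spacing          # grid sample at or below x
    x1 = x0 + grid_spacing         # next grid sample above x
    w = (x - x0) / grid_spacing    # interpolation weight in [0, 1)
    return (1.0 - w) * tsdf(x0, surface, trunc) + w * tsdf(x1, surface, trunc)


# Query point beyond the truncation distance, between two coarse grid samples.
query = 0.12
direct = tsdf(query)                          # evaluated at the query itself: clamped to 1.0
interp = lerp_tsdf(query, grid_spacing=0.16)  # blends a surface sample with a truncated one

print(f"direct={direct:.3f} interpolated={interp:.3f}")
```

In this toy setup the direct value is fully truncated, while the interpolated value (about 0.75) falls inside the truncation band, because clamping makes the TSDF nonlinear between the two grid samples. Where both neighboring samples lie inside the truncation band, the two values agree, so the discrepancy is confined to the region around the truncation boundary.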