We present a stereo-matching method for depth estimation from high-resolution images that uses visual hulls as priors, together with a memory-efficient technique for the correlation computation. Our method uses object masks extracted from supplementary views of the scene to guide disparity estimation, effectively reducing the search space for matches. The approach is tailored to stereo rigs in volumetric capture systems, where accurate depth plays a key role in the downstream reconstruction task. To enable training and regression at the high resolutions targeted by recent systems, we extend a sparse correlation computation into a hybrid sparse-dense scheme suitable for leading recurrent network architectures. We evaluate the performance-efficiency trade-off of our method against state-of-the-art methods and demonstrate the efficacy of the visual hull guidance. In addition, we propose a training scheme that further reduces memory requirements during optimization, facilitating training on high-resolution data.
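To make the search-space reduction concrete, the following toy sketch restricts a one-dimensional scanline search to a per-pixel disparity interval. It is not the method described above: the learned recurrent network is replaced by a plain winner-takes-all search on absolute intensity differences, and the function name `match_scanline`, the `ranges` argument, and the idea that each interval has already been derived from the visual hull are assumptions made purely for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical per-pixel disparity interval (lo, hi), assumed to have been
# derived from the visual hull; None marks a pixel outside the object masks.
Range = Optional[Tuple[int, int]]


def match_scanline(
    left: List[float],
    right: List[float],
    ranges: List[Range],
    max_disp: int,
) -> Tuple[List[Optional[int]], int]:
    """Winner-takes-all disparity per pixel, searching only inside ranges[x].

    Returns the disparities and the number of matching costs evaluated.
    """
    result: List[Optional[int]] = []
    evaluated = 0
    for x, value in enumerate(left):
        bounds = ranges[x]
        if bounds is None:
            # Pixel lies outside the hull: no candidates are evaluated.
            result.append(None)
            continue
        lo, hi = bounds
        best_d: Optional[int] = None
        best_cost = float("inf")
        # Clamp the interval to valid disparities for this column.
        for d in range(max(lo, 0), min(hi, max_disp, x) + 1):
            cost = abs(value - right[x - d])
            evaluated += 1
            if cost < best_cost:
                best_d, best_cost = d, cost
        result.append(best_d)
    return result, evaluated


# Toy scanline: the left row is the right row shifted by two pixels.
right_row = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
left_row = [0.0, 0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

# Guided search: interval [1, 3] inside the (assumed) hull, nothing outside.
guided_ranges: List[Range] = [None, None] + [(1, 3)] * 6
guided, guided_evals = match_scanline(left_row, right_row, guided_ranges, 7)

# Unguided baseline: every pixel searches the full interval [0, 7].
full_ranges: List[Range] = [(0, 7)] * 8
full, full_evals = match_scanline(left_row, right_row, full_ranges, 7)
```

In this toy setting both searches recover the same disparity inside the object, while the guided search evaluates fewer candidates. How the intervals are actually obtained from the masks, and how they interact with the hybrid sparse-dense correlation, is not specified in the abstract and is left out of the sketch.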