Computer vision methods for depth estimation usually rely on simple camera models with idealized optics. For modern machine learning approaches, this becomes a problem when deep networks are trained on simulated data, especially for focus-sensitive tasks such as Depth-from-Focus. In this work, we investigate the domain gap caused by off-axis aberrations, which can alter the choice of the best-focused frame in a focal stack. We then explore bridging this gap through aberration-aware training (AAT). Our approach uses a lightweight network that models lens aberrations at different positions and focus distances and is integrated into the conventional network training pipeline. We evaluate how well the pretrained models generalize on both synthetic and real-world data. Our experimental results demonstrate that the proposed AAT scheme can improve depth estimation accuracy without fine-tuning the model or modifying the network architecture.
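To make the best-focused-frame issue concrete, the following toy sketch shows how an off-axis aberration can shift the frame chosen from a focal stack. It is not the network-based aberration model described above. It assumes a single aberration (field curvature, modeled as a focus shift proportional to the squared normalized field radius), and the names `blur_sigma` and `best_focused_frame` and all constants are hypothetical, chosen only for illustration.

```python
def blur_sigma(focus_dist, depth, r, field_curvature=0.0):
    """Toy blur width for a point at `depth` (metres) when focused at `focus_dist`.

    Defocus is measured in diopters. The `field_curvature * r**2` term is an
    assumed stand-in for an off-axis aberration; r is the normalized field
    radius (0 = image centre, 1 = image corner).
    """
    defocus = abs(1.0 / focus_dist - 1.0 / depth + field_curvature * r ** 2)
    return 0.3 + 4.0 * defocus  # arbitrary base blur and scale


def best_focused_frame(focus_dists, depth, r, field_curvature=0.0):
    """Return the index of the focal-stack frame with the smallest blur."""
    sigmas = [blur_sigma(f, depth, r, field_curvature) for f in focus_dists]
    return min(range(len(sigmas)), key=sigmas.__getitem__)


# Focal stack of five focus distances (metres); the scene point is at 1 m.
focus_dists = [0.5, 0.75, 1.0, 1.5, 3.0]
depth = 1.0

ideal_corner = best_focused_frame(focus_dists, depth, r=1.0)
aberrated_centre = best_focused_frame(focus_dists, depth, r=0.0, field_curvature=0.4)
aberrated_corner = best_focused_frame(focus_dists, depth, r=1.0, field_curvature=0.4)

print(ideal_corner, aberrated_centre, aberrated_corner)
```

With these assumed values, the idealized model and the on-axis aberrated case both select the frame focused at 1 m, while the aberrated corner selects the neighbouring frame. A network trained only on idealized renderings would therefore see a different focus cue at the image corner than a real lens produces, which is the kind of gap that aberration-aware training is meant to address.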