Computer vision methods for depth estimation usually assume simple camera models with idealized optics. For modern machine learning approaches, this becomes a problem when deep networks are trained on simulated data, especially for focus-sensitive tasks such as Depth-from-Focus. In this work, we investigate the domain gap caused by off-axis aberrations that alter the choice of the best-focused frame in a focal stack. We then explore bridging this domain gap through aberration-aware training (AAT). Our approach uses a lightweight network that models lens aberrations at different positions and focus distances and is integrated into the conventional network training pipeline. We evaluate how well the pretrained models generalize on both synthetic and real-world data. Our experimental results demonstrate that the proposed AAT scheme can improve depth estimation accuracy without fine-tuning the model or modifying the network architecture.
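To make the notion of a "best-focused frame in a focal stack" concrete, the following is a minimal sketch of a classical, non-learned focus measure: the per-pixel squared Laplacian, maximized over the frames of the stack. It is not the network or the AAT scheme described above. The function names (`laplacian_energy`, `best_focus_index`, `box_blur`) and the toy checkerboard stack are assumptions made purely for illustration, and the box blur is only a crude stand-in for defocus or aberration blur.

```python
def laplacian_energy(img):
    """Squared 4-neighbour Laplacian per pixel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4.0 * img[y][x])
            out[y][x] = lap * lap
    return out


def best_focus_index(stack):
    """For each pixel, return the index of the frame with the largest focus measure.

    Ties (for example on the zeroed border) resolve to the lowest frame index.
    """
    energies = [laplacian_energy(frame) for frame in stack]
    h, w = len(stack[0]), len(stack[0][0])
    return [[max(range(len(stack)), key=lambda k: energies[k][y][x])
             for x in range(w)]
            for y in range(h)]


def box_blur(img):
    """3x3 box blur with edge clamping (hypothetical stand-in for optical blur)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


# Toy focal stack: a checkerboard that is sharp in frame 1 and blurred in frames 0 and 2.
sharp = [[float((x + y) % 2) for x in range(6)] for y in range(6)]
blurred = box_blur(sharp)
stack = [blurred, sharp, blurred]
index_map = best_focus_index(stack)
```

In this toy stack every interior pixel selects frame 1, the sharp one. A blur that varies with image position, as off-axis aberrations do, changes the focus measure differently across the field of view and can therefore shift this per-pixel choice, which is the kind of effect the domain gap above refers to.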