Aerial scene understanding systems face stringent payload restrictions and must often rely on monocular depth estimation for modeling scene geometry, which is an inherently ill-posed problem. Moreover, obtaining the accurate ground-truth data required by learning-based methods raises significant additional challenges in the aerial domain. Self-supervised approaches can bypass this problem, at the cost of providing only up-to-scale results. Similarly, recent supervised solutions that make good progress towards zero-shot generalization also provide only relative depth values. This work presents TanDepth, a practical scale-recovery method for obtaining metric depth results from relative estimations at inference time, irrespective of the type of model generating them. Tailored for Unmanned Aerial Vehicle (UAV) applications, our method leverages sparse measurements from Global Digital Elevation Models (GDEM) by projecting them to the camera view using extrinsic and intrinsic information. An adaptation of the Cloth Simulation Filter is presented, which selects ground points from the estimated depth map that are then correlated with the projected reference points. We evaluate and compare our method against alternate scaling methods adapted for UAVs, on a variety of real-world scenes. Considering the limited availability of data for this domain, we construct and release a comprehensive, depth-focused extension to the popular UAVid dataset to further research.
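The abstract describes two geometric steps: projecting sparse GDEM elevation points into the camera view via the intrinsic/extrinsic parameters, and correlating them with ground pixels of the relative depth map to recover a metric scale. Below is a minimal sketch of that pipeline under simplifying assumptions (a pinhole camera, points already expressed in world coordinates, and a median-ratio fit between reference and relative depths). All function names are illustrative and do not reflect TanDepth's actual implementation, which uses the Cloth Simulation Filter for ground-point selection.

```python
import numpy as np

def project_points(K, T_wc, points_w):
    """Project 3-D world points into the image plane.

    K     : 3x3 intrinsic matrix
    T_wc  : 4x4 world-to-camera extrinsic transform
    Returns pixel coordinates (N, 2) and metric camera-frame depths (N,).
    """
    pts_h = np.hstack([points_w, np.ones((len(points_w), 1))])
    pts_c = (T_wc @ pts_h.T).T[:, :3]      # points in the camera frame
    depths = pts_c[:, 2]                   # metric depth along the optical axis
    uv = (K @ pts_c.T).T
    uv = uv[:, :2] / uv[:, 2:3]            # perspective division
    return uv, depths

def recover_scale(rel_depth, uv, metric_depths):
    """Estimate a global scale as the median ratio between projected
    metric reference depths and the relative depth at those pixels."""
    h, w = rel_depth.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (metric_depths > 0)
    ratios = metric_depths[valid] / rel_depth[v[valid], u[valid]]
    return np.median(ratios)               # robust to outlier correspondences
```

A relative depth map multiplied by the recovered scale then yields metric depth; the median keeps the fit robust to reference points that land on non-ground structures.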