Depth estimation and 3D reconstruction have been studied extensively as core topics in computer vision. Research has expanded from rigid objects with relatively simple geometry, such as vehicles, to general objects, including challenging deformable ones such as humans and animals. For animals in particular, however, most existing models are trained on datasets that lack metric scale, whereas metric-scale data would help validate image-only models. To address this limitation, we present WildDepth, a multimodal dataset and benchmark suite for depth estimation, behavior detection, and 3D reconstruction. It covers diverse animal categories in environments ranging from domestic to wild and provides synchronized RGB and LiDAR data. Experimental results show that using multimodal data reduces depth RMSE by up to 10%, while RGB-LiDAR fusion improves 3D reconstruction fidelity by 12% in terms of Chamfer distance. By releasing WildDepth and its benchmarks, we aim to foster robust multimodal perception systems that generalize across domains.
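The two error metrics in the abstract, depth RMSE and Chamfer distance, are standard, but their exact definitions vary between benchmarks. The following is a minimal Python/NumPy sketch of how such figures and a relative improvement can be computed. It makes two assumptions. First, depth RMSE is evaluated only over pixels that have a LiDAR return, and missing returns are marked with a depth of 0. Second, Chamfer distance is taken as the symmetric mean of nearest-neighbour Euclidean distances. The function names are hypothetical and are not taken from the WildDepth benchmark code.

```python
import numpy as np


def depth_rmse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square depth error over pixels with valid ground truth.

    Assumption (hypothetical convention): sparse LiDAR ground truth marks
    pixels without a return with depth 0, so they are excluded.
    """
    valid = gt > 0
    diff = pred[valid] - gt[valid]
    return float(np.sqrt(np.mean(diff ** 2)))


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses the mean nearest-neighbour Euclidean distance in each direction;
    some benchmarks use squared distances or sums instead.
    """
    # Pairwise distances with shape (N, M): simple, but O(N * M) memory.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` lowers an error metric versus `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Synthetic toy data for illustration only, not WildDepth results.
    gt = np.array([[2.0, 0.0], [4.0, 3.0]])        # 0.0 = no LiDAR return
    pred_a = np.array([[2.5, 1.0], [3.5, 3.5]])
    pred_b = np.array([[2.4, 1.0], [3.6, 3.4]])
    rmse_a, rmse_b = depth_rmse(pred_a, gt), depth_rmse(pred_b, gt)
    print(f"RMSE {rmse_a:.2f} -> {rmse_b:.2f}, "
          f"reduction {relative_reduction(rmse_a, rmse_b):.1f}%")
```

For example, an RMSE that falls from 0.50 m to 0.45 m corresponds to a 10% reduction. The brute-force distance matrix keeps the sketch short. For dense LiDAR scans, a k-d tree such as `scipy.spatial.cKDTree` would be the usual way to find nearest neighbours.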