Depth estimation and 3D reconstruction have been extensively studied as core topics in computer vision. Starting from rigid objects with relatively simple geometry, such as vehicles, research has expanded to general objects, including challenging deformable subjects such as humans and animals. For animals in particular, however, most existing models are trained on datasets that lack metric scale, the very ground truth needed to validate image-only models. To address this limitation, we present WildDepth, a multimodal dataset and benchmark suite for depth estimation, behavior detection, and 3D reconstruction, covering diverse animal categories captured in environments ranging from domestic to wild with synchronized RGB and LiDAR. Experimental results show that multimodal data reduces depth error by up to 10% in RMSE, while RGB-LiDAR fusion improves 3D reconstruction fidelity by 12% in Chamfer distance. By releasing WildDepth and its benchmarks, we aim to foster robust multimodal perception systems that generalize across domains.
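The abstract cites two metrics: RMSE on depth maps and Chamfer distance on reconstructed geometry. Below is a minimal sketch of their standard definitions; the benchmark's actual evaluation protocol (valid-pixel masking, scale alignment, point sampling) is not specified in this section, so the conventions in the code, including the zero-depth masking rule, are assumptions.

```python
# Standard forms of the two metrics quoted in the abstract. The exact
# masking and alignment protocol used by the WildDepth benchmark is not
# given here; this sketch assumes valid-pixel masking only.
import numpy as np

def depth_rmse(pred, gt, valid_mask=None):
    """Root-mean-square error between predicted and ground-truth depth maps."""
    if valid_mask is None:
        # Assumed convention: zero depth marks pixels with no LiDAR return.
        valid_mask = gt > 0
    diff = pred[valid_mask] - gt[valid_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def chamfer_distance(points_a, points_b):
    """Symmetric (squared) Chamfer distance between (N, 3) and (M, 3) clouds.

    Brute-force O(N*M) version for clarity; benchmarks typically use KD-trees.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((points_a[:, None, :] - points_b[None, :, :]) ** 2, axis=-1)
    # Nearest-neighbor term in each direction, then sum the two means.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Toy usage with random data standing in for real predictions.
rng = np.random.default_rng(0)
pred_depth = rng.uniform(1.0, 20.0, size=(240, 320))
gt_depth = pred_depth + rng.normal(0.0, 0.5, size=pred_depth.shape)
print("Depth RMSE:", depth_rmse(pred_depth, gt_depth))

cloud_a = rng.normal(size=(500, 3))
cloud_b = cloud_a + rng.normal(0.0, 0.01, size=cloud_a.shape)
print("Chamfer distance:", chamfer_distance(cloud_a, cloud_b))
```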