Deep in the Jungle: Towards Automating Chimpanzee Population Estimation

The estimation of abundance and density in unmarked populations of great apes relies on statistical frameworks that require animal-to-camera distance measurements. In practice, acquiring these distances depends on labour-intensive manual interpretation of animal observations across large camera trap video corpora. This study introduces and evaluates a sparsely explored alternative: integrating computer vision-based monocular depth estimation (MDE) pipelines directly into ecological camera trap workflows for great ape conservation. Using a real-world dataset of 220 camera trap videos documenting a wild chimpanzee population, we combine two MDE models, Dense Prediction Transformers (DPT) and Depth Anything, with multiple distance sampling strategies. These components generate detection distance estimates, from which population density and abundance are inferred. Comparison against manually derived ground-truth distances shows that calibrated DPT consistently outperforms Depth Anything, both in distance estimation accuracy and in downstream density and abundance inference. Nevertheless, both models exhibit systematic biases: in complex forest environments they tend to overestimate detection distances and consequently underestimate density and abundance relative to conventional manual approaches. We further find that failures in animal detection across distance ranges are a primary factor limiting estimation accuracy. Overall, this case study shows that MDE-driven camera trap distance sampling is a viable and practical alternative to manual distance estimation, yielding population estimates within 22% of those obtained using traditional methods.
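
The two stages summarised above (turning a depth map into a per-detection distance, then turning distances into a density) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the study's implementation: it assumes the MDE output has already been converted to a metric depth map in metres (for example after calibration), that detections arrive as pixel bounding boxes, and that density is estimated with the point-transect form of camera trap distance sampling (Howe et al., 2017) using a half-normal detection function fitted by maximum likelihood. The helper names bbox_distance, halfnormal_nll, fit_sigma, p_detect and ctds_density, and the "center" and "median" depth-sampling strategies, are hypothetical.

```python
import math
import statistics


def bbox_distance(depth_map, bbox, strategy="median"):
    """Reduce a metric depth map (list of rows, metres) to one distance for a
    detection box (x0, y0, x1, y1) with exclusive upper bounds.
    Both strategies are hypothetical examples, not the study's exact ones."""
    x0, y0, x1, y1 = bbox
    if strategy == "center":
        # Depth at the box centre pixel.
        return depth_map[(y0 + y1) // 2][(x0 + x1) // 2]
    if strategy == "median":
        # Median over every pixel in the box; less sensitive to background pixels.
        return statistics.median(
            [depth_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        )
    raise ValueError(f"unknown strategy: {strategy}")


def halfnormal_nll(sigma, distances, w):
    """Negative log-likelihood of radial distances (truncated at w) under a
    half-normal detection function g(r) = exp(-r^2 / (2 sigma^2)).
    The sigma-independent term sum(log r) is dropped."""
    s2 = sigma * sigma
    # integral_0^w r g(r) dr = s2 * (1 - exp(-w^2 / (2 s2)))
    norm = s2 * -math.expm1(-w * w / (2.0 * s2))
    return sum(r * r for r in distances) / (2.0 * s2) + len(distances) * math.log(norm)


def fit_sigma(distances, w, lo=0.1, hi=100.0, iters=100):
    """Maximum-likelihood sigma via golden-section search on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if halfnormal_nll(c, distances, w) < halfnormal_nll(d, distances, w):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def p_detect(sigma, w):
    """Average detection probability P_a within truncation radius w."""
    return 2.0 * sigma * sigma / (w * w) * -math.expm1(-w * w / (2.0 * sigma * sigma))


def ctds_density(n_detections, sigma, w, theta, effort):
    """Animals per square metre: D = 2 n / (theta * w^2 * effort * P_a).
    theta: horizontal field of view in radians; effort: total snapshot
    moments summed over cameras (active seconds / snapshot interval)."""
    return 2.0 * n_detections / (theta * w * w * effort * p_detect(sigma, w))


# Illustrative inputs only; these are not data from the study.
depth = [[4.0, 4.2, 9.0], [3.9, 4.1, 9.5], [4.0, 4.3, 10.0]]
r = bbox_distance(depth, (0, 0, 2, 3))  # median of the left two columns
distances = [2.1, 3.4, 5.0, 6.2, 4.4, 7.9, 3.0, 5.6, r]
w = 15.0  # assumed truncation distance in metres
kept = [d for d in distances if d <= w]
sigma = fit_sigma(kept, w)
D = ctds_density(len(kept), sigma, w, theta=math.radians(42), effort=50_000)
print(f"sigma={sigma:.2f} m, P_a={p_detect(sigma, w):.3f}, D={D * 1e6:.2f} per km^2")
```

Golden-section search keeps the sketch free of third-party dependencies; a real analysis would more likely use dedicated distance sampling software (for example the R package Distance) and consider adjustment terms, left truncation and detection-function model selection. Swapping the depth-sampling strategy in bbox_distance is one way to see how the choice of pixel summary propagates into sigma, P_a and D.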
