Deep learning-based ecological analysis of camera trap images is impacted by training data quality and quantity

Peggy A. Bevan, Omiros Pantazis, Holly Pringle, Guilherme Braga Ferreira, Daniel J. Ingram, Emily Madsen, Liam Thomas, Dol Raj Thanet, Thakur Silwal, Santosh Rayamajhi, Gabriel Brostow, Oisin Mac Aodha, Kate E. Jones

From arXiv. Peggy A. Bevan and Omiros Pantazis contributed equally. Published in Remote Sensing in Ecology and Conservation.

Large image collections generated from camera traps offer valuable insights into species richness, occupancy, and activity patterns, significantly aiding biodiversity monitoring. However, manually processing these datasets is time-consuming and slows analysis. To address this, deep neural networks have been widely adopted to automate image labelling, but the impact of classification error on key ecological metrics remains unclear. Here, we analyse data from camera trap collections in an African savannah (82,300 labelled images, 47 species) and an Asian sub-tropical dry forest (40,308 labelled images, 29 species) to compare ecological metrics derived from expert-generated species identifications with those generated by deep learning classification models. We specifically assess the impact of deep learning model architecture, the proportion of label noise in the training data, and the size of the training dataset on three key ecological metrics: species richness, occupancy, and activity patterns. We found that species richness estimates derived from deep neural network predictions closely matched those calculated from expert labels and remained resilient both to up to 10% noise (mis-labelled images) in the training dataset and to a 50% reduction in training dataset size. The choice of deep learning model architecture (ResNet vs ConvNeXt-T) or depth (ResNet18, 50, 101) did not affect the predicted ecological metrics. In contrast, species-specific metrics were more sensitive: less common and visually similar species were disproportionately affected by a reduction in deep neural network accuracy, with consequences for occupancy and diel activity pattern estimates. To ensure the reliability of their findings, practitioners should prioritize creating large, clean training sets and accounting for class imbalance across species over exploring numerous deep learning model architectures.
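The experimental manipulations described in the abstract (injecting label noise, shrinking the training set) and the metrics they are scored on can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline. All function names are hypothetical. Label noise is assumed to be symmetric (uniform random relabelling). Occupancy is the naive proportion of sites with at least one detection, not a detection-corrected occupancy model. Diel activity is a coarse hourly histogram rather than a smoothed activity curve.

```python
import random
from collections import Counter

# Hypothetical helpers for illustration only; not the study's code.


def inject_label_noise(labels, classes, noise_rate, seed=0):
    """Return a copy of `labels` in which a fraction `noise_rate` of entries is
    reassigned to a different class chosen uniformly at random (symmetric
    noise, an assumption)."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    # Pick distinct images to corrupt, then give each one a wrong label.
    for i in rng.sample(range(len(labels)), n_flip):
        wrong = [c for c in classes if c != labels[i]]
        noisy[i] = rng.choice(wrong)
    return noisy


def subsample(items, fraction, seed=0):
    """Keep a random `fraction` of the training items (e.g. 0.5 for a 50% cut)."""
    items = list(items)
    rng = random.Random(seed)
    return rng.sample(items, round(fraction * len(items)))


def species_richness(labels):
    """Number of distinct species labels observed."""
    return len(set(labels))


def naive_occupancy(records, species):
    """Fraction of sites with at least one detection of `species`.
    `records` is a list of (site, species) pairs. This naive estimate ignores
    imperfect detection, unlike a full occupancy model."""
    sites = {site for site, _ in records}
    occupied = {site for site, label in records if label == species}
    return len(occupied) / len(sites) if sites else 0.0


def hourly_activity(hours):
    """Proportion of detections falling in each hour of the day (0-23).
    A coarse histogram standing in for a smoothed diel activity curve."""
    if not hours:
        return [0.0] * 24
    counts = Counter(int(h) % 24 for h in hours)
    return [counts[h] / len(hours) for h in range(24)]


if __name__ == "__main__":
    # Toy (site, species) data, illustrative only and not from the study.
    expert = [("A", "impala"), ("A", "zebra"), ("B", "impala"),
              ("B", "lion"), ("C", "zebra")]
    # The model mislabels the single, rare lion image as an impala.
    predicted = [("A", "impala"), ("A", "zebra"), ("B", "impala"),
                 ("B", "impala"), ("C", "zebra")]
    for name, recs in [("expert", expert), ("predicted", predicted)]:
        labels = [s for _, s in recs]
        print(name, species_richness(labels), naive_occupancy(recs, "lion"))
```

In the toy demo, one misclassified lion image lowers species richness from 3 to 2 and the lion's naive occupancy from one site in three to zero, while the impala and zebra estimates are unchanged. This mirrors, in miniature, the abstract's point that rare species absorb a disproportionate share of classification error.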
