Weakly supervised framework for wildlife detection and counting in challenging Arctic environments: a case study on caribou (Rangifer tarandus)

Caribou populations across the Arctic have declined in recent decades, motivating scalable and accurate monitoring approaches to guide evidence-based conservation actions and policy decisions. Manual interpretation of aerial imagery is labor-intensive and error-prone, underscoring the need for automatic and reliable detection across varying scenes. Yet such automatic detection is challenging due to severe background heterogeneity, dominant empty terrain (class imbalance), small or occluded targets, and wide variation in density and scale. To make the detection model (HerdNet) more robust to these challenges, a weakly supervised patch-level pretraining scheme based on the detection network's architecture is proposed. The detection dataset includes five caribou herds distributed across Alaska. By learning from empty vs. non-empty patch labels in this dataset, the approach acquires weakly supervised knowledge before detection training begins, in contrast to a HerdNet initialized from generic weights. The patch-based pretrained network attained high accuracy on the multi-herd (2017) test set and on an independent-year (2019) test set (F1: 93.7% and 92.6%, respectively), enabling reliable mapping of regions containing animals to facilitate manual counting in large aerial images. Transferred to detection, initialization from weakly supervised pretraining yielded consistent gains over ImageNet weights on both positive patches (F1: 92.6%/93.5% vs. 89.3%/88.6%) and full-image counting (F1: 95.5%/93.3% vs. 91.5%/90.4%). Remaining limitations are false positives from animal-like background clutter and false negatives associated with low animal density and occlusion. Overall, pretraining on coarse labels prior to detection makes it possible to rely on weakly supervised pretrained weights even when labeled data are limited, achieving results comparable to generic-weight initialization.
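
As an illustration of the coarse supervision described above, the following minimal Python sketch shows one way empty vs. non-empty patch labels could be derived and how an F1 score is computed from counts. It is not the authors' implementation: the abstract does not state how patch labels were produced, so the use of point annotations, the non-overlapping grid, and the names `patch_labels`, `f1_score`, and `patch` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def patch_labels(
    width: int,
    height: int,
    patch: int,
    points: Iterable[Tuple[float, float]],
) -> Dict[Tuple[int, int], int]:
    """Label each tile of a non-overlapping grid as empty (0) or non-empty (1).

    Assumption: animals are annotated as (x, y) points in pixel coordinates.
    A tile is non-empty if at least one point falls inside it.
    """
    n_cols = -(-width // patch)   # ceiling division
    n_rows = -(-height // patch)
    labels = {(c, r): 0 for r in range(n_rows) for c in range(n_cols)}
    for x, y in points:
        # Ignore annotations that fall outside the image bounds.
        if 0 <= x < width and 0 <= y < height:
            labels[(int(x) // patch, int(y) // patch)] = 1
    return labels


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, `patch_labels(1024, 1024, 512, [(10, 10)])` marks only the top-left of the four tiles as non-empty. Binary labels of this kind are far cheaper to obtain than per-animal annotations, which is what makes a pretraining stage on them attractive when detailed labels are scarce.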