Ecological monitoring is increasingly automated by vision models, yet opaque predictions limit trust and field adoption. We present an inpainting-guided, perturbation-based explanation technique that produces photorealistic, mask-localized edits that preserve scene context. Unlike masking or blurring, these edits stay in-distribution and reveal which fine-grained morphological cues drive predictions in tasks such as species recognition and trait attribution. We demonstrate the approach on a YOLOv9 detector fine-tuned for harbor seal detection in Glacier Bay drone imagery, using Segment Anything Model (SAM)-refined masks to support two interventions: (i) object removal/replacement (e.g., replacing seals with plausible ice/water or boats) and (ii) background replacement, with the original animals composited onto new scenes. Explanations are assessed by re-scoring the perturbed images (flip rate, confidence drop) and by expert review for ecological plausibility and interpretability. The resulting explanations localize diagnostic structures, avoid the deletion artifacts common to traditional perturbations, and yield domain-relevant insights that support expert validation and more trustworthy deployment of AI in ecology.
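To make the interventions and evaluation concrete, the following is a minimal sketch, not the paper's implementation. The names `replace_object`, `replace_background`, `rescore`, and `detector` are hypothetical placeholders for the fine-tuned YOLOv9 model and the mask-guided editing steps; the SAM-refined mask is assumed to be a boolean H×W array, and for intervention (i) a diffusion inpainter would generate the fill rather than the simple composite shown here. Only the metric definitions (flip rate, confidence drop) are taken from the abstract.

```python
import numpy as np

def replace_object(image, mask, fill):
    """Intervention (i): overwrite the masked animal with plausible
    replacement content (`fill`, same shape as `image`), e.g. ice/water."""
    return np.where(mask[..., None], fill, image)

def replace_background(image, mask, new_scene):
    """Intervention (ii): composite the original animal onto a new scene."""
    return np.where(mask[..., None], image, new_scene)

def rescore(images, perturbed, detector, threshold=0.5):
    """Re-score perturbed images against the originals.

    Flip rate: fraction of originally confident detections that fall
    below the confidence threshold after perturbation.
    Confidence drop: mean decrease in top-detection confidence.
    `detector` is assumed to return a top-1 confidence in [0, 1].
    """
    flips, drops = 0, []
    for orig, pert in zip(images, perturbed):
        c0, c1 = detector(orig), detector(pert)
        drops.append(c0 - c1)
        if c0 >= threshold and c1 < threshold:
            flips += 1
    return flips / len(images), sum(drops) / len(drops)
```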