Robust MultiSpecies Agricultural Segmentation Across Devices, Seasons, and Sensors Using Hierarchical DINOv2 Models

Artzai Picon, Itziar Eguskiza, Daniel Mugica, Javier Romero, Carlos Javier Jimenez, Eric White, Gabriel Do-Lago-Junqueira, Christian Klukas, Ramon Navarra-Mestre

Reliable plant species and damage segmentation for herbicide field research trials requires models that can withstand substantial real-world variation across seasons, geographies, devices, and sensing modalities. Most deep learning approaches trained on controlled datasets fail to generalize under these domain shifts, which limits their suitability for operational phenotyping pipelines. This study evaluates a segmentation framework that integrates vision foundation models (DINOv2) with hierarchical taxonomic inference to improve robustness across heterogeneous agricultural conditions. We train on a large, multi-year dataset collected in Germany and Spain (2018 to 2020), comprising 14 plant species and 4 herbicide damage classes, and assess generalization under increasingly challenging shifts: temporal and device changes (2023), geographic transfer to the United States, and extreme sensor shift to drone imagery (2024). Results show that the foundation-model backbone consistently outperforms prior baselines, improving species-level F1 from 0.52 to 0.87 on in-distribution data and maintaining significant advantages under moderate (0.77 vs. 0.24) and extreme (0.44 vs. 0.14) shift conditions. Hierarchical inference provides an additional layer of robustness, enabling meaningful predictions even when fine-grained species classification degrades (family F1: 0.68, class F1: 0.88 on aerial imagery). Error analysis reveals that failures under severe shift stem primarily from vegetation-soil confusion, suggesting that taxonomic distinctions are preserved despite background and viewpoint variability. The system is now deployed within BASF's phenotyping workflow for herbicide research trials across multiple regions, illustrating the practical viability of combining foundation models with structured biological hierarchies for scalable, shift-resilient agricultural monitoring.
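The abstract does not state exactly how hierarchical inference is carried out. A minimal sketch, assuming one common approach (summing the per-pixel species probabilities of all species that share a parent taxon, then taking the argmax at each level), is shown below. The four-species taxonomy and the names `TAXONOMY`, `aggregate`, and `hierarchical_predict` are hypothetical and illustrative only; they are not the paper's 14-species label set or the authors' implementation.

```python
from collections import defaultdict

# Hypothetical, illustrative taxonomy (species -> family, class).
# This is NOT the paper's label set; it only demonstrates the aggregation idea.
TAXONOMY = {
    "Echinochloa crus-galli": {"family": "Poaceae", "class": "Liliopsida"},
    "Setaria viridis": {"family": "Poaceae", "class": "Liliopsida"},
    "Amaranthus retroflexus": {"family": "Amaranthaceae", "class": "Magnoliopsida"},
    "Chenopodium album": {"family": "Amaranthaceae", "class": "Magnoliopsida"},
}


def aggregate(species_probs: dict[str, float], level: str) -> dict[str, float]:
    """Sum per-species probabilities into probabilities for a coarser taxonomic level."""
    parent_probs: dict[str, float] = defaultdict(float)
    for species, p in species_probs.items():
        parent_probs[TAXONOMY[species][level]] += p
    return dict(parent_probs)


def argmax(probs: dict[str, float]) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


def hierarchical_predict(species_probs: dict[str, float]) -> tuple[str, str, str]:
    """Predict (species, family, class) for one pixel from its species posterior."""
    return (
        argmax(species_probs),
        argmax(aggregate(species_probs, "family")),
        argmax(aggregate(species_probs, "class")),
    )


# One pixel whose species posterior is split across two plant families.
pixel = {
    "Echinochloa crus-galli": 0.34,
    "Setaria viridis": 0.06,
    "Amaranthus retroflexus": 0.31,
    "Chenopodium album": 0.29,
}
print(hierarchical_predict(pixel))
# ('Echinochloa crus-galli', 'Amaranthaceae', 'Magnoliopsida')
```

In this toy pixel the single most likely species is a grass, yet the summed family and class probabilities favour Amaranthaceae and Magnoliopsida. This illustrates how coarser taxonomic levels can remain informative when fine-grained species scores are spread across several candidates.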

翻译：在除草剂田间试验研究中，实现可靠的植物物种与损害分割需要模型能够承受跨季节、地理区域、设备及传感模态的显著现实世界变化。大多数基于受控数据集训练的深度学习方法难以在此类域偏移下保持泛化能力，限制了其在操作性表型分析流程中的适用性。本研究评估了一种整合视觉基础模型（DINOv2）与分层分类推断的分割框架，以提升其在异质性农业条件下的鲁棒性。我们使用在德国与西班牙（2018-2020年）收集的大型多年数据集进行训练，涵盖14种植物物种和4类除草剂损害，并评估其在日益严峻的偏移场景下的泛化能力：时间与设备变化（2023年）、向美国的地理迁移，以及向无人机影像的极端传感器偏移（2024年）。结果表明，基于基础模型的骨干网络持续优于现有基线方法，在分布内数据上将物种级F1分数从0.52提升至0.87，并在中度（0.77对比0.24）与极端（0.44对比0.14）偏移条件下保持显著优势。分层推断提供了额外的鲁棒性层，即使在细粒度物种分类性能下降时仍能提供有意义的预测（无人机影像上科级F1：0.68，纲级F1：0.88）。误差分析表明，严重偏移下的失败主要源于植被-土壤混淆，这提示尽管背景与视角存在变异，分类学区分特征仍得以保持。该系统目前已部署于巴斯夫公司的除草剂研究试验表型分析工作流中，应用于多个地区，展示了将基础模型与结构化生物分类层级相结合以实现可扩展、抗偏移的农业监测的实践可行性。