Machine learning models in agricultural vision often achieve high accuracy on curated datasets but fail to generalize under real field conditions due to distribution shifts between training and deployment environments. Moreover, most machine learning competitions focus primarily on model design while treating datasets as fixed resources, leaving the role of data collection practices in model generalization largely unexplored. We introduce the AgrI Challenge, a data-centric competition framework in which multiple teams independently collect field datasets, producing a heterogeneous multi-source benchmark that reflects realistic variability in acquisition conditions. To systematically evaluate cross-domain generalization across independently collected datasets, we propose Cross-Team Validation (CTV), an evaluation paradigm that treats each team's dataset as a distinct domain. CTV comprises two complementary protocols: Train-on-One-Team-Only (TOTO), which measures single-source generalization, and Leave-One-Team-Out (LOTO), which evaluates collaborative multi-source training. Experiments reveal substantial generalization gaps under single-source training: models achieve near-perfect validation accuracy on their own team's data yet exhibit validation-test gaps of up to 16.20% (DenseNet121) and 11.37% (Swin Transformer) when evaluated on datasets collected by other teams. In contrast, collaborative multi-source training dramatically improves robustness, reducing the gaps to 2.82% and 1.78%, respectively. The challenge also produced a publicly available dataset of 50,673 field images of six tree species collected by twelve independent teams, providing a diverse benchmark for studying domain shift and data-centric learning in agricultural vision.
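The two CTV protocols can be sketched as split-construction rules over team-keyed datasets. The following is a minimal illustrative sketch, not the challenge's actual implementation; the function names and team identifiers are assumptions for exposition.

```python
# Illustrative sketch of the two Cross-Team Validation (CTV) protocols.
# Each team's dataset is treated as a distinct domain; a "split" is a
# (train_source, test_source) pair. Names here are hypothetical.

def toto_splits(teams):
    """Train-on-One-Team-Only: train on a single team's data,
    test on each of the remaining teams' data."""
    return [(t, [u for u in teams if u != t]) for t in teams]

def loto_splits(teams):
    """Leave-One-Team-Out: train on the union of all teams but one,
    test on the held-out team's data."""
    return [([u for u in teams if u != t], t) for t in teams]

# Twelve independent teams, as in the challenge dataset.
teams = [f"team_{i:02d}" for i in range(1, 13)]

# Each protocol yields one configuration per team.
assert len(toto_splits(teams)) == 12
assert len(loto_splits(teams)) == 12
```

Under TOTO every model sees exactly one acquisition domain at training time, which exposes single-source overfitting; under LOTO the held-out team plays the role of an unseen deployment domain for a multi-source model.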