Agri-R1: Empowering Generalizable Agricultural Reasoning in Vision-Language Models with Reinforcement Learning

From arXiv. This paper has been submitted for review to ACL 2026. It is 17 pages long and includes 5 figures. The corresponding authors are Tao Fang and Lina Lu.

Agricultural disease diagnosis challenges vision-language models (VLMs): conventional fine-tuning requires extensive labels, lacks interpretability, and generalizes poorly. While reasoning improves model robustness, existing methods rely on costly expert annotations and rarely address the open-ended, diverse nature of agricultural queries. To address these limitations, we propose Agri-R1, a reasoning-enhanced large model for agriculture. Our framework automates high-quality reasoning data generation via vision-language synthesis and LLM-based filtering, using only 19% of available samples. Training employs Group Relative Policy Optimization (GRPO) with a novel reward function that integrates domain-specific lexicons and fuzzy matching to assess both correctness and linguistic flexibility in open-ended responses. Evaluated on CDDMBench, our 3B-parameter model achieves performance competitive with 7B- to 13B-parameter baselines, showing a +23.2% relative gain in disease recognition accuracy, +33.3% in agricultural knowledge QA, and a +26.10-point improvement in cross-domain generalization over standard fine-tuning. Ablation studies confirm that the synergy between structured reasoning data and GRPO-driven exploration underpins these gains, with benefits scaling as question complexity increases.
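
To make the training signal described in the abstract more concrete, the following is a minimal sketch of a lexicon-plus-fuzzy-matching reward combined with GRPO's group-relative advantage normalization. It is not the authors' implementation: the `LEXICON` entries, the `reward` scoring scheme, the 0.8 threshold, and the use of Python's `difflib` for fuzzy matching are all assumptions made for illustration. Only the general idea (score open-ended answers against domain terms with some tolerance for wording, then normalize rewards within a sampled group) comes from the text above.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev

# Hypothetical lexicon: canonical label -> accepted alternative surface forms.
LEXICON = {
    "tomato late blight": ["late blight", "phytophthora infestans"],
    "apple scab": ["scab", "venturia inaequalis"],
}


def fuzzy_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reward(response: str, label: str, threshold: float = 0.8) -> float:
    """Assumed scoring scheme: 1.0 for an exact lexicon hit, the fuzzy
    similarity for a near match at or above `threshold`, otherwise 0.0."""
    text = response.lower()
    terms = [label] + LEXICON.get(label, [])
    if any(term in text for term in terms):
        return 1.0
    # Compare each term against every word window of the same length.
    words = text.split()
    best = 0.0
    for term in terms:
        n = len(term.split())
        for i in range(max(len(words) - n + 1, 1)):
            window = " ".join(words[i:i + n])
            best = max(best, fuzzy_ratio(window, term))
    return best if best >= threshold else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward minus the group mean, divided by
    the group standard deviation (population std is an assumption here)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # A group of sampled responses to the same (hypothetical) query.
    group = [
        "The leaf shows late blight lesions.",
        "This looks like tomato late blite.",
        "The plant appears healthy.",
    ]
    scores = [reward(r, "tomato late blight") for r in group]
    print(scores)
    print(group_relative_advantages(scores))
```

In this sketch a misspelled but recognizable answer earns partial credit instead of zero, which is one plausible reading of "linguistic flexibility". Because advantages are computed relative to the group, a group whose responses all score the same contributes no learning signal.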
