Agricultural disease diagnosis remains challenging for vision-language models (VLMs): conventional fine-tuning requires extensive labeled data, lacks interpretability, and generalizes poorly. Although reasoning improves model robustness, existing methods rely on costly expert annotations and rarely address the open-ended, diverse nature of agricultural queries. To address these limitations, we propose \textbf{Agri-R1}, a reasoning-enhanced large model for agriculture. Our framework automates the generation of high-quality reasoning data through vision-language synthesis and LLM-based filtering, using only 19\% of the available samples. Training employs Group Relative Policy Optimization (GRPO) with a novel reward function that integrates domain-specific lexicons and fuzzy matching to assess both the correctness and the linguistic flexibility of open-ended responses. Evaluated on CDDMBench, the resulting 3B-parameter model performs competitively with 7B- to 13B-parameter baselines, showing a +23.2\% relative gain in disease recognition accuracy, +33.3\% in agricultural knowledge QA, and a +26.10-point improvement in cross-domain generalization over standard fine-tuning. Ablation studies confirm that the synergy between structured reasoning data and GRPO-driven exploration underpins these gains, and that the benefits grow as question complexity increases.
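To make the reward design concrete, the following is a minimal sketch of how a lexicon-plus-fuzzy-matching reward could feed GRPO's group-relative advantages. It is not the Agri-R1 implementation: the lexicon entries, the `0.8` threshold, the binary scoring, and all function names are assumptions chosen for illustration, and Python's standard `difflib` stands in for whatever fuzzy matcher the actual system uses. Only the advantage normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation.

```python
import re
from difflib import SequenceMatcher
from statistics import mean, pstdev

# Hypothetical lexicon: canonical label -> accepted alternative surface forms.
LEXICON = {
    "tomato late blight": ["late blight", "phytophthora infestans"],
    "apple scab": ["scab", "venturia inaequalis"],
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into words."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def best_window_match(term, response_words):
    """Best fuzzy similarity between `term` and any word window of equal length."""
    term_words = normalize(term)
    if not response_words or not term_words:
        return 0.0
    n = len(term_words)
    target = " ".join(term_words)
    last_start = max(1, len(response_words) - n + 1)
    windows = (" ".join(response_words[i:i + n]) for i in range(last_start))
    return max(SequenceMatcher(None, target, w).ratio() for w in windows)


def lexicon_reward(response, gold_label, threshold=0.8):
    """Return 1.0 if the response fuzzily mentions the gold label or a synonym."""
    words = normalize(response)
    terms = [gold_label] + LEXICON.get(gold_label, [])
    score = max(best_window_match(t, words) for t in terms)
    return 1.0 if score >= threshold else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled responses to the same (hypothetical) query.
group = [
    "The leaf shows tomato late blight lesions.",
    "This looks like late blite on tomato.",  # misspelled, still matched
    "The plant appears healthy.",
]
rewards = [lexicon_reward(r, "tomato late blight") for r in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the misspelled second response still earns full reward because the fuzzy match against the synonym "late blight" clears the threshold, which is the kind of linguistic flexibility an exact-match reward would miss. Responses scoring above the group mean receive positive advantages and the rest negative ones, so no separate value model is needed.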