Crop production management is essential for optimizing yield and minimizing the environmental impact of crop fields, yet it remains challenging due to the complex and stochastic processes involved. Recently, researchers have turned to machine learning to address these complexities. In particular, reinforcement learning (RL), an approach designed to learn optimal decision-making strategies through trial and error in dynamic environments, has emerged as a promising tool for developing adaptive crop management policies. RL agents optimize long-term rewards by continuously interacting with the environment, making them well suited to the uncertainty and variability inherent in crop management. Studies have shown that RL can generate crop management policies that compete with, and even outperform, expert-designed policies within simulation-based crop models. In the gym-DSSAT crop model environment, one of the most widely used simulators for crop management, proximal policy optimization (PPO) and deep Q-networks (DQN) have shown promising results; however, these methods have not yet been systematically evaluated under identical conditions. In this study, we evaluated PPO and DQN against static baseline policies on the three RL tasks provided by the gym-DSSAT environment: fertilization, irrigation, and mixed management. To ensure a fair comparison, we used consistent default parameters, identical reward functions, and the same environment settings. Our results indicate that PPO outperforms DQN on the fertilization and irrigation tasks, whereas DQN excels on the mixed management task. This comparative analysis provides critical insights into the strengths and limitations of each approach and supports the development of more effective RL-based crop management strategies.
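To make the evaluation setup concrete, the sketch below shows one way such a comparison can be wired up with stable-baselines3 in a Gym-style crop management environment. It is a minimal illustration, not the configuration used in this study: the environment id `gym_dssat_pdi:GymDssatPdi-v0`, the `mode` keyword, the `MlpPolicy` choice, and the timestep budget are assumptions, and gym-DSSAT versions differ in their observation and action spaces, so wrapper code may be required in practice.

```python
# Minimal, illustrative sketch (not the study's actual code): training PPO with
# default hyperparameters and comparing it against a static baseline policy in
# an assumed gym-DSSAT task. Environment id, "mode" keyword, policy class, and
# timestep budget are assumptions; some gym-DSSAT versions expose dict-valued
# observations/actions and need additional wrappers.
import gym
from stable_baselines3 import PPO

# Assumed environment id and task selector; the irrigation and mixed-management
# tasks would be selected analogously.
env = gym.make("gym_dssat_pdi:GymDssatPdi-v0", mode="fertilization")

# Library-default hyperparameters, mirroring the "consistent default parameters"
# setting described in the abstract (the timestep budget here is illustrative).
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=200_000)

def episode_return(policy_fn):
    """Roll out one growing season and return the cumulative reward."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _info = env.step(policy_fn(obs))
        total += reward
    return total

# Static baseline: the same fixed action applied at every decision step.
fixed_action = env.action_space.sample()
baseline_return = episode_return(lambda obs: fixed_action)

# Learned PPO policy, evaluated deterministically.
ppo_return = episode_return(lambda obs: model.predict(obs, deterministic=True)[0])

print(f"PPO return: {ppo_return:.2f}  static baseline return: {baseline_return:.2f}")
```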