Sequential decision-making is central to sustainable agricultural management and precision agriculture, where resource inputs must be optimized over time and under uncertainty. Such decisions must often be made from limited observations, yet classical bandit and reinforcement learning approaches typically rely on linear or black-box reward models that may misrepresent domain knowledge or require large amounts of data. We propose a family of \emph{nonlinear, model-based bandit algorithms} that embed domain-specific response curves directly into the exploration-exploitation loop. By coupling (i) principled uncertainty quantification with (ii) closed-form or rapidly computable profit optima, these algorithms achieve sublinear regret and near-optimal sample complexity while preserving interpretability. We establish the corresponding regret and sample complexity bounds theoretically, and extensive simulations emulating real-world fertilizer-rate decisions show consistent improvements over both linear and nonparametric baselines (such as linear UCB and $k$-NN UCB) in the low-sample regime, under both well-specified and shape-compatible misspecified models. Because our approach leverages mechanistic insight rather than large data volumes, it is especially suited to resource-constrained settings, supporting sustainable, inclusive, and transparent sequential decision-making across agriculture, environmental management, and allied applications.
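To make the coupling of uncertainty quantification with a closed-form profit optimum concrete, the following is a minimal sketch and not the algorithm analyzed in this work. It assumes a quadratic yield response $a + bx - cx^2$ on a normalized rate $x \in [0, x_{\max}]$, Gaussian observation noise with known standard deviation, and posterior sampling as the uncertainty step; the names `profit_optimal_rate` and `run_sampling_bandit`, the prices, and the parameter values are all hypothetical choices made for illustration.

```python
import numpy as np

def features(x):
    # Assumed response curve: yield(x) = a + b*x - c*x**2, linear in (a, b, c).
    return np.array([1.0, x, -x**2])

def profit(theta, x, price, cost):
    # Profit = crop price * predicted yield - input cost * applied rate.
    theta = np.asarray(theta, dtype=float)
    return price * float(features(x) @ theta) - cost * x

def profit_optimal_rate(theta, price, cost, x_max):
    """Profit-maximizing rate on [0, x_max] for the quadratic curve."""
    _, b, c = theta
    candidates = [0.0, x_max]  # endpoints cover the non-concave case
    if c > 0:
        # Concave curve: stationary point of profit is (b - cost/price) / (2c).
        candidates.append(float(np.clip((b - cost / price) / (2 * c), 0.0, x_max)))
    return max(candidates, key=lambda x: profit(theta, x, price, cost))

def run_sampling_bandit(true_theta, price, cost, x_max, horizon,
                        noise_sd=0.1, ridge=1.0, seed=0):
    """Posterior-sampling loop: sample a curve, play its profit optimum."""
    rng = np.random.default_rng(seed)
    true_theta = np.asarray(true_theta, dtype=float)
    V = ridge * np.eye(3)   # regularized design matrix
    s = np.zeros(3)         # running sum of yield * features
    rates = []
    for _ in range(horizon):
        theta_hat = np.linalg.solve(V, s)
        cov = noise_sd**2 * np.linalg.inv(V)
        cov = (cov + cov.T) / 2  # remove numerical asymmetry
        theta_sample = rng.multivariate_normal(theta_hat, cov)
        x_t = profit_optimal_rate(theta_sample, price, cost, x_max)
        # Simulated noisy yield observation at the chosen rate.
        y_t = float(features(x_t) @ true_theta) + noise_sd * rng.normal()
        phi = features(x_t)
        V += np.outer(phi, phi)
        s += y_t * phi
        rates.append(x_t)
    theta_hat = np.linalg.solve(V, s)
    return rates, profit_optimal_rate(theta_hat, price, cost, x_max)
```

Under these assumptions, `profit_optimal_rate((1.0, 2.0, 1.5), price=1.0, cost=0.5, x_max=1.0)` evaluates the stationary point $(2.0 - 0.5)/(2 \cdot 1.5) = 0.5$, and `run_sampling_bandit` returns the sequence of played rates together with the rate recommended by the final point estimate. Other response curves would replace `features` and the closed-form step with their own optimum or a fast numerical solve.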