Maize is a major crop grown worldwide, particularly in sub-Saharan Africa, Asia, and Latin America, and occupied 197 million hectares as of 2021. Many statistical and machine learning models have been developed to predict maize yield, including mixed-effects models, random coefficient models, random forests, and deep learning architectures. These models account for genotype, environment, genotype-environment interaction, and field management. However, existing models often fail to fully exploit the network of causal relationships among these factors and the hierarchical structure inherent in agronomic data. This study introduces an approach that integrates random effects into Bayesian networks (BNs), which model causal and probabilistic relationships through directed acyclic graphs. The approach is rooted in the linear mixed-effects modeling framework and tailored to hierarchical data, and it improves BN learning. Applied to a real-world agronomic trial, it produces a model with improved interpretability and reveals new causal connections. Notably, the proposed method reduces the error rate in maize yield prediction from 28% to 17%. These results support the use of BNs for building practical decision support tools for hierarchical agronomic data that facilitate causal inference.