A Hybrid Deep Learning-based Approach for Optimal Genotype by Environment Selection

Precise crop yield prediction is essential for improving agricultural practices and ensuring crop resilience in varying climates. Integrating weather data across the growing season, especially for different crop varieties, is crucial for understanding their adaptability in the face of climate change. In the MLCAS2021 Crop Yield Prediction Challenge, we used a dataset of 93,028 training records to forecast yields for 10,337 test records, covering 159 locations across 28 U.S. states and Canadian provinces over 13 years (2003-2015). The dataset included details on 5,838 distinct genotypes and daily weather data for a 214-day growing season. As one of the winning teams, we developed two novel convolutional neural network (CNN) architectures: the CNN-DNN model, which combines a CNN with fully connected networks, and the CNN-LSTM-DNN model, which adds an LSTM layer for the weather variables. Using the Generalized Ensemble Method (GEM), we determined optimal weights for combining the models, and the resulting ensemble outperformed the baseline models. On the test data, the GEM model achieved 5.55% to 39.88% lower RMSE, 5.34% to 43.76% lower MAE, and 1.1% to 10.79% higher correlation coefficients than the baselines. We applied the CNN-DNN model to identify top-performing genotypes for various locations and weather conditions, supporting genotype selection based on weather variables. This data-driven approach is valuable for scenarios with limited testing years. Additionally, a feature importance analysis based on the change in RMSE highlighted the significance of location, maturity group (MG), year, and genotype, along with the weather variables MDNI and AP.
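The abstract does not spell out how the GEM weights were computed. As a minimal sketch, the classical closed-form formulation of the Generalized Ensemble Method (Perrone and Cooper) picks weights that sum to one and minimize the mean squared error of the combined prediction on held-out data. The function names `gem_weights` and `rmse`, the synthetic yields, and the three simulated "models" below are illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def gem_weights(preds, y):
    """Closed-form GEM weights (assumed formulation, not the authors' code).

    preds: array of shape (n_models, n_samples) with each model's predictions.
    y:     array of shape (n_samples,) with the observed values.
    Returns weights that sum to 1 and minimize the ensemble MSE on (preds, y).
    Weights are not constrained to be non-negative.
    """
    errors = preds - y                      # misfit of every model, per sample
    C = errors @ errors.T / y.size          # cross-moment matrix of the misfits
    w = np.linalg.solve(C, np.ones(C.shape[0]))  # proportional to C^{-1} 1
    return w / w.sum()                      # normalize so the weights sum to 1


def rmse(y, yhat):
    """Root mean squared error between observations and predictions."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


# Synthetic validation data standing in for held-out yield records.
rng = np.random.default_rng(0)
y_val = rng.normal(50.0, 10.0, size=500)

# Three hypothetical base models with different noise levels.
preds_val = np.stack(
    [y_val + rng.normal(0.0, s, size=500) for s in (3.0, 4.0, 6.0)]
)

w = gem_weights(preds_val, y_val)
ensemble = w @ preds_val                    # weighted combination of the models

print("weights:", np.round(w, 3))
print("best single-model RMSE:", min(rmse(y_val, p) for p in preds_val))
print("ensemble RMSE:", rmse(y_val, ensemble))
```

Because each individual model corresponds to one particular choice of weights that sum to one, the ensemble error on the data used to fit the weights can never exceed that of the best single model. Any gain on unseen test data depends on how well the validation errors represent the test errors.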

