We explore the application of machine learning algorithms to predict whether Russet potato clones are suitable for advancement in breeding trials. Using manually collected trial data from the state of Oregon, we evaluate a wide variety of state-of-the-art binary classification models. Our analysis of the dataset includes preprocessing, feature engineering, and imputation of missing values. We evaluate the models with several key metrics, including accuracy, F1-score, and the Matthews correlation coefficient (MCC). The top-performing models, namely the multi-layer perceptron classifier (MLPC), the histogram-based gradient boosting classifier (HGBC), and the support vector classifier (SVC), deliver consistent and significant results. Variable selection further improves model performance and identifies the features most influential in predicting trial outcomes. These findings highlight the potential of machine learning to streamline the selection of potato varieties, offering increased efficiency, substantial cost savings, and more judicious use of resources. Our study contributes insights to precision agriculture and shows how advanced technologies can support informed decision-making in breeding programs.
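To make the three evaluation metrics concrete, the following is a minimal sketch of how accuracy, F1-score, and MCC can be computed from a binary confusion matrix, assuming the outcome is encoded as 1 for "clone advanced" and 0 for "not advanced". The function name `binary_metrics` and the example labels are hypothetical illustrations, not the study's code or data.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return (accuracy, F1-score, MCC) for 0/1 labels (1 = clone advanced).

    Hypothetical helper for illustration; not taken from the study.
    """
    # Count the four cells of the binary confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / (tp + tn + fp + fn)

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0

    # MCC uses all four confusion-matrix cells, so it stays informative
    # even if one class is much rarer than the other.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return accuracy, f1, mcc


# Hypothetical labels: 1 = clone advanced, 0 = not advanced.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
acc, f1, mcc = binary_metrics(y_true, y_pred)
print(round(acc, 3), round(f1, 3), round(mcc, 3))  # prints 0.75 0.667 0.467
```

In this toy example, accuracy is 0.75 while MCC is only about 0.47, which illustrates why MCC is reported alongside accuracy: it penalizes errors on either class rather than rewarding agreement with the majority class.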