The onset of the rainy season and the occurrence of dry spells in West Africa are notoriously difficult to predict, yet they are the key indicators farmers use to decide when to plant, and so they strongly influence overall crop yield. While many studies have shown correlations between global sea surface temperatures (SSTs) and characteristics of the West African monsoon season, few have effectively incorporated this information into machine learning (ML) prediction models. In this study, we investigated how best to define our target variables, onset and dry spell, and developed methods to predict them for upcoming seasons using SST teleconnections. Defining our target variables required combining two well-known definitions of onset. We then applied custom statistical techniques, such as total variation regularization and predictor selection, to the two models we constructed: a linear model and an adaptive-threshold logistic regression model. Results for onset prediction were mixed: spatial verification showed signs of significant skill, while temporal verification showed little to none. For dry spells, however, the analysis of multiple binary classification metrics showed significant accuracy. These models overcome some limitations of current approaches, such as high computational cost and the need for bias correction. We also present this study as a framework for using ML methods to make targeted predictions of specific weather phenomena from climatologically relevant variables. As ML techniques are applied to more problems, we see clear benefits for fields such as meteorology, and we outline several new directions for further research.
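The abstract does not state how the total variation penalty enters the linear model. As a minimal sketch, assuming the penalty acts on absolute differences between adjacent regression coefficients (for example, SST predictors ordered along one spatial axis), TV-regularized least squares can be solved with ADMM. The function name `tv_regularized_least_squares`, the 1-D ordering of predictors, and the `rho` heuristic are hypothetical illustrations, not the study's documented implementation.

```python
import numpy as np


def tv_regularized_least_squares(X, y, lam, rho=None, n_iter=500):
    """Minimize 0.5*||X b - y||^2 + lam * sum_i |b[i+1] - b[i]| via ADMM.

    Hypothetical helper: the 1-D first-difference penalty is an assumed
    stand-in for the study's total variation regularization.
    """
    n_features = X.shape[1]
    XtX = X.T @ X
    if rho is None:
        # Heuristic (assumption): scale the ADMM penalty to the loss curvature.
        rho = np.trace(XtX) / n_features
    # First-difference operator D, so (D @ b)[i] = b[i+1] - b[i].
    D = np.diff(np.eye(n_features), axis=0)
    z = np.zeros(n_features - 1)  # split variable standing in for D @ b
    u = np.zeros(n_features - 1)  # scaled dual variable
    # The b-update solves the same linear system at every iteration.
    A = XtX + rho * D.T @ D
    Xty = X.T @ y
    for _ in range(n_iter):
        b = np.linalg.solve(A, Xty + rho * D.T @ (z - u))
        Db = D @ b
        # Soft-thresholding is the proximal operator of the L1 norm.
        v = Db + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u = u + Db - z
    return b


if __name__ == "__main__":
    # Synthetic data with piecewise-constant true coefficients.
    rng = np.random.default_rng(0)
    beta_true = np.r_[np.ones(10), -np.ones(10)]
    X = rng.normal(size=(200, 20))
    y = X @ beta_true + 0.1 * rng.normal(size=200)
    print(np.round(tv_regularized_least_squares(X, y, lam=5.0), 2))
```

Larger values of `lam` fuse neighboring coefficients into flat segments, which is the behavior that distinguishes a TV penalty from a ridge or lasso penalty on the coefficients themselves.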