Sierra Leone's agriculture operates with almost no data-driven decision support, and no published machine learning study has examined the country's crop yields. We ask whether rice yield can be forecast from data Sierra Leone currently has. Using 25 years of FAOSTAT production data (2000-2024) for nine major crops, we train XGBoost, Gradient Boosting, and Random Forest models under a strict anti-leakage protocol, with expanding-window walk-forward evaluation across seven held-out years and naive persistence as the benchmark. No model trained on crop statistics alone outperforms persistence. Augmenting with free satellite climate data (CHIRPS rainfall, NASA POWER temperature) reverses this result: a climate-only XGBoost reduces forecast error by about one third (RMSE 284 vs 428 kg/ha), a gain that also holds for a linear model and is robust to excluding the anomalous 2018 season. Early-season (May-June) rainfall is the dominant predictor, implying that seasonal yield risk is observable months before harvest. No model anticipated the 2018 collapse, whose origins were institutional rather than climatic. We translate the findings into policy recommendations for Sierra Leone's Feed Salone Strategy and provide a fully open-source pipeline.
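The evaluation protocol named in the abstract (expanding-window walk-forward forecasting scored against naive persistence) can be illustrated with a minimal sketch. It is not the study's pipeline: the function names `rmse` and `walk_forward`, the synthetic rainfall and yield series, and the ordinary least squares model standing in for XGBoost are all assumptions made for illustration. For each held-out year the model is fitted only on strictly earlier years, which is the basic guard against temporal leakage, and persistence simply repeats the previous year's yield.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def walk_forward(years, X, y, n_test):
    """Expanding-window walk-forward evaluation over the last n_test years.

    For each held-out year, a linear model (an illustrative stand-in for
    XGBoost) is fitted on all strictly earlier years, then used to predict
    the held-out year. Persistence predicts the previous year's value.
    """
    order = np.argsort(years)
    X, y = X[order], y[order]
    model_pred, persist_pred, actual = [], [], []
    for i in range(len(y) - n_test, len(y)):
        # Training window grows by one year at every step.
        X_train = np.column_stack([np.ones(i), X[:i]])
        coef, *_ = np.linalg.lstsq(X_train, y[:i], rcond=None)
        x_test = np.concatenate([[1.0], X[i]])
        model_pred.append(float(x_test @ coef))
        persist_pred.append(float(y[i - 1]))  # naive persistence baseline
        actual.append(float(y[i]))
    return {
        "model_rmse": rmse(actual, model_pred),
        "persistence_rmse": rmse(actual, persist_pred),
        "n_folds": len(actual),
    }


# Synthetic, hypothetical data: 25 seasons, one early-season rainfall feature.
rng = np.random.default_rng(0)
years = np.arange(2000, 2025)
rain = rng.normal(300.0, 60.0, size=years.size)  # made-up May-June rainfall, mm
yield_kg_ha = 1200.0 + 2.0 * (rain - 300.0) + rng.normal(0.0, 50.0, size=years.size)

result = walk_forward(years, rain.reshape(-1, 1), yield_kg_ha, n_test=7)
print(result)
```

Because the synthetic yield is driven by the rainfall feature by construction, the fitted model beats persistence here; that outcome reflects how the toy data were generated and says nothing about the real FAOSTAT or CHIRPS series.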