This study examines whether machine learning (ML) models can outperform the naive random walk benchmark in forecasting the monthly USD/CAD exchange rate. Using daily data from the Bank of Canada spanning January 2017 to May 2026, resampled into 113 monthly observations, five ML models are evaluated: linear regression, random forest, gradient boosting, XGBoost, and AdaBoost. These models are benchmarked against the naive random walk and against exponential smoothing with Holt-Winters seasonality (ETS). All models are evaluated in an expanding-window framework to preserve strict out-of-sample integrity, and differences in forecast accuracy are assessed with the Diebold-Mariano (DM) test. Structural break detection identifies four significant breakpoints in the series, corresponding to the escalation of the US-China trade war in 2018, the economic recovery from COVID-19 in 2020, the peak of the Bank of Canada's rate-hiking cycle in 2022, and the start of its rate-cutting cycle in 2024. SHAP (SHapley Additive exPlanations) analysis is applied to interpret the drivers of the best-performing ML model. The results show that the naive random walk remains a formidable benchmark. Linear regression is the only model that statistically significantly outperforms it (DM statistic = 3.0585, p-value = 0.0071), whereas the ensemble ML models differ from it only marginally. Random forest, evaluated in the expanding-window framework, achieves the lowest mean absolute percentage error (MAPE) of all models other than the random walk, at 1.17 percent. SHAP analysis indicates that short-term lags, particularly lag1 and lag2, and recent rolling means dominate the predictions, consistent with the near-random-walk behavior of exchange rates.
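To make the evaluation design concrete, the following is a minimal Python sketch of one-step-ahead expanding-window evaluation scored with MAPE and a DM test. It is not the study's code. It assumes NumPy, uses only two lag features and an ordinary-least-squares fit (no rolling means and no tree ensembles), and runs on synthetic data rather than the Bank of Canada series. The names `make_lags`, `expanding_window_forecasts`, `diebold_mariano`, and the `min_train` parameter are hypothetical. The DM statistic uses squared-error loss, a normal approximation, and the sign convention d_t = loss(benchmark) minus loss(model), so a positive value favors the model; whether the study uses the same loss and sign convention is an assumption.

```python
import math

import numpy as np


def make_lags(y, n_lags):
    """Hypothetical helper: row for time t holds y[t-1], ..., y[t-n_lags]; target is y[t]."""
    X = np.column_stack([y[n_lags - k : len(y) - k] for k in range(1, n_lags + 1)])
    return X, y[n_lags:]


def expanding_window_forecasts(y, n_lags=2, min_train=60):
    """One-step-ahead forecasts; OLS is refit on all observations before each origin."""
    X, target = make_lags(y, n_lags)
    actual, lr_preds, rw_preds = [], [], []
    for i in range(min_train, len(target)):
        # Training window grows by one observation per step (expanding window).
        A = np.column_stack([np.ones(i), X[:i]])
        beta, *_ = np.linalg.lstsq(A, target[:i], rcond=None)
        lr_preds.append(beta[0] + X[i] @ beta[1:])
        # Naive random walk: the forecast is the last observed value (lag1).
        rw_preds.append(X[i, 0])
        actual.append(target[i])
    return np.array(actual), np.array(lr_preds), np.array(rw_preds)


def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((actual - pred) / actual))


def diebold_mariano(actual, pred_bench, pred_model, h=1):
    """Hypothetical DM helper, squared-error loss; positive statistic favors pred_model."""
    d = (actual - pred_bench) ** 2 - (actual - pred_model) ** 2
    n = len(d)
    d_bar = d.mean()
    # Long-run variance of d from autocovariances up to lag h-1 (only lag 0 when h=1).
    gamma = [np.sum((d[k:] - d_bar) * (d[: n - k] - d_bar)) / n for k in range(h)]
    var_d_bar = (gamma[0] + 2.0 * sum(gamma[1:])) / n
    dm = d_bar / math.sqrt(var_d_bar)
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(dm) / math.sqrt(2.0))))
    return dm, p_value


# Synthetic random-walk series of 113 monthly values (not the Bank of Canada data).
rng = np.random.default_rng(0)
y = 1.30 + np.cumsum(rng.normal(0.0, 0.02, 113))
actual, lr, rw = expanding_window_forecasts(y)
print(f"MAPE  LR: {mape(actual, lr):.2f}%  RW: {mape(actual, rw):.2f}%")
dm, p = diebold_mariano(actual, rw, lr)
print(f"DM = {dm:.3f}, p = {p:.4f}")
```

Refitting only on data available before each forecast origin is what keeps the evaluation out of sample. For small samples, the Harvey, Leybourne and Newbold correction with a Student t reference distribution is a common alternative to the normal p-value used in this sketch.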