We propose Sequential Conformalized Density Regions (SCDR), a new conformal prediction method for time-series data with a guaranteed asymptotic conditional coverage rate. SCDR is flexible enough to produce both prediction intervals and disconnected prediction sets, the latter signifying the emergence of bifurcations. Our approach uses existing estimated conditional highest density predictive regions to form initial predictive regions. We then apply a quantile random forest conformal adjustment, which provides guaranteed coverage while adapting to the non-exchangeable nature of time-series data. We show that the proposed method achieves the guaranteed coverage rate asymptotically under certain regularity conditions. In particular, the method is doubly robust: it works if the predictive density model is correctly specified and/or if the scores follow a nonlinear autoregressive model with the correct order specified. Simulations reveal that the proposed method outperforms existing methods in terms of empirical coverage rates and set sizes. We illustrate the method using two real datasets, the Old Faithful geyser dataset and the Australian electricity usage dataset. Prediction sets formed using SCDR for the geyser eruption durations include both single intervals and unions of two intervals, whereas existing methods produce wider, less informative, single-interval prediction sets.
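To make concrete how a highest density predictive region can be either a single interval or a union of intervals, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the predictive density has already been evaluated on a uniform grid, and it covers only the construction of the initial region; the quantile random forest conformal adjustment is not shown. The function names `normal_pdf` and `hdr_intervals`, the mixture parameters, and the grid are all hypothetical choices made for illustration.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def hdr_intervals(grid, density, coverage):
    """Approximate highest density region on a uniform grid.

    Keeps the highest-density grid cells until their total probability
    mass reaches `coverage`, then merges adjacent kept cells into
    (low, high) intervals. Hypothetical helper for illustration only.
    """
    step = grid[1] - grid[0]
    mass = density * step
    mass = mass / mass.sum()  # renormalize the discretized density

    order = np.argsort(density)[::-1]  # cell indices, highest density first
    cumulative = np.cumsum(mass[order])
    n_keep = min(int(np.searchsorted(cumulative, coverage)) + 1, len(grid))

    keep = np.zeros(len(grid), dtype=bool)
    keep[order[:n_keep]] = True

    # Merge runs of consecutive kept cells into intervals.
    intervals = []
    start = None
    for i, flag in enumerate(keep):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append((float(grid[start]), float(grid[i - 1])))
            start = None
    if start is not None:
        intervals.append((float(grid[start]), float(grid[-1])))
    return intervals


# Assumed example densities: a well-separated two-component mixture
# and a single normal component.
grid = np.linspace(0.0, 7.0, 1401)
bimodal = 0.35 * normal_pdf(grid, 2.0, 0.3) + 0.65 * normal_pdf(grid, 4.5, 0.4)
unimodal = normal_pdf(grid, 3.0, 0.5)

bimodal_region = hdr_intervals(grid, bimodal, 0.90)
unimodal_region = hdr_intervals(grid, unimodal, 0.90)
print("bimodal 90% region:", bimodal_region)
print("unimodal 90% region:", unimodal_region)
```

With a well-separated bimodal density the region splits into two intervals, one around each mode, while a unimodal density yields a single interval. An interval-only method would have to cover the low-density gap between the modes, which is one way to read the abstract's remark that single-interval prediction sets are wider and less informative.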