Machine learning algorithms are increasingly deployed to aid high-stakes decision-making, and uncertainty quantification methods that wrap around these black-box models, such as conformal prediction, have received much attention in recent years. In sequential settings, where data are observed or generated in a streaming fashion, traditional conformal methods provide no guarantee unless the sample size is fixed in advance. More importantly, they cannot cope with sequentially updated predictions. We therefore extend the conformal prediction and related probably approximately correct (PAC) prediction frameworks to sequential settings where the number of data points is not fixed in advance. The resulting prediction sets are anytime-valid: their expected coverage is at the required level at any time chosen by the analyst, even if this choice depends on the data. We present theoretical guarantees for the proposed methods and demonstrate their validity and utility on simulated and real datasets.