As machine learning (ML) gains widespread adoption, practitioners increasingly seek ways to quantify and control the risk these systems incur. This challenge is especially salient when ML systems have the autonomy to collect their own data, as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction has emerged as a promising approach to uncertainty and risk quantification, but the validity guarantees of prior variants assume some form of ``quasi-exchangeability'' of the data distribution, thereby excluding many types of sequential shift. In this paper we prove that conformal prediction can, in theory, be extended to \textit{any} joint data distribution, not just exchangeable or quasi-exchangeable ones, although the resulting procedure is exceedingly impractical to compute in the most general case. For practical applications, we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of ML-agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.
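For readers unfamiliar with the quasi-exchangeable setting that prior variants rely on, the following is a minimal sketch of standard weighted split conformal prediction under a single, known covariate shift. It is not one of the algorithms derived in this paper. It assumes the weights are likelihood ratios $w(x) = p_{\text{test}}(x) / p_{\text{train}}(x)$ supplied by the caller and that nonconformity scores are absolute residuals. The function names `weighted_conformal_quantile` and `weighted_split_conformal_interval` are hypothetical illustrations.

```python
import numpy as np


def weighted_conformal_quantile(cal_scores, cal_weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores plus a mass at +inf.

    Assumption: cal_weights and test_weight are likelihood ratios
    w(x) = p_test(x) / p_train(x) for one known covariate shift.
    """
    # The test point's unknown score is represented by +inf.
    scores = np.append(np.asarray(cal_scores, dtype=float), np.inf)
    weights = np.append(np.asarray(cal_weights, dtype=float), float(test_weight))
    probs = weights / weights.sum()  # normalize to a probability distribution
    order = np.argsort(scores)
    cum = np.cumsum(probs[order])
    # First index whose cumulative weight reaches 1 - alpha; the small
    # tolerance guards against floating-point round-off in the cumulative sum.
    idx = np.searchsorted(cum, 1.0 - alpha - 1e-12)
    return scores[order][idx]


def weighted_split_conformal_interval(y_pred, cal_scores, cal_weights,
                                      test_weight, alpha):
    """Interval y_pred +/- q, assuming absolute-residual scores |y - y_hat|."""
    q = weighted_conformal_quantile(cal_scores, cal_weights, test_weight, alpha)
    return y_pred - q, y_pred + q
```

With all weights equal, this reduces to ordinary split conformal prediction. When the test point's weight dominates the calibration weights, the quantile becomes $+\infty$ and the interval is the whole real line, which illustrates why shifts that are large or that depend on the observed data can make fixed-weight schemes uninformative or invalid.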