As machine learning (ML) gains widespread adoption, practitioners are increasingly seeking means to quantify and control the risks these systems pose. This challenge is especially salient when ML systems have the autonomy to collect their own data, as in black-box optimization and active learning, where their actions induce sequential, feedback-loop shifts in the data distribution. Conformal prediction has emerged as a promising approach to uncertainty and risk quantification, but existing variants either fail to accommodate sequences of data-dependent shifts or do not fully exploit the fact that agent-induced shift is under our control. In this work, we prove that conformal prediction can theoretically be extended to \textit{any} joint data distribution, not just exchangeable or quasi-exchangeable ones, although the resulting procedure is exceedingly impractical to compute in the most general case. For practical applications, we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.
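For readers unfamiliar with how conformal prediction handles a departure from exchangeability, the following is a minimal sketch of standard weighted split conformal prediction under a single covariate shift with a known likelihood ratio (the baseline of Tibshirani et al., 2019). It is not the procedure proposed in this work, which targets sequences of data-dependent, agent-induced shifts. The function names, the absolute-residual score, the synthetic Gaussian data, and the assumption that the density ratio is known exactly are all illustrative choices.

```python
import numpy as np


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha)-quantile of sum_i p_i * delta(scores[i]) + p_test * delta(+inf),
    where p_i = weights[i] / Z, p_test = test_weight / Z, Z = sum(weights) + test_weight."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum() + test_weight
    order = np.argsort(scores)
    cum = np.cumsum(weights[order])  # unnormalized cumulative mass over sorted scores
    # First sorted index whose cumulative mass reaches (1 - alpha) * total;
    # the small tolerance guards against floating-point round-off.
    threshold = (1.0 - alpha) * total - 1e-12 * total
    idx = int(np.searchsorted(cum, threshold, side="left"))
    if idx >= len(scores):
        return np.inf  # required mass is only reached by the point mass at +inf
    return float(scores[order][idx])


def weighted_split_conformal_interval(mu_test, cal_residuals, cal_weights, test_weight, alpha=0.1):
    """Interval [mu_test - q, mu_test + q] built from absolute-residual scores."""
    q = weighted_conformal_quantile(np.abs(cal_residuals), cal_weights, test_weight, alpha)
    return mu_test - q, mu_test + q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical setup: calibration X ~ N(0, 1), test X ~ N(1, 1), y = x + noise,
    # and a fixed predictor mu(x) = x. The density ratio N(1,1)/N(0,1) is exp(x - 1/2).
    x_cal = rng.normal(0.0, 1.0, size=500)
    y_cal = x_cal + rng.normal(0.0, 0.5 + 0.5 * np.abs(x_cal))
    residuals = y_cal - x_cal
    ratio = lambda x: np.exp(x - 0.5)
    x_test = 1.5
    lo, hi = weighted_split_conformal_interval(
        x_test, residuals, ratio(x_cal), ratio(x_test), alpha=0.1
    )
    print(f"90% interval at x={x_test}: [{lo:.2f}, {hi:.2f}]")
```

With unit weights, the quantile reduces to the familiar $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score of unweighted split conformal prediction. The agent-induced, data-dependent shifts targeted in this work fall outside this baseline, which assumes a single shift that does not depend on the calibration data.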
翻译:随着机器学习(ML)的广泛应用,从业者越来越需要量化并控制这些系统带来的风险。当机器学习系统自主收集数据(如黑盒优化和主动学习)时,这一问题尤为突出——其行为会导致数据分布产生连续的反馈循环偏移。共形预测已成为不确定性和风险量化的有效方法,但现有变体要么无法适应数据依赖的序列偏移,要么未能充分利用智能体诱导偏移的可控性。本研究理论上证明了共形预测可扩展至\textit{任意}联合数据分布(不仅限于可交换或准可交换分布),尽管在最一般情形下其计算极度困难。针对实际应用,我们提出了一套为任意数据分布推导特定共形算法的流程,并利用该流程为一系列智能体诱导的协变量偏移推导出可计算的算法。最终在合成黑盒优化和主动学习任务上对提出算法进行了实验评估。