Conformal prediction is a distribution-free method that wraps a given machine learning model and returns a set of plausible labels that contains the true label with a prescribed coverage rate. In practice, the empirical coverage it achieves relies heavily on fully observed labels. This holds both in the training phase, for model fitting, and in the calibration phase, for quantile estimation. This dependency poses a challenge in online learning with bandit feedback, where the learner observes only whether its action (i.e., the pulled arm) is correct, not the true label itself. In particular, when the pulled arm is incorrect, the learner knows only that the pulled arm is not the true class label, but not which label is. Bandit feedback also shrinks the labeled dataset available for calibration to instances with correct actions, which degrades the accuracy of quantile estimation. To address these limitations, we propose Bandit Class-specific Conformal Prediction (BCCP), which offers coverage guarantees at class-specific granularity. Using an unbiased estimator of an estimand involving the true label, BCCP trains the model and makes set-valued inferences through stochastic gradient descent. Our approach overcomes the challenge of sparsely labeled data in each iteration and extends the reliability and applicability of conformal prediction to online decision-making environments.
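The abstract does not spell out the estimator or the update rule, so the sketch below illustrates one standard way such a scheme can work. It is an assumption, not the BCCP algorithm itself. The unknown indicator 1{y = k} is replaced by the inverse-propensity estimate 1{arm = k and correct} / policy[k]. This estimate is unbiased whenever the exploration policy gives every class positive probability. Each class-specific threshold q[k] is then updated by SGD on the pinball loss at level 1 - alpha, whose minimizer is the (1 - alpha)-quantile of the class-k score among instances whose true label is k. All function names, the epsilon-greedy policy, the step size, and the simulated model `toy_model_probs` are hypothetical. The model-training half of BCCP is omitted: the scoring model here is fixed rather than learned online.

```python
import math
import random

# Minimal sketch (NOT the authors' BCCP implementation): class-specific
# thresholds q[k] learned by SGD on an importance-weighted pinball loss,
# using only bandit feedback (was the pulled arm correct?).

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def toy_model_probs(y, num_classes, rng):
    # Hypothetical stand-in for a trained model: noisy logits with a bump on
    # the true class. Used only to simulate an environment.
    logits = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]
    logits[y] += 2.0
    return softmax(logits)

def exploration_policy(probs, eps):
    # Assumed epsilon-greedy policy: every arm keeps probability >= eps/K,
    # which keeps the importance weights finite.
    num_classes = len(probs)
    greedy = max(range(num_classes), key=lambda k: probs[k])
    return [(1.0 - eps) * (k == greedy) + eps / num_classes
            for k in range(num_classes)]

def ipw_indicator(arm, correct, policy, k):
    # Unbiased estimate of 1{y == k}: its expectation over arm ~ policy equals
    # 1{y == k}, because it is nonzero only when arm == k == y.
    return 1.0 / policy[k] if (arm == k and correct) else 0.0

def update_thresholds(q, scores, arm, correct, policy, alpha, lr):
    # One SGD step on sum_k w_k * pinball_{1-alpha}(scores[k] - q[k]).
    # q[k] moves up by lr*w*(1-alpha) when the score exceeds q[k],
    # and down by lr*w*alpha otherwise.
    for k in range(len(q)):
        w = ipw_indicator(arm, correct, policy, k)
        if w > 0.0:
            miss = 1.0 if scores[k] > q[k] else 0.0
            q[k] += lr * w * (miss - alpha)
    return q

def prediction_set(scores, q):
    # Class-specific set: include label k if its nonconformity score <= q[k].
    return [k for k in range(len(q)) if scores[k] <= q[k]]

def run_bandit_sketch(T=30000, num_classes=3, alpha=0.1, eps=0.3,
                      lr=0.002, seed=0):
    rng = random.Random(seed)
    q = [0.5] * num_classes
    for _ in range(T):
        y = rng.randrange(num_classes)              # hidden from the learner
        probs = toy_model_probs(y, num_classes, rng)
        scores = [1.0 - p for p in probs]           # nonconformity scores
        policy = exploration_policy(probs, eps)
        arm = rng.choices(range(num_classes), weights=policy)[0]
        correct = arm == y                          # the only feedback observed
        update_thresholds(q, scores, arm, correct, policy, alpha, lr)
    return q

def class_coverage(q, n=20000, seed=1):
    # Empirical P(y in set | Y = k) on fresh simulated data.
    rng = random.Random(seed)
    num_classes = len(q)
    hits = [0] * num_classes
    counts = [0] * num_classes
    for _ in range(n):
        y = rng.randrange(num_classes)
        scores = [1.0 - p for p in toy_model_probs(y, num_classes, rng)]
        counts[y] += 1
        hits[y] += y in prediction_set(scores, q)
    return [h / c for h, c in zip(hits, counts)]
```

To check the empirical class-conditional coverage against 1 - alpha, call `q = run_bandit_sketch()` and then `class_coverage(q)`. Only the threshold of the pulled arm ever moves, and only when that arm is correct. Rare exploratory hits receive large weights (1 / policy[k]). These large weights are what keep the update unbiased, and they are also why a small step size is used.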