Recent advances in complex deep learning models have led to an unprecedented ability to predict accurately across multiple data representation types. Conformal prediction for uncertainty quantification of these models has risen in popularity, providing adaptive, statistically valid prediction sets. For classification tasks, conformal methods have typically relied on logit scores. For pre-trained models, however, this can yield inefficient, overly conservative prediction sets when the model is not calibrated to the target task. We propose DANCE, a doubly locally adaptive nearest-neighbor-based conformal algorithm that combines two novel nonconformity scores computed directly from the data's embedded representation. DANCE first fits a task-adaptive kernel regression model on the embedding layer and then uses the learned kernel space to produce the final prediction sets for uncertainty quantification. We evaluate DANCE against state-of-the-art local, task-adapted, and zero-shot conformal baselines, demonstrating its superior blend of set-size efficiency and robustness across various datasets.
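For readers unfamiliar with embedding-based conformal classification, the general idea can be illustrated with a minimal sketch. This is not DANCE: the abstract does not specify its two nonconformity scores or its kernel regression step, so the sketch swaps in a plain split conformal procedure with a simple nearest-neighbor distance-ratio score. The function names (`knn_score`, `conformal_threshold`, `predict_set`), the Euclidean distance, and the choice of `k` are all assumptions made for illustration.

```python
import numpy as np


def knn_score(train_emb, train_labels, query_emb, label, k=5):
    """Hypothetical nonconformity score (not the DANCE score).

    Ratio of the mean distance to the k nearest training embeddings that
    carry `label` over the mean distance to the k nearest embeddings with
    any other label. Small values mean the query looks like `label`.
    Assumes every class has at least one training point.
    """
    dists = np.linalg.norm(train_emb - query_emb, axis=1)
    same = np.sort(dists[train_labels == label])[:k]
    other = np.sort(dists[train_labels != label])[:k]
    return same.mean() / (other.mean() + 1e-12)


def conformal_threshold(scores, alpha):
    """Split conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.

    Returns infinity when the calibration set is too small for the
    requested level, which makes every prediction set contain all classes.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        return np.inf
    return np.sort(scores)[rank - 1]


def predict_set(train_emb, train_labels, query_emb, threshold, k=5):
    """Return every class whose score for the query is within the threshold."""
    classes = np.unique(train_labels)
    return [
        int(c)
        for c in classes
        if knn_score(train_emb, train_labels, query_emb, c, k) <= threshold
    ]
```

In use, one would compute `knn_score` for each held-out calibration point with its true label, pass those scores to `conformal_threshold` with the desired miscoverage level `alpha`, and then call `predict_set` on new embeddings. Under the usual exchangeability assumption, the resulting sets contain the true label with probability at least `1 - alpha` on average.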