Conformal prediction (CP) can convert any model's output into prediction sets guaranteed to include the true label with any user-specified probability. However, like the underlying model, CP is vulnerable to adversarial test examples (evasion) and to perturbed calibration data (poisoning). We derive provably robust prediction sets by bounding the worst-case change in conformity scores; tighter bounds yield more efficient (smaller) sets. We cover both continuous and discrete (sparse) data, and our guarantees hold for both evasion and poisoning attacks (on both features and labels).
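The core mechanism can be sketched as follows. In standard split conformal prediction, a threshold is set at a corrected quantile of the calibration conformity scores, and a label enters the prediction set if its score does not exceed that threshold. A robust variant conservatively enlarges the set whenever scores could have been perturbed. This is a minimal sketch, not the paper's method: the function name, the scalar bound `epsilon`, and the symmetric way it is applied are illustrative assumptions (the paper derives tighter, attack-specific bounds).

```python
import numpy as np

def robust_prediction_set(cal_scores, test_scores, alpha=0.1, epsilon=0.0):
    """Split conformal prediction with a crude worst-case score bound.

    cal_scores:  nonconformity scores of the true labels on calibration data
    test_scores: nonconformity score of each candidate label for one test point
    epsilon:     assumed worst-case change an adversary can induce in any
                 score (evasion shifts test scores, poisoning shifts the
                 calibration quantile) -- a hypothetical uniform bound
    """
    n = len(cal_scores)
    # Standard split-CP threshold with finite-sample correction
    q = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n,
                    method="higher")
    # Conservative set: include a label if its score could fall below the
    # (inflated) threshold under any epsilon-bounded perturbation.
    return [y for y, s in enumerate(test_scores) if s - epsilon <= q + epsilon]
```

With `epsilon = 0` this reduces to the standard split-CP set; any positive bound can only add labels, which is exactly why the paper's tighter bounds matter for keeping the sets small.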