Data poisoning attacks pose one of the most serious threats to modern AI systems, necessitating robust defenses. While extensive efforts have been made to develop empirical defenses, attackers continue to evolve, devising sophisticated methods to circumvent them. To address this, we must move beyond empirical defenses and establish provable certification methods that guarantee robustness. This paper introduces BiCert, a novel certification approach that uses Bilinear Mixed Integer Programming (BMIP) to compute sound deterministic bounds providing such provable robustness. Using BMIP, we compute the reachable set of parameters that could result from training with potentially manipulated data. A key step in making this computation tractable is relaxing the reachable parameter set to a convex set between training iterations. At test time, this parameter set allows us to predict all possible outcomes, guaranteeing robustness. BiCert is more precise than previous methods, which rely solely on interval and polyhedral bounds. Crucially, it overcomes a fundamental limitation of prior approaches, in which parameter bounds could only grow, often uncontrollably. We show that BiCert's tighter bounds eliminate a key source of divergence, resulting in more stable training and higher certified accuracy.
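The certification idea described above can be illustrated on a toy case. The sketch below is a minimal, hypothetical example, not the paper's BMIP formulation: it tracks an over-approximating interval over the reachable parameters of a 1-D linear model trained by gradient descent when each training label may be perturbed within a known poisoning budget, and certifies a prediction at test time only if it agrees across the entire reachable set. All function names and numbers are illustrative assumptions.

```python
# Hypothetical sketch: interval over-approximation [w_lo, w_hi] of the
# reachable parameter set of a 1-D linear model under label poisoning.

def step_interval(w_lo, w_hi, x, y_lo, y_hi, lr):
    """Exact bounds on one SGD step w <- w - lr*x*(w*x - y).

    The update is affine in (w, y): w*(1 - lr*x*x) + lr*x*y, so its exact
    range over w in [w_lo, w_hi] and y in [y_lo, y_hi] has a closed form.
    Reasoning jointly over the coupled variables (as BMIP does for the
    general bilinear case) is what keeps the bounds from growing
    uncontrollably, unlike naive term-by-term interval arithmetic.
    """
    a = 1.0 - lr * x * x
    aw_lo, aw_hi = sorted((a * w_lo, a * w_hi))
    by_lo, by_hi = sorted((lr * x * y_lo, lr * x * y_hi))
    # Interval hull of all reachable parameters: the convex relaxation
    # applied between training iterations.
    return aw_lo + by_lo, aw_hi + by_hi

def certified_train(data, lr=0.1, epochs=50):
    """data: list of (x, y_lo, y_hi), where [y_lo, y_hi] covers every
    label the attacker could have supplied for that point."""
    w_lo = w_hi = 0.0
    for _ in range(epochs):
        for x, y_lo, y_hi in data:
            w_lo, w_hi = step_interval(w_lo, w_hi, x, y_lo, y_hi, lr)
    return w_lo, w_hi

def certified_sign(w_lo, w_hi, x):
    """Test time: the sign of the prediction w*x is certified only if it
    is the same for every parameter in the reachable set."""
    lo, hi = sorted((w_lo * x, w_hi * x))
    if lo > 0:
        return 1
    if hi < 0:
        return -1
    return 0  # abstain: not certifiable under this poisoning budget

# Each label may be moved by up to 10% by the attacker.
data = [(1.0, 0.9, 1.1), (2.0, 1.8, 2.2)]
w_lo, w_hi = certified_train(data)
```

In this toy setting the per-step update map is contractive, so the parameter interval converges to a fixed width rather than diverging, mirroring the stability claim above; with naive interval arithmetic that splits the `w`-dependent terms, the same bounds would grow at every step.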