Training neural networks with high certified accuracy against adversarial examples remains an open problem despite significant efforts. While certification methods can effectively leverage tight convex relaxations for bound computation, these tighter relaxations paradoxically perform worse than looser ones when used in training. Prior work hypothesized that this is caused by the discontinuity and perturbation sensitivity of the loss surface induced by tighter relaxations. In this work, we show theoretically that Gaussian Loss Smoothing can alleviate both issues. We confirm this empirically by proposing a certified training method that combines PGPE (Policy Gradients with Parameter-based Exploration), an algorithm computing gradients of a smoothed loss, with different convex relaxations. With this training method, we observe that tighter bounds indeed lead to strictly better networks. While scaling PGPE training remains challenging due to its high computational cost, we show that a smoothing approximation that is not theoretically sound but much cheaper yields better certified accuracies than state-of-the-art methods when training on the same network architecture. Our results clearly demonstrate the promise of Gaussian Loss Smoothing for training certifiably robust neural networks.
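To make the core idea concrete, the following is a minimal sketch, not the paper's implementation, of the PGPE-style Monte-Carlo estimator for the gradient of a Gaussian-smoothed loss; the function name smoothed_loss_grad and the hyperparameters sigma and n_samples are illustrative assumptions, and loss_fn stands in for whatever (possibly discontinuous) loss is being smoothed.

```python
import numpy as np

def smoothed_loss_grad(loss_fn, theta, sigma=0.1, n_samples=32, rng=None):
    """Monte-Carlo estimate of the gradient of the Gaussian-smoothed loss
        L_sigma(theta) = E_{eps ~ N(0, sigma^2 I)}[loss_fn(theta + eps)],
    using antithetic (symmetric) parameter perturbations as in PGPE.
    Works even when loss_fn itself is discontinuous, since only loss
    evaluations (no gradients of loss_fn) are required."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.normal(0.0, sigma, size=theta.shape)
        # Score-function identity: grad L_sigma(theta) = E[eps * L(theta + eps)] / sigma^2.
        # The symmetric pair (theta + eps, theta - eps) reduces estimator variance.
        grad += eps * (loss_fn(theta + eps) - loss_fn(theta - eps)) / (2.0 * sigma**2)
    return grad / n_samples
```

In the setting of the abstract, loss_fn would be the robust loss derived from a convex relaxation's bounds, so smoothing removes its discontinuities and perturbation sensitivity, at the cost of many loss evaluations per gradient step, which is why scaling PGPE training is expensive.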