Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet these methods rely on convex relaxations to propagate lower and upper bounds through intermediate layers, which limits the tightness of the bound at the output layer. We introduce a new approach to adversarial training that minimizes an upper bound of the adversarial loss based on a holistic expansion of the network rather than on separate bounds for each layer. This bound is derived with state-of-the-art tools from Robust Optimization; it has a closed form and can be trained effectively with backpropagation. We derive two new methods from this approach. The first, Approximated Robust Upper Bound (aRUB), uses a first-order approximation of the network together with basic tools from Linear Robust Optimization to obtain an empirical upper bound of the adversarial loss that is easy to implement. The second, Robust Upper Bound (RUB), computes a provable upper bound of the adversarial loss. Across a variety of tabular and vision data sets, we demonstrate the effectiveness of our approach: RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations.
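The Linear Robust Optimization step behind a first-order bound rests on a dual-norm identity. For a perturbation ball ||d||_p <= eps, the worst case of a linear term c^T d is eps * ||c||_q with 1/p + 1/q = 1. The sketch below is a minimal illustration that assumes a cross-entropy loss and an l_p threat model. After linearizing the logits as z(x + d) ≈ z(x) + J d, each margin z_k - z_y is bounded separately inside the log-sum-exp. Because log-sum-exp is monotone in each argument, the result upper-bounds the linearized adversarial loss. It is not a certified bound for a nonlinear network. The function names are hypothetical, and this is not the authors' aRUB implementation.

```python
import numpy as np

def dual_norm(v, p):
    """Dual norm ||v||_q (1/p + 1/q = 1), taken along the last axis."""
    if p == np.inf:
        return np.sum(np.abs(v), axis=-1)
    if p == 1:
        return np.max(np.abs(v), axis=-1)
    q = p / (p - 1.0)
    return np.sum(np.abs(v) ** q, axis=-1) ** (1.0 / q)

def cross_entropy(logits, y):
    # CE = log sum_k exp(z_k - z_y), computed with a stabilizing shift
    a = logits - logits[y]
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def linearized_robust_ce(logits, jac, y, eps, p):
    """Hypothetical helper: closed-form upper bound on
    max_{||d||_p <= eps} CE(z + J d, y) for the *linearized* logits.

    logits: (K,) values z(x); jac: (K, n) Jacobian dz/dx at x (assumed given).
    Margin k moves by (J_k - J_y) d, whose worst case over the l_p ball is
    eps * ||J_k - J_y||_q. Maximizing each log-sum-exp term separately
    can only increase the value, so the result is an upper bound.
    """
    g = jac - jac[y]                              # (K, n) margin gradients
    a = logits - logits[y] + eps * dual_norm(g, p)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())
```

For a linear model z(x) = W x + b the Jacobian is exactly W, so the linearization is exact. In that case the bound dominates the loss at every admissible perturbation. For a deep network, `jac` would come from automatic differentiation, and the bound holds only to first order.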