Most real-world applications of deep neural networks (DNNs) quantize them to low precision to reduce compute requirements. We present a method to improve the robustness of quantized DNNs to white-box adversarial attacks. We first address the limitation of deterministic quantization to fixed ``bins'' by introducing a differentiable Stochastic Quantizer (SQ). We then explore the hypothesis that an ensemble of different quantizations may be collectively more robust than any individual quantized DNN. We formulate a training objective that encourages different quantized DNNs to learn different representations of the input image; the objective captures both diversity and accuracy via the mutual information (MI) between ensemble members. Experiments demonstrate a substantial improvement in robustness against $L_\infty$ attacks, even when the attacker is allowed to backpropagate through SQ (e.g., > 50\% accuracy under PGD(5/255) on CIFAR10 without adversarial training), compared with vanilla DNNs as well as existing ensembles of quantized DNNs. We extend the method to detect attacks and to generate robustness profiles in the adversarial information plane (AIP), a step towards a unified analysis of different threat models that correlates MI with accuracy.
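To make the contrast between deterministic and stochastic quantization concrete, the following is a minimal sketch of generic stochastic rounding onto evenly spaced levels. It is not the paper's SQ: the abstract does not specify how SQ is constructed or how gradients flow through it, so the function name `stochastic_quantize`, its parameters (`num_levels`, `lo`, `hi`, `rng`), and the uniform grid are all assumptions made for illustration, and no gradient path is shown.

```python
import random


def stochastic_quantize(x, num_levels=4, lo=0.0, hi=1.0, rng=random):
    """Hypothetical helper: round x onto one of `num_levels` evenly spaced
    levels in [lo, hi], choosing between the two neighboring levels at random.

    The upper neighbor is chosen with probability equal to the fractional
    position of x between the two levels, so the expected output equals the
    clipped input. This is plain stochastic rounding, used here only as an
    assumed stand-in for a stochastic quantizer.
    """
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    step = (hi - lo) / (num_levels - 1)
    clipped = min(max(x, lo), hi)        # keep x inside the quantization range
    pos = (clipped - lo) / step          # position measured in units of `step`
    lower = int(pos)                     # index of the level at or below x
    frac = pos - lower                   # distance to that level, in [0, 1)
    idx = lower + (1 if rng.random() < frac else 0)
    idx = min(idx, num_levels - 1)       # guard against floating-point overshoot
    return lo + idx * step


# Usage: repeated draws land on different levels but average out near the input.
rng = random.Random(0)                   # seeded for reproducibility
samples = [stochastic_quantize(0.4, num_levels=4, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
distinct_levels = sorted(set(round(s, 6) for s in samples))
```

With four levels on [0, 1], an input of 0.4 falls between the levels 1/3 and 2/3, so a deterministic quantizer would always return the same one, whereas this sketch returns either, with the sample mean staying close to 0.4. That run-to-run variation is the property the abstract contrasts with fixed ``bins''; how the paper turns it into a differentiable, trainable component is not covered here.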