Bit-flip attacks (BFAs) have recently attracted substantial attention: by tampering with a small number of bits in a model's parameters, an adversary can break the integrity of a deep neural network (DNN). A number of defense methods have been proposed to mitigate such threats, but they focus on untargeted scenarios. Unfortunately, these methods either require additional trustworthy applications or make models more vulnerable to targeted BFAs. Countermeasures against targeted BFAs, which are by nature stealthier and more purposeful, remain far from well established. In this work, we propose Aegis, a novel defense method that mitigates targeted BFAs. Our core observation is that existing targeted attacks focus on flipping critical bits in certain important layers. We therefore design a dynamic-exit mechanism that attaches extra internal classifiers (ICs) to hidden layers. This mechanism allows input samples to exit early from different layers, which effectively disrupts the adversary's attack plan. Moreover, the dynamic-exit mechanism randomly selects ICs for prediction during each inference, which significantly increases the cost of adaptive attacks in which all defense mechanisms are transparent to the adversary. We further propose a robustness training strategy that adapts the ICs to attack scenarios by simulating BFAs during IC training, thereby increasing model robustness. Extensive evaluations on four well-known datasets and two popular DNN structures show that Aegis effectively mitigates different state-of-the-art targeted attacks, reducing the attack success rate by 5-10$\times$ and significantly outperforming existing defense methods.
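The abstract describes the dynamic-exit mechanism only at a high level. The Python sketch below illustrates its two ingredients under stated assumptions: a single bit flip on an 8-bit quantized weight (the kind of fault a BFA injects), and an inference routine that randomly enables internal classifiers and exits at the first confident one. The names `flip_bit_int8`, `dynamic_exit_predict`, `threshold` and `keep_prob`, and the per-IC coin-flip plus confidence-threshold exit rule, are hypothetical illustrations, not the actual Aegis implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: an internal classifier (IC) maps the hidden
# representation at its layer to a list of class probabilities.
Probs = List[float]
IC = Callable[[Sequence[float]], Probs]


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit of an 8-bit two's-complement weight, as a BFA would."""
    if not -128 <= value <= 127 or not 0 <= bit <= 7:
        raise ValueError("expected an int8 value and a bit index in [0, 7]")
    raw = (value & 0xFF) ^ (1 << bit)        # flip the bit in the raw byte
    return raw - 256 if raw >= 128 else raw  # reinterpret as signed int8


def dynamic_exit_predict(
    hidden: Sequence[Sequence[float]],  # hidden[i]: representation after layer i
    ics: Sequence[IC],                  # ics[i]: IC at layer i (last = final head)
    threshold: float,
    rng: random.Random,
    keep_prob: float = 0.5,
) -> Tuple[int, int]:
    """Return (predicted class, exit layer).

    Assumed policy: on every inference, each intermediate IC is enabled
    independently with probability keep_prob; the sample exits at the first
    enabled IC whose top probability reaches threshold, otherwise at the
    final classifier.
    """
    last = len(ics) - 1
    for layer in range(last):
        if rng.random() >= keep_prob:
            continue  # this IC is randomly disabled for this inference
        probs = ics[layer](hidden[layer])
        confidence = max(probs)
        if confidence >= threshold:
            return probs.index(confidence), layer  # early exit
    probs = ics[last](hidden[last])
    return probs.index(max(probs)), last


def make_constant_ic(probs: Probs) -> IC:
    """Toy IC that ignores its input and returns fixed probabilities."""
    return lambda _hidden: list(probs)


if __name__ == "__main__":
    # Flipping the sign bit of a zero weight turns it into -128.
    print(flip_bit_int8(0, 7))
    ics = [make_constant_ic([0.6, 0.4]),   # never confident enough
           make_constant_ic([0.1, 0.9]),   # confident whenever enabled
           make_constant_ic([0.3, 0.7])]   # final classifier
    hidden = [[0.0], [0.0], [0.0]]
    rng = random.Random(0)
    for _ in range(5):
        print(dynamic_exit_predict(hidden, ics, threshold=0.8, rng=rng))
```

Because the exit layer varies from one inference to the next, an attacker cannot know in advance which layers' weights will determine a given prediction; this is the intuition the abstract gives for why randomized exits raise the cost of adaptive attacks.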