AttackGNN: Red-Teaming GNNs in Hardware Security Using Reinforcement Learning

Machine learning has shown great promise in addressing several critical hardware security problems. In particular, researchers have developed novel graph neural network (GNN)-based techniques for detecting intellectual property (IP) piracy, detecting hardware Trojans (HTs), and reverse engineering circuits, to name a few. These techniques have demonstrated outstanding accuracy and have received much attention in the community. However, because these techniques are used in security applications, it is imperative to evaluate them thoroughly and ensure that they are robust and do not compromise the security of integrated circuits. In this work, we propose AttackGNN, the first red-team attack on GNN-based techniques in hardware security. To this end, we devise a novel reinforcement learning (RL) agent that generates adversarial examples, i.e., circuits, against the GNN-based techniques. We overcome three challenges related to effectiveness, scalability, and generality to devise a potent RL agent. We target five GNN-based techniques for four crucial classes of problems in hardware security: IP piracy detection, HT detection/localization, reverse engineering, and hardware obfuscation. Through our approach, we craft circuits that fool all GNNs considered in this work. For instance, to evade IP piracy detection, we generate adversarial pirated circuits that the GNN-based defense classifies as not pirated. To attack the HT-localization GNN, we generate HT-infested circuits that fool the defense on all tested circuits. We obtain a similar 100% success rate against the GNNs for all classes of problems.
