Large Language Models (LLMs) have revolutionized natural language processing (NLP), excelling in tasks like text generation and summarization. However, their increasing adoption in mission-critical applications raises concerns about hardware-based threats, particularly bit-flip attacks (BFAs). BFAs, enabled by fault injection methods such as Rowhammer, target model parameters in memory, compromising both integrity and performance. Identifying critical parameters for BFAs in the vast parameter space of LLMs poses significant challenges. While prior research suggests transformer-based architectures are inherently more robust to BFAs compared to traditional deep neural networks, we challenge this assumption. For the first time, we demonstrate that as few as three bit-flips can cause catastrophic performance degradation in an LLM with billions of parameters. Current BFA techniques are inadequate for exploiting this vulnerability due to the difficulty of efficiently identifying critical parameters within the immense parameter space. To address this, we propose AttentionBreaker, a novel framework tailored for LLMs that enables efficient traversal of the parameter space to identify critical parameters. Additionally, we introduce GenBFA, an evolutionary optimization strategy designed to refine the search further, isolating the most critical bits for an efficient and effective attack. Empirical results reveal the profound vulnerability of LLMs to AttentionBreaker. For example, merely three bit-flips (4.129 x 10^-9% of total parameters) in the LLaMA3-8B-Instruct 8-bit quantized (W8) model result in a complete performance collapse: accuracy on MMLU tasks drops from 67.3% to 0%, and Wikitext perplexity skyrockets from 12.6 to 4.72 x 10^5. These findings underscore the effectiveness of AttentionBreaker in uncovering and exploiting critical vulnerabilities within LLM architectures.
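To make the threat model concrete, the sketch below illustrates why a single bit-flip in an 8-bit quantized weight (as in the W8 model above) can be so damaging: flipping the most significant bit of a two's-complement int8 value swings it by 128. This is a minimal toy illustration, not code from AttentionBreaker or GenBFA; the function name and values are hypothetical.

```python
def flip_bit(weight: int, bit: int) -> int:
    """Flip one bit (0 = LSB, 7 = MSB/sign bit) of an 8-bit
    two's-complement quantized weight and return the new value.
    Toy example only -- not the paper's attack implementation."""
    raw = weight & 0xFF                       # view the weight as its raw byte
    raw ^= 1 << bit                           # flip the chosen bit
    return raw - 256 if raw >= 128 else raw   # reinterpret as signed int8

# A small positive weight becomes a large negative one when the sign bit flips:
print(flip_bit(3, 7))   # 3 -> -125, a magnitude change of 128
print(flip_bit(3, 0))   # 3 -> 2, a low-order flip barely perturbs the weight
```

The asymmetry between the two calls hints at why identifying *which* bits to flip matters: most of the billions of candidate bits are low-impact, and only a targeted search over the parameter space surfaces the few catastrophic ones.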