Large language models (LLMs) have demonstrated remarkable potential for code generation and completion tasks in hardware design. In fact, LLM-based hardware description language (HDL) code generation has enabled the industry to realize complex designs more quickly, reducing the time and effort required in the development cycle. However, the increased reliance on such automation introduces critical security risks. Notably, because LLMs must be trained on vast code datasets that are typically sourced from publicly available repositories (often without thorough validation), they are susceptible to so-called data poisoning or backdoor attacks. Here, attackers inject malicious code into the training data, which can then carry over into the HDL code generated by LLMs. This threat vector can compromise the security and integrity of entire hardware systems. In this work, we propose RTL-Breaker, a novel backdoor attack framework on LLM-based HDL code generation. RTL-Breaker provides an in-depth analysis of essential aspects of this novel problem: 1) various trigger mechanisms and their effectiveness at inserting malicious modifications, and 2) side effects of backdoor attacks on code generation in general, i.e., their impact on code quality. RTL-Breaker emphasizes the urgent need for more robust measures to safeguard against such attacks. Toward that end, we open-source our framework and all data.