Traditionally, inserting realistic Hardware Trojans (HTs) into complex hardware systems has been a time-consuming, manual process that requires comprehensive knowledge of the design and the ability to navigate intricate Hardware Description Language (HDL) codebases. Machine Learning (ML)-based approaches have attempted to automate this process but often face challenges such as the need for extensive training data, long learning times, and limited generalizability across diverse hardware designs. This paper addresses these challenges by proposing GHOST (Generator for Hardware-Oriented Stealthy Trojans), an automated attack framework that leverages Large Language Models (LLMs) for rapid HT generation and insertion. Our study evaluates three state-of-the-art LLMs - GPT-4, Gemini-1.5-pro, and Llama-3-70B - across three hardware designs: SRAM, AES, and UART. In our evaluations, GPT-4 demonstrates superior performance, with 88.88% of HT insertion attempts successfully producing functional and synthesizable HTs. This study also highlights the security risks posed by LLM-generated HTs, showing that 100% of GHOST-generated synthesizable HTs evaded detection by an ML-based HT detection tool. These results underscore the urgent need for advanced detection and prevention mechanisms in hardware security to address the emerging threat of LLM-generated HTs. The GHOST HT benchmarks are available at: https://github.com/HSTRG1/GHOSTbenchmarks.git