The integration of large language models (LLMs) into electronic design automation (EDA) workflows has introduced powerful capabilities for RTL generation, verification, and design optimization, but it also raises critical security concerns. Malicious LLM outputs in this domain pose hardware-level threats, including hardware Trojan insertion, side-channel leakage, and intellectual property theft, that are irreversible once fabricated into silicon. Requests for such outputs often exploit semantic disguise, embedding adversarial intent within legitimate engineering language that existing safety mechanisms, trained on general-purpose hazards, fail to detect. No benchmark exists to evaluate LLM vulnerability to such domain-specific threats. We present the HarmChip benchmark to assess jailbreak susceptibility in hardware security, spanning 16 hardware security domains, 120 threats, and 360 prompts at two difficulty levels. Evaluation of state-of-the-art LLMs reveals an alignment paradox: they refuse legitimate security queries while complying with semantically disguised attacks, exposing blind spots in safety guardrails and underscoring the need for domain-aware safety alignment.