Large Language Models (LLMs) have achieved remarkable success in generative tasks, including register-transfer level (RTL) hardware synthesis. However, their tendency to memorize training data poses critical risks when proprietary or security-sensitive designs are unintentionally exposed during inference. While prior work has examined memorization in natural language, RTL introduces unique challenges: structurally different implementations (e.g., behavioral vs. gate-level descriptions) can realize the same hardware, so intellectual property (IP) can leak, fully or partially, even without verbatim overlap. Conversely, even small syntactic variations (e.g., operator precedence or blocking vs. non-blocking assignments) can drastically alter circuit behavior, making correctness preservation especially challenging. In this work, we systematically study memorization in RTL code generation and propose CircuitGuard, a defense strategy that balances leakage reduction with correctness preservation. CircuitGuard (1) introduces a novel RTL-aware similarity metric that captures both structural and functional equivalence beyond surface-level overlap, and (2) develops an activation-level steering method that identifies and attenuates the transformer components most responsible for memorization. Our empirical evaluation demonstrates that CircuitGuard identifies and isolates 275 memorization-critical features across layers 18-28 of the Llama 3.1-8B model, achieving up to an 80% reduction in semantic similarity to proprietary patterns while maintaining generation quality. CircuitGuard further achieves 78-85% cross-domain transfer effectiveness, enabling robust memorization mitigation across circuit categories without retraining.