As large language models (LLMs) are increasingly fine-tuned for hardware tasks like RTL code generation, the scarcity of high-quality datasets often leads to the use of rapidly assembled or generated training data. These datasets frequently lack security verification and are highly susceptible to data poisoning attacks. Such poisoning can cause models to generate syntactically valid but insecure hardware modules that bypass standard functionality checks. To address this, we present SafeTune, a framework designed to harden LLM-based RTL generation against poisoning, specifically focusing on hardware Trojan (HT) insertion. SafeTune integrates two core components: (i) a Graph Neural Network (GNN) that models structural properties to identify anomalous circuitry patterns during fine-tuning, and (ii) a semantic verification module using text embeddings and an XGBoost classifier to assess prompt security. By coupling structural and semantic knowledge, SafeTune effectively filters poisoned inputs without sacrificing legitimate data. Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning without requiring modifications to the underlying model architecture.
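The semantic verification stage described above can be sketched as a binary classifier over prompt embeddings that filters suspected-poisoned samples out of the fine-tuning set. The sketch below uses synthetic embeddings and scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost; the embedding dimension, class separation, and hyperparameters are illustrative assumptions, not SafeTune's actual configuration.

```python
# Hedged sketch of a semantic verification filter: classify prompt embeddings
# as clean vs. poisoned, then drop flagged samples before fine-tuning.
# All data is synthetic; GradientBoostingClassifier stands in for XGBoost.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
DIM = 32  # hypothetical text-embedding dimension

# Synthetic stand-ins for prompt embeddings: poisoned prompts are assumed
# to be shifted in embedding space relative to clean ones.
clean = rng.normal(0.0, 1.0, size=(200, DIM))
poisoned = rng.normal(0.8, 1.0, size=(200, DIM))
X = np.vstack([clean, poisoned])
y = np.array([0] * 200 + [1] * 200)  # 1 = poisoned

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)

def filter_prompts(embeddings):
    """Keep only prompts the classifier labels as clean (0)."""
    return embeddings[clf.predict(embeddings) == 0]

print(f"held-out accuracy: {acc:.2f}")
```

In the full framework this verdict would be combined with the GNN's structural anomaly score, so a sample is retained only when both the prompt semantics and the generated circuitry look benign.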