Modern hardware designs have grown increasingly efficient and complex, yet they often contain weaknesses cataloged as Common Weakness Enumerations (CWEs). This paper focuses on the formal verification of CWEs in a dataset of SystemVerilog hardware designs produced by generative Artificial Intelligence (AI) powered by Large Language Models (LLMs). We applied formal verification to classify each hardware design as either vulnerable or CWE-free. The dataset was generated by 4 different LLMs and features a unique set of designs for each of the 10 CWEs we target. It comprises 60,000 generated SystemVerilog Register Transfer Level (RTL) designs, and each identified vulnerability is labeled with its CWE number. We also found that most LLMs are not aware of hardware CWEs and therefore usually do not take them into account when generating hardware code. Our study reveals that approximately 60% of the LLM-generated hardware designs are prone to CWEs, posing potential safety and security risks. The dataset could be used to train LLMs and Machine Learning (ML) algorithms to avoid generating CWE-prone hardware designs.