The emergence of Large Language Models (LLMs) has significantly influenced various aspects of software development activities. Despite their benefits, LLMs also pose notable risks, including the potential to generate harmful content and being abused by malicious developers to create malicious code. Several previous studies have focused on the ability of LLMs to resist the generation of harmful content that violates human ethical standards, such as biased or offensive content. However, there is no research evaluating the ability of LLMs to resist malicious code generation. To fill this gap, we propose RMCBench, the first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation. This benchmark employs two scenarios: a text-to-code scenario, where LLMs are prompted with descriptions to generate code, and a code-to-code scenario, where LLMs translate or complete existing malicious code. Based on RMCBench, we conduct an empirical study on 11 representative LLMs to assess their ability to resist malicious code generation. Our findings indicate that current LLMs have a limited ability to resist malicious code generation with an average refusal rate of 40.36% in text-to-code scenario and 11.52% in code-to-code scenario. The average refusal rate of all LLMs in RMCBench is only 28.71%; ChatGPT-4 has a refusal rate of only 35.73%. We also analyze the factors that affect LLMs' ability to resist malicious code generation and provide implications for developers to enhance model robustness.
翻译:大语言模型(LLMs)的出现已显著影响软件开发活动的诸多方面。尽管LLMs带来了益处,它们也构成了显著风险,包括可能生成有害内容以及被恶意开发者滥用以创建恶意代码。先前的一些研究主要关注LLMs抵抗生成违反人类伦理标准的有害内容(如偏见性或攻击性内容)的能力。然而,尚无研究评估LLMs抵抗恶意代码生成的能力。为填补这一空白,我们提出了RMCBench,这是首个包含473个提示的基准,旨在评估LLMs抵抗恶意代码生成的能力。该基准采用两种场景:文本到代码场景,即通过描述提示LLMs生成代码;以及代码到代码场景,即LLMs翻译或补全现有恶意代码。基于RMCBench,我们对11个代表性LLMs进行了实证研究,以评估其抵抗恶意代码生成的能力。我们的研究结果表明,当前LLMs抵抗恶意代码生成的能力有限,在文本到代码场景中的平均拒绝率为40.36%,在代码到代码场景中为11.52%。所有LLMs在RMCBench中的平均拒绝率仅为28.71%;ChatGPT-4的拒绝率仅为35.73%。我们还分析了影响LLMs抵抗恶意代码生成能力的因素,并为开发者增强模型鲁棒性提供了启示。