Recent advances in large language models (LLMs) have significantly boosted their use in software engineering. However, training a well-performing LLM demands a substantial workforce for data collection and annotation. Moreover, training datasets may be proprietary or only partially open, and the process often requires a costly GPU cluster. The intellectual property value of commercial LLMs makes them attractive targets for imitation attacks, but creating an imitation model with a comparable parameter count still incurs high costs. This motivates us to explore a practical and novel direction: slicing commercial black-box LLMs using medium-sized backbone models. In this paper, we explore the feasibility of launching imitation attacks on LLMs to extract their specialized code abilities, such as "code synthesis" and "code translation." We systematically investigate the effectiveness of code ability extraction attacks across different code-related tasks and multiple query schemes, including zero-shot, in-context, and chain-of-thought prompting. We also design response checks to refine the collected outputs, leading to an effective imitation training process. Our results show that, with a reasonable number of queries, attackers can train a medium-sized backbone model to replicate specialized code behaviors comparable to those of the target LLMs. We summarize our findings and insights to help researchers better understand the threats posed by imitation attacks, including a practical attack surface for generating adversarial code examples against LLMs.