On Extracting Specialized Code Abilities from Large Language Models: A Feasibility Study

Recent advances in large language models (LLMs) have significantly boosted their use in software engineering. However, training a well-performing LLM demands substantial human effort for data collection and annotation. Moreover, training datasets may be proprietary or only partially open, and the training process often requires a costly GPU cluster. The intellectual property value of commercial LLMs makes them attractive targets for imitation attacks, but creating an imitation model with a comparable number of parameters still incurs high costs. This motivates us to explore a practical and novel direction: slicing commercial black-box LLMs using medium-sized backbone models. In this paper, we explore the feasibility of launching imitation attacks on LLMs to extract their specialized code abilities, such as "code synthesis" and "code translation." We systematically investigate the effectiveness of code ability extraction attacks across different code-related tasks with multiple query schemes, including zero-shot, in-context, and Chain-of-Thought. We also design response checks to refine the outputs, leading to an effective imitation training process. Our results are promising: with a reasonable number of queries, attackers can train a medium-sized backbone model to replicate specialized code behaviors similar to those of the target LLMs. We summarize our findings and insights to help researchers better understand the threats posed by imitation attacks; among them, we reveal a practical attack surface for generating adversarial code examples against LLMs.
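The abstract names three query schemes and a response-check step but does not describe them in detail. Purely as an illustration, and not as the authors' implementation, the Python sketch below shows one way such a pipeline could be organized for a code-synthesis task. It uses hypothetical prompt templates for the zero-shot, in-context, and Chain-of-Thought schemes, and a hypothetical response check that keeps a reply only if it contains a fenced code block that parses as Python. All function names, the template wording, and the choice of `ast.parse` as the check are assumptions.

```python
import ast
import re
from typing import List, Optional, Tuple

# Built at runtime so the snippet never contains a literal triple backtick.
FENCE = "`" * 3


def build_prompt(task: str, scheme: str,
                 demos: Optional[List[Tuple[str, str]]] = None) -> str:
    """Hypothetical query templates for the three schemes named in the abstract."""
    if scheme == "zero-shot":
        return f"Write a Python function for this task:\n{task}\n"
    if scheme == "in-context":
        # Prepend (task, solution) demonstrations before the new task.
        shots = "\n\n".join(
            f"Task: {t}\nSolution:\n{FENCE}python\n{s}\n{FENCE}"
            for t, s in (demos or [])
        )
        return f"{shots}\n\nTask: {task}\nSolution:\n"
    if scheme == "chain-of-thought":
        return (f"{task}\nReason step by step, then give the final "
                f"Python code in a fenced block.\n")
    raise ValueError(f"unknown query scheme: {scheme}")


# Matches the first fenced block, with or without a "python" tag.
CODE_RE = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> Optional[str]:
    """Return the body of the first fenced code block in a reply, if any."""
    match = CODE_RE.search(response)
    return match.group(1) if match else None


def passes_check(response: str) -> bool:
    """Hypothetical response check: keep only replies whose code parses."""
    code = extract_code(response)
    if code is None:
        return False
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def build_training_pairs(queries: List[str],
                         responses: List[str]) -> List[Tuple[str, str]]:
    """Pair each query with the extracted code of replies that pass the check."""
    pairs = []
    for query, reply in zip(queries, responses):
        if passes_check(reply):
            pairs.append((query, extract_code(reply)))
    return pairs
```

For example, a reply whose code block defines `def add(a, b):` with a correct body passes `passes_check`, while the same reply with the colon missing is discarded, so only the first contributes a (query, code) pair to the imitation training set. A syntax check is a deliberately cheap filter; a fuller pipeline might also run test cases on the extracted code, which this sketch does not attempt.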
