Large language models (LLMs) have brought many new capabilities to software development. However, their opaque nature makes them difficult to reason about and inspect. This opacity creates security risks: adversaries can train and deploy compromised models to disrupt the software development process in a victim's organization. This work presents an overview of current state-of-the-art trojan attacks on large language models of code, organized by a novel unifying trigger taxonomy framework and focused on triggers, the main design point of trojans. We also aim to provide uniform definitions of the fundamental concepts in the area of trojans in Code LLMs. Finally, we draw out the implications for trigger design of findings on how code models learn.