In this work, we study the literature in Explainable AI and Safe AI to understand the poisoning of neural models of code. To that end, we first establish a novel taxonomy of Trojan AI for code, and present a new aspect-based classification of triggers in neural models of code. Next, we highlight recent works that deepen our understanding of how these models interpret software code. We then examine several recent, state-of-the-art poisoning strategies that can be used to manipulate such models. The insights we draw can help foster future research in the area of Trojan AI for code.