In this work, we study the literature on Explainable AI and Safe AI to understand the poisoning of neural models of code. To do so, we first establish a novel taxonomy for Trojan AI for code and present a new aspect-based classification of triggers in neural models of code. Next, we highlight recent works that deepen our understanding of how these models comprehend software code. We then examine several recent, state-of-the-art poisoning strategies that can be used to manipulate such models. The insights we draw can help foster future research in the area of Trojan AI for code.