Although backdoor learning is an active research topic in NLP, the literature lacks studies that systematically categorize and summarize backdoor attacks and defenses. To bridge this gap, we present a comprehensive and unifying study of backdoor learning for NLP that systematically summarizes the literature. We first motivate the importance of backdoor learning for building robust NLP systems. Next, we provide a thorough account of backdoor attack techniques, their applications, defenses against backdoor attacks, and mitigation techniques for removing backdoors. We then review and analyze in detail the evaluation metrics, benchmark datasets, threat models, and challenges related to backdoor learning in NLP. Ultimately, our work aims to crystallize and contextualize the existing literature on backdoor learning for the text domain and to motivate further research in the field. To this end, we identify troubling gaps in the literature and offer insights into open challenges and future research directions. Finally, we provide a continuously updated GitHub repository listing backdoor learning papers at https://github.com/marwanomar1/Backdoor-Learning-for-NLP.