LEAP: Efficient and Automated Test Method for NLP Software

The widespread adoption of DNNs in NLP software has highlighted the need for robustness. Researchers have proposed various automatic testing techniques that generate adversarial test cases. However, existing methods suffer from two limitations: weak error-discovering capabilities, with success rates ranging from 0% to 24.6% for BERT-based NLP software, and time inefficiency, taking 177.8s to 205.28s per test case, which makes them hard to apply in time-constrained scenarios. To address these issues, this paper proposes LEAP, an automated test method that uses LEvy flight-based Adaptive Particle swarm optimization integrated with textual features to generate adversarial test cases. Specifically, we adopt Levy flight for population initialization to increase the diversity of generated test cases. We also design an adaptive inertia weight update operator to improve the efficiency of LEAP's global optimization over high-dimensional text examples, and a mutation operator based on a greedy strategy to reduce the search time. We conducted a series of experiments to validate LEAP's ability to test NLP software and found that the average success rate of LEAP in generating adversarial test cases is 79.1%, which is 6.1% higher than the next best approach (PSOattack). While maintaining high success rates, LEAP reduces time overhead by up to 147.6s compared to other heuristic-based methods. Additionally, the experimental results demonstrate that LEAP can generate more transferable test cases and significantly enhance the robustness of DNN-based systems.
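The abstract names two ingredients, Levy flight-based population initialization and an adaptive inertia weight, without giving their formulas. The following is a minimal sketch of what such components could look like, not the authors' implementation. It assumes Mantegna's algorithm for drawing Levy-distributed step lengths and a linearly decreasing inertia weight, which is a common PSO schedule and may differ from LEAP's operator. The function names, the mapping from step length to the number of perturbed word positions, and all default parameters are hypothetical.

```python
import math
import random


def mantegna_sigma(beta: float) -> float:
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta: float, rng: random.Random) -> float:
    """Draw one heavy-tailed step length: u / |v|^(1/beta)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the (unlikely) degenerate case
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_init_population(n_words: int, pop_size: int,
                         beta: float = 1.5, seed: int = 0) -> list[list[int]]:
    """Assumed mapping: each particle is a sorted list of word positions to
    perturb, and the Levy step magnitude decides how many positions it holds.
    Most particles change few words; occasional long jumps change many."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        k = 1 + min(n_words - 1, int(abs(levy_step(beta, rng))))
        population.append(sorted(rng.sample(range(n_words), k)))
    return population


def inertia_weight(iteration: int, max_iter: int,
                   w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Linearly decreasing inertia weight (an assumed schedule): explore
    widely in early iterations, refine locally in later ones."""
    return w_max - (w_max - w_min) * iteration / max_iter


if __name__ == "__main__":
    for particle in levy_init_population(n_words=12, pop_size=5):
        print(particle)
    print([round(inertia_weight(t, 10), 2) for t in range(0, 11, 5)])
```

The heavy tail of the Levy distribution is what gives the initial population its diversity in this sketch: small steps dominate, but rare large steps place some particles far from the original text.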

