Owing to the exceptional performance of Large Language Models (LLMs) in Natural Language Processing (NLP) tasks, LLM-based NLP software has rapidly gained traction in domains such as financial analysis and content moderation. However, these applications frequently exhibit robustness deficiencies: slight perturbations to the input (prompt plus examples) can lead to erroneous outputs. Current robustness testing methods face two main limitations: (1) low testing effectiveness, which limits the applicability of LLM-based software in safety-critical scenarios, and (2) insufficient naturalness of test cases, which reduces the practical value of testing outcomes. To address these issues, this paper proposes ABFS, a straightforward yet effective automated testing method that, for the first time, treats the input prompt and examples as a unified whole for robustness testing. Specifically, ABFS formulates the testing process as a combinatorial optimization problem, employs Best-First Search to identify successful test cases within the perturbation space, and introduces a novel adaptive control strategy to enhance test case naturalness. We evaluate the robustness testing performance of ABFS on three datasets across five threat models. On Llama2-13b, the traditional StressTest achieves a success rate of only 13.273%, whereas ABFS attains 98.064%, supporting a more comprehensive robustness assessment before software deployment. Compared to baseline methods, ABFS introduces fewer modifications to the original input and consistently generates test cases with superior naturalness. Furthermore, test cases generated by ABFS exhibit stronger transferability, and ABFS achieves higher testing efficiency, significantly reducing testing costs.
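To make the search formulation concrete, the following is a minimal sketch of best-first search over a word-substitution perturbation space. It is an illustration under stated assumptions, not the ABFS implementation: the synonym table, the keyword-based `toy_confidence` scorer (a stand-in for the LLM under test), the 0.5 decision threshold, and the fixed `max_edits` budget are all hypothetical. In particular, the adaptive control strategy is not specified in this abstract and is not modeled here; the fixed edit budget is only a crude proxy for limiting how far a test case drifts from the original input.

```python
import heapq
from itertools import count

# Hypothetical synonym table defining the perturbation space (an assumption).
SYNONYMS = {
    "great": ["fine", "decent"],
    "love": ["like", "enjoy"],
    "movie": ["film"],
}

# Hypothetical keyword weights for a toy sentiment scorer (an assumption).
POSITIVE = {"great": 2.0, "love": 2.0, "fine": 0.5,
            "decent": 0.4, "like": 0.6, "enjoy": 0.8}
NEGATIVE = {"slow": 1.5, "but": 0.5}


def toy_confidence(words):
    """Toy stand-in for the model under test: confidence in the
    'positive' label, a value strictly between 0 and 1."""
    pos = sum(POSITIVE.get(w, 0.0) for w in words)
    neg = sum(NEGATIVE.get(w, 0.0) for w in words)
    return (pos + 1.0) / (pos + neg + 2.0)


def best_first_search(words, max_edits=3, threshold=0.5):
    """Always expand the candidate with the lowest confidence in the
    original label. Returns (perturbed_words, n_edits) once the
    confidence drops below `threshold`, or None if the budget-limited
    perturbation space is exhausted."""
    tie = count()  # tie-breaker so the heap never compares payloads
    start = tuple(words)
    frontier = [(toy_confidence(start), next(tie), start, frozenset())]
    seen = {start}
    while frontier:
        score, _, cand, edited = heapq.heappop(frontier)
        if score < threshold:
            return list(cand), len(edited)
        if len(edited) >= max_edits:
            continue  # edit budget spent: do not expand further
        for i, word in enumerate(cand):
            if i in edited:
                continue  # each position is perturbed at most once
            for sub in SYNONYMS.get(word, []):
                nxt = cand[:i] + (sub,) + cand[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(
                        frontier,
                        (toy_confidence(nxt), next(tie), nxt, edited | {i}),
                    )
    return None


if __name__ == "__main__":
    original = "i love this great movie but it is slow".split()
    print(best_first_search(original))
```

The priority queue is what distinguishes this from a greedy search: a partially perturbed candidate that looked slightly worse earlier remains in the frontier and can be revisited if the currently best branch stalls. In a real setting the scorer would be a query to the LLM under test, so the number of heap expansions corresponds directly to query cost.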