Transformer-based neural networks have demonstrated remarkable performance in natural language processing tasks such as sentiment analysis. However, ensuring the dependability of these complex architectures through comprehensive testing remains an open problem. This paper presents a set of coverage criteria specifically designed to assess test suites for transformer-based sentiment analysis networks. Our approach applies input space partitioning, a black-box method, over emotionally relevant linguistic features such as verbs, adjectives, adverbs, and nouns. To generate test cases that efficiently cover a wide range of emotional elements, we employ the k-projection coverage metric, which reduces the dimensionality of the problem by examining subsets of k features at a time. Large language models are then used to generate sentences that exhibit specific combinations of emotional features. Experiments on a sentiment analysis dataset show that our criteria and generated tests increase test coverage by an average of 16\% while reducing model accuracy by an average of 6.5\%, demonstrating their ability to expose vulnerabilities. Our work provides a foundation for improving the dependability of transformer-based sentiment analysis systems through comprehensive test evaluation.
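To make the k-projection idea concrete, the following is a minimal sketch (not the paper's implementation) of how k-projection coverage can be computed: for every size-k subset of features, count which combinations of feature values are exercised by at least one test case, and report the covered fraction. The feature names and value sets are illustrative assumptions.

```python
from itertools import combinations, product

def k_projection_coverage(tests, feature_values, k):
    """Fraction of k-feature value combinations covered by a test suite.

    tests: list of dicts mapping feature name -> observed value
    feature_values: dict mapping feature name -> list of possible values
    k: projection size (number of features considered jointly)
    """
    features = sorted(feature_values)
    covered, total = 0, 0
    # Enumerate every size-k subset of features (the "projections").
    for subset in combinations(features, k):
        # All value combinations this projection could take.
        possible = set(product(*(feature_values[f] for f in subset)))
        # Value combinations actually exercised by the test suite.
        seen = {tuple(t[f] for f in subset) for t in tests}
        covered += len(possible & seen)
        total += len(possible)
    return covered / total if total else 1.0

# Hypothetical sentiment polarity features for the emotional word classes.
feature_values = {
    "verb": ["pos", "neg"],
    "adjective": ["pos", "neg"],
    "noun": ["pos", "neg"],
}
tests = [
    {"verb": "pos", "adjective": "pos", "noun": "pos"},
    {"verb": "neg", "adjective": "neg", "noun": "neg"},
]
print(k_projection_coverage(tests, feature_values, k=2))  # 0.5
```

With k=2, each of the three feature pairs has four possible value combinations, of which the two tests cover two (all-positive and all-negative), giving 50\% coverage; enumerating size-k projections instead of the full feature product is what keeps the criterion tractable as the number of features grows.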