Large Language Models (LLMs) have attracted significant attention from AI researchers, especially since ChatGPT became widely popular. However, because of LLMs' intricate architecture and vast number of parameters, several concerns and challenges regarding their quality assurance need to be addressed. In this paper, a fine-tuned GPT-based sentiment analysis model is first constructed and studied as the reference for AI quality analysis. Then, a quality analysis related to data adequacy is carried out: a content-based approach is used to generate plausible adversarial review comments that serve as wrongly annotated data, and surprise adequacy (SA)-based techniques are developed to detect these abnormal data. Experiments were conducted on Amazon.com review data with a fine-tuned GPT model. The results are discussed from the perspective of AI quality assurance, covering the quality analysis of an LLM on generated adversarial textual data and the effectiveness of SA for anomaly detection in data quality assurance.
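To make the SA-based detection idea concrete, the following is a minimal sketch of one common variant, distance-based surprise adequacy (DSA). The abstract does not say which SA variant or which model layer is used, so the choice of DSA, the function name `distance_based_sa`, and the toy two-dimensional "activations" are all assumptions made for illustration. They are not the paper's implementation.

```python
import numpy as np

def distance_based_sa(train_acts, train_labels, x_act, x_pred):
    """Distance-based surprise adequacy (DSA) of a single input.

    train_acts   : (n, d) activation vectors of the training inputs
    train_labels : (n,) class labels of the training inputs
    x_act        : (d,) activation vector of the input under test
    x_pred       : class assigned to the input under test
    Assumes both the same-class and other-class sets are non-empty
    and that dist_b is non-zero.
    """
    same = train_acts[train_labels == x_pred]
    other = train_acts[train_labels != x_pred]
    # dist_a: distance to the nearest training activation of the same class.
    d_same = np.linalg.norm(same - x_act, axis=1)
    nearest = same[np.argmin(d_same)]
    dist_a = d_same.min()
    # dist_b: distance from that reference point to the nearest
    # training activation belonging to any other class.
    dist_b = np.linalg.norm(other - nearest, axis=1).min()
    return dist_a / dist_b

# Hypothetical toy data: two well-separated classes in a 2-D activation space.
train_acts = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
train_labels = np.array([0, 0, 1, 1])

typical = distance_based_sa(train_acts, train_labels, np.array([0.0, 0.5]), 0)
surprising = distance_based_sa(train_acts, train_labels, np.array([3.0, 0.5]), 0)
print(typical, surprising)
```

In this toy setup, the second input sits close to class 1 while being treated as class 0, so it receives the higher score. A threshold on such scores is one way to flag suspicious samples such as wrongly annotated reviews; how to pick that threshold is left open here.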