The successful application of large pre-trained models such as BERT in natural language processing has attracted growing attention from researchers. Because BERT typically acts as an end-to-end black box, classification systems built on it are often difficult to interpret and lack robustness. To address these problems, this paper proposes a self-improving classification model based on visual interpretation that combines virtual adversarial training (VAT) with BERT. Specifically, a fine-tuned BERT model first classifies the sentiment of the text. The predicted sentiment labels are then used as part of the input to a second BERT model, which is trained for spam classification in a semi-supervised manner using VAT. In addition, visualization techniques, including visualizing word importance and normalizing the attention head matrix, are employed to analyze the relevance of each component to classification accuracy. Moreover, the visual analysis surfaces new features, which in turn improve classification performance. Experimental results on a Twitter tweet dataset demonstrate the effectiveness of the proposed model on the classification task, and an ablation study illustrates the effect of each component of the proposed model on the classification results.
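As an illustration of the VAT component mentioned in the abstract, the following is a minimal sketch of the virtual adversarial loss, not the paper's implementation. It assumes a linear softmax classifier over fixed-size input vectors as a stand-in for BERT, so that the gradient of the KL divergence with respect to the input perturbation can be written analytically. The function name `vat_loss` and the parameters `epsilon` (perturbation radius) and `xi` (probe step size) are hypothetical choices made for this example, and a single power-iteration step is assumed.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_div(p, q, tiny=1e-12):
    # KL(p || q) per example.
    return np.sum(p * (np.log(p + tiny) - np.log(q + tiny)), axis=-1)


def l2_normalize(v, tiny=1e-12):
    # Scale each row to unit L2 norm.
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + tiny)


def vat_loss(x, W, b, epsilon=1.0, xi=0.1, rng=None):
    """Virtual adversarial loss for a linear softmax model (assumed stand-in for BERT).

    x: (n, d) inputs, W: (d, k) weights, b: (k,) bias. No labels are needed,
    which is what makes the loss usable on unlabeled data.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Predictions on clean inputs, treated as fixed targets.
    p = softmax(x @ W + b)
    # Probe the model with a small random unit perturbation.
    d = l2_normalize(rng.standard_normal(x.shape))
    q = softmax((x + xi * d) @ W + b)
    # For this linear model, dKL(p||q)/dlogits = q - p, so the gradient
    # with respect to the input perturbation is (q - p) @ W.T.
    grad = (q - p) @ W.T
    # Adversarial direction: normalized gradient scaled to radius epsilon.
    r_adv = epsilon * l2_normalize(grad)
    q_adv = softmax((x + r_adv) @ W + b)
    return kl_div(p, q_adv).mean(), r_adv


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.standard_normal((4, 8))   # 4 hypothetical input vectors
    W = rng.standard_normal((8, 2))   # 2 classes, e.g. spam / not spam
    loss, r_adv = vat_loss(x, W, np.zeros(2), epsilon=0.5)
    print("VAT loss:", loss)
```

In a semi-supervised setup, a term of this kind would typically be added to the supervised cross-entropy loss and evaluated on both labeled and unlabeled examples. With a nonlinear model such as BERT, the analytic gradient above would be replaced by automatic differentiation.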