Pretrained transformer-based Language Models (LMs) are well known for achieving significant improvements on NLP tasks, but their black-box nature, and the lack of interpretability it entails, has been a major concern. My dissertation focuses on developing intrinsically interpretable models that use LMs as encoders while maintaining their superior performance via prototypical networks. I initiated my research by investigating performance enhancements for interpretable models of sarcasm detection. My proposed approach captures sentiment incongruity to improve accuracy while offering instance-based explanations for its classification decisions. I then developed a novel white-box, multi-head graph attention-based prototype network designed to explain the decisions of text classification models without sacrificing the accuracy of the original black-box LMs. In addition, I am extending the attention-based prototype network with contrastive learning to redesign an interpretable graph neural network, aiming to enhance both the interpretability and the performance of the model in document classification.
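The prototypical-network idea underlying this work can be sketched minimally: an encoder's representation of a text is compared against a set of learned prototype vectors, and class logits are computed from the resulting similarity scores, so each decision can be explained by pointing to the most similar prototypes. The sketch below uses NumPy with made-up dimensions; the similarity function (a ProtoPNet-style log-activation over squared Euclidean distance), the prototype initialization, and all dimension names are illustrative assumptions, not the dissertation's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 768-d encoder output, 10 prototypes, 2 classes.
D, P, C = 768, 10, 2

prototypes = rng.normal(size=(P, D))     # learned prototype vectors
class_weights = rng.normal(size=(P, C))  # maps prototype similarities to class logits

def prototype_logits(h):
    """Classify an encoded text h (shape (D,)) by similarity to prototypes."""
    # Squared Euclidean distance from h to every prototype ...
    dists = np.sum((prototypes - h) ** 2, axis=1)
    # ... converted to a similarity score that grows as the distance shrinks
    # (an illustrative ProtoPNet-style activation, assumed here).
    sims = np.log((dists + 1.0) / (dists + 1e-4))
    # A linear layer over similarities yields the class logits.
    return sims @ class_weights

h = rng.normal(size=(D,))   # stand-in for an LM encoder output
logits = prototype_logits(h)
print(logits.shape)
```

The explanation for a prediction would come from inspecting which prototypes received the highest similarity scores and showing the training instances they correspond to, which is what makes this family of models intrinsically interpretable rather than post hoc explained.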