Large language models (LLMs) perform well on many natural language processing tasks but raise explainability challenges. In this paper, we examine how random elements in the training of LLMs affect the explainability of their predictions. We do so on a task of classifying opinionated journalistic texts in French. Using a fine-tuned CamemBERT model and an explanation method based on relevance propagation, we find that training with different random seeds produces models with similar accuracy but variable explanations. We therefore argue that characterizing the statistical distribution of the explanations is needed for the explainability of LLMs. We then explore a simpler model based on textual features, which offers stable explanations but is less accurate; this simpler model thus corresponds to a different tradeoff between accuracy and explainability. We show that it can be improved by adding features derived from CamemBERT's explanations. We finally discuss new research directions suggested by our results, in particular regarding the origin of the observed sensitivity to training randomness.
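To make the experimental setup concrete, the following is a minimal sketch of the seed-variability analysis described above: fine-tune CamemBERT under several random seeds, compute a token-level attribution for the same input under each resulting model, and measure the seed-to-seed spread of each token's relevance. The paper uses a relevance-propagation method; gradient×input is used here only as a simple stand-in, the fine-tuning loop is elided, and the example text and seed values are illustrative assumptions.

```python
# Sketch: explanation variability across fine-tuning seeds (not the paper's exact code).
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, set_seed

tokenizer = AutoTokenizer.from_pretrained("camembert-base")

def token_attributions(model, text):
    """Gradient x input attribution for the predicted class, one score per token."""
    enc = tokenizer(text, return_tensors="pt")
    embeddings = model.get_input_embeddings()(enc["input_ids"])
    embeddings.retain_grad()  # keep gradients on this non-leaf tensor
    out = model(inputs_embeds=embeddings, attention_mask=enc["attention_mask"])
    out.logits[0, out.logits.argmax()].backward()
    return (embeddings.grad * embeddings).sum(-1).squeeze(0).detach().numpy()

# One fine-tuned checkpoint per seed (the fine-tuning loop itself is omitted;
# a freshly initialized classification head stands in for a trained one here).
attributions = []
for seed in (0, 1, 2, 3, 4):
    set_seed(seed)
    model = AutoModelForSequenceClassification.from_pretrained(
        "camembert-base", num_labels=2
    )
    # ... fine-tune on the opinion-classification corpus here ...
    model.eval()
    attributions.append(token_attributions(model, "Texte d'exemple à classer."))

# Seed-to-seed standard deviation of each token's relevance: large spreads
# signal explanations that depend on training randomness even when accuracy
# is stable across seeds.
per_token_std = np.std(np.stack(attributions), axis=0)
print(per_token_std)
```

Aggregating such per-token spreads over a test set is one way to characterize the statistical distribution of explanations that the abstract calls for, rather than reporting a single model's explanation as if it were canonical.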