Language models (LMs) may appear insensitive to word order changes in natural language understanding (NLU) tasks. In this paper, we propose that linguistic redundancy can explain this phenomenon: word order and other linguistic cues, such as case markers, provide overlapping and thus redundant information. We hypothesize that models are insensitive to word order when the order provides redundant information, and that the degree of insensitivity varies across tasks. We quantify how informative word order is using the mutual information (MI) between unscrambled and scrambled sentences. Our results show that the less informative word order is, the more consistent the model's predictions are between unscrambled and scrambled sentences. We also find that this effect varies across tasks. For some tasks, such as SST-2, LMs' predictions on scrambled sentences almost always agree with their predictions on the original sentences, even as the pointwise MI (PMI) changes. For others, such as RTE, consistency approaches chance level as the PMI decreases, i.e., when word order is important.
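The two quantities in this abstract, the PMI between an unscrambled sentence and its scrambled version and the consistency of predictions across the pair, can be illustrated with a minimal sketch. It assumes the log-probabilities have already been estimated somehow (the abstract does not say how, so the inputs here are placeholders), and the helper names `pmi`, `consistency`, and `consistency_by_pmi_bin` and the bin edges are hypothetical, chosen only for illustration. The toy numbers are not results from the paper.

```python
import bisect
import math
from collections import defaultdict


def pmi(log_p_joint: float, log_p_x: float, log_p_y: float) -> float:
    """Pointwise MI: log p(x, y) - log p(x) - log p(y), with natural-log inputs.

    Here x is the unscrambled sentence and y its scrambled version. How these
    probabilities are estimated is an assumption left to the caller.
    """
    return log_p_joint - log_p_x - log_p_y


def consistency(orig_preds, scrambled_preds) -> float:
    """Fraction of examples whose prediction on the scrambled sentence matches the original."""
    if len(orig_preds) != len(scrambled_preds) or not orig_preds:
        raise ValueError("need two non-empty prediction lists of equal length")
    matches = sum(a == b for a, b in zip(orig_preds, scrambled_preds))
    return matches / len(orig_preds)


def consistency_by_pmi_bin(pmis, orig_preds, scrambled_preds, edges):
    """Group examples into PMI bins (split at sorted `edges`) and compute consistency per bin.

    Bin index i holds scores s with edges[i-1] <= s < edges[i] (hypothetical binning scheme).
    """
    bins = defaultdict(lambda: ([], []))
    for score, a, b in zip(pmis, orig_preds, scrambled_preds):
        idx = bisect.bisect_right(edges, score)
        bins[idx][0].append(a)
        bins[idx][1].append(b)
    return {idx: consistency(o, s) for idx, (o, s) in sorted(bins.items())}


if __name__ == "__main__":
    # Toy probabilities: p(x, y) = 0.02, p(x) = p(y) = 0.1, so PMI = log 2.
    print(pmi(math.log(0.02), math.log(0.1), math.log(0.1)))
    # Toy predictions and PMI scores for four sentence pairs.
    scores = [0.5, 2.0, 3.5, 4.0]
    original = [1, 0, 1, 1]
    scrambled = [1, 1, 1, 0]
    print(consistency(original, scrambled))
    print(consistency_by_pmi_bin(scores, original, scrambled, edges=[1.0, 3.0]))
```

Binning by PMI is one simple way to relate how informative word order is to how stable the predictions are; a regression or correlation over per-example scores would serve the same purpose.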