Recent work has shown evidence of 'Clever Hans' behavior in high-performance neural translationese classifiers: BERT-based classifiers capitalize on spurious correlations between the data and the target classification labels, in particular topic information, rather than on genuine translationese signals. Translationese signals are subtle (especially in professional translation) and compete with many other signals in the data, such as genre, style, author, and, in particular, topic. This raises the general question of how much of a classifier's performance is really due to spurious correlations in the data rather than to the signals the classifier is actually meant to capture, especially for subtle target signals and in challenging (low-resource) data settings. We focus on topic-based spurious correlation and approach the question from two directions: (i) where we have no knowledge about spurious topic information and its distribution in the data, and (ii) where we have some indication about the nature of the spurious topic correlations. For (i), we develop a measure from first principles that captures the alignment of unsupervised topics with the target classification labels as an indication of spurious topic information in the data. We show that our measure is equivalent to purity in clustering and propose a 'topic floor' (as in a 'noise floor') for classification. For (ii), we investigate masking of known spurious topic carriers in classification. Both (i) and (ii) contribute to quantifying spurious correlations, and (ii) additionally to mitigating them.
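As an illustration of direction (i), the following is a minimal sketch of the standard clustering purity score applied to topic assignments and class labels. It is not the authors' implementation: the function name `purity`, the toy topic IDs, and the label strings `"orig"` and `"trans"` are all assumptions made for this example, and the abstract does not specify how topics are obtained or how the 'topic floor' is derived from the score.

```python
from collections import Counter, defaultdict


def purity(topic_ids, labels):
    """Standard clustering purity: each topic votes for its majority label,
    and the score is the fraction of items covered by those majority votes."""
    if len(topic_ids) != len(labels):
        raise ValueError("topic_ids and labels must have the same length")
    if not labels:
        raise ValueError("at least one item is required")

    # Count how often each label occurs inside each topic.
    label_counts_by_topic = defaultdict(Counter)
    for topic, label in zip(topic_ids, labels):
        label_counts_by_topic[topic][label] += 1

    # Sum the size of the majority label in every topic.
    majority_total = sum(
        max(counts.values()) for counts in label_counts_by_topic.values()
    )
    return majority_total / len(labels)


# Hypothetical toy data: 8 documents, 3 unsupervised topics, 2 classes.
topics = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["orig", "orig", "trans", "trans", "trans", "trans", "orig", "trans"]

score = purity(topics, labels)
print(f"purity = {score:.3f}")
```

On this toy data the majority labels cover 2, 3, and 1 documents in the three topics, so the score is 6/8 = 0.75, compared with 5/8 = 0.625 for always predicting the most frequent label. Read this way, purity is the accuracy of a classifier that sees only the topic of a document, which is one plausible way to interpret a floor against which a real classifier's accuracy can be compared.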