We propose an interpretable model that scores the bias present in web documents based only on their textual content. Our model incorporates assumptions reminiscent of the Bradley-Terry axioms and is trained on pairs of revisions of the same Wikipedia article, where one version is more biased than the other. While prior approaches based on absolute bias classification have struggled to achieve high accuracy on this task, we develop a useful model for scoring bias by learning to perform pairwise comparisons of bias accurately. We show that the parameters of the trained model can be interpreted to discover the words most indicative of bias. We also apply our model in three settings: studying the temporal evolution of bias in Wikipedia articles, comparing news sources based on bias, and scoring bias in law amendments. In each case, we demonstrate that the outputs of the model can be explained and validated, even for the two domains that lie outside the training-data domain. We also use the model to compare the general level of bias between domains, and find that legal texts are the least biased and news media the most biased, with Wikipedia articles in between. Given its high performance, simplicity, interpretability, and wide applicability, we hope the model will be useful to a large community, including Wikipedia and news editors, political and social scientists, and the general public.
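To make the pairwise idea concrete, the following is a minimal sketch of a Bradley-Terry-style comparison model over text, not the implementation used in this work. It assumes bag-of-words count features, a logistic link on the difference of two linear scores, and plain gradient ascent; the function names (`features`, `score`, `train`), the hyperparameters, and the toy sentence pairs are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def features(text):
    """Bag-of-words counts (an assumed stand-in for the real features)."""
    return Counter(text.lower().split())


def score(weights, text):
    """Bias score of a document: count-weighted sum of its word weights."""
    return sum(weights.get(w, 0.0) * c for w, c in features(text).items())


def train(pairs, epochs=200, lr=0.1):
    """Fit word weights from (more_biased_text, less_biased_text) pairs.

    Models P(first is more biased) = sigmoid(score(first) - score(second))
    and maximizes the log-likelihood by stochastic gradient ascent.
    """
    weights = {}
    for _ in range(epochs):
        for more, less in pairs:
            # Feature difference between the two versions of the text.
            diff = features(more)
            diff.subtract(features(less))
            margin = sum(weights.get(w, 0.0) * c for w, c in diff.items())
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of log(sigmoid(margin)) w.r.t. each weight.
            for w, c in diff.items():
                if c:
                    weights[w] = weights.get(w, 0.0) + lr * (1.0 - p) * c
    return weights


# Toy revision pairs (invented): the first text is the more biased one.
pairs = [
    ("the senator gave a shameful speech", "the senator gave a speech"),
    ("the so-called reform passed", "the reform passed"),
    ("a brilliant and heroic mayor", "a mayor"),
]
weights = train(pairs)

# Interpretation: words with the largest weights are the most bias-indicative.
top_words = sorted(weights, key=weights.get, reverse=True)[:3]
```

Because each document's score is a linear function of its word counts, the learned weights can be read directly as per-word evidence of bias, which is the kind of interpretability the abstract refers to. Words shared by both versions of a pair cancel in the difference and receive no weight in this sketch.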