Dehumanization is a mental process that enables the exclusion and ill treatment of a group of people. In this paper, we present two data sets of dehumanizing text: a large, automatically collected corpus and a smaller, manually annotated data set. Both data sets combine political discourse with dialogue from movie subtitles. Our methods yield a broad and varied collection of dehumanization data, enabling further exploratory analysis and automatic classification of dehumanization patterns. Both data sets will be publicly released.