In democratic countries, the latent ideology landscape is foundational to individual and collective political action; conversely, fringe ideology drives Ideologically Motivated Violent Extremism (IMVE). Quantifying ideology is therefore a crucial first step toward many downstream problems, such as understanding and countering IMVE, detecting and intervening in disinformation campaigns, and broader empirical modeling of opinion dynamics. However, online ideology detection faces two significant hindrances. First, the ground truth that forms the basis for ideology detection is often prohibitively labor-intensive for practitioners to collect, requires access to domain experts, and is specific to the context of its collection (i.e., time, location, and platform). Second, to circumvent this expense, researchers generate ground truth from other ideological signals (e.g., hashtags or politicians); however, the bias this introduces has not been quantified, and the approach often still requires expert intervention. In this work, we present an end-to-end ideology detection pipeline applicable to large-scale datasets. We construct context-agnostic, automatic ideological signals from widely available media slant data; show that the derived pipeline performs well compared with pipelines built on common ideology signals and with state-of-the-art baselines; employ the pipeline to detect left-right ideology and, more concerningly, extreme ideologies; generate psychosocial profiles of the inferred ideological groups; and generate insights into their morality and preoccupations.