Researchers and practitioners interested in computational politics rely on automatic content analysis tools to make sense of the large amount of political text available on the Web. To be useful in practice, such tools should capture both objective and subjective aspects of these texts at different levels of granularity. Existing methods yield interesting insights into objective aspects, but they are less suited to subjective ones, are often restricted to a single national context, and offer limited explainability. We introduce a text analysis framework that integrates both perspectives and provides fine-grained processing of subjective aspects. Information retrieval techniques and knowledge bases complement powerful natural language processing components, allowing results to be flexibly aggregated at different levels of granularity. Importantly, the proposed bottom-up approach makes the obtained results easier to explain. We illustrate how the framework works with insights on news outlets, political orientations, topics, individual entities, and demographic segments. The approach is instantiated on a large corpus of French news, but it is designed to work seamlessly for other languages and countries.
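To make the idea of bottom-up aggregation more concrete, the sketch below shows one possible way that fine-grained, mention-level subjective scores could be rolled up to coarser levels (outlet, political orientation, outlet and entity) while keeping the identifiers of the supporting mentions, so that each aggregate can be traced back to its evidence. This is a minimal illustration under stated assumptions, not the framework's actual implementation: the `Mention` schema, its fields, the polarity range of [-1, 1], the `aggregate` helper, and the example outlets and entities are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Mention:
    """One entity mention with a subjective score (hypothetical schema)."""
    mention_id: str
    outlet: str
    orientation: str
    entity: str
    polarity: float  # assumed to lie in [-1, 1]


def aggregate(mentions, key):
    """Group mention-level scores by `key` (the chosen granularity).

    Returns, for each group, the mean polarity and the list of supporting
    mention ids, so every aggregate stays explainable by its inputs.
    """
    groups = defaultdict(list)
    for m in mentions:
        groups[key(m)].append(m)
    return {
        k: {
            "mean_polarity": mean(m.polarity for m in ms),
            "support": [m.mention_id for m in ms],
        }
        for k, ms in groups.items()
    }


# Hypothetical toy data, for illustration only.
mentions = [
    Mention("m1", "OutletA", "left", "EntityX", 0.5),
    Mention("m2", "OutletA", "left", "EntityY", -0.5),
    Mention("m3", "OutletB", "right", "EntityX", -1.0),
]

# The same mention-level data aggregated at three granularities.
by_outlet = aggregate(mentions, key=lambda m: m.outlet)
by_orientation = aggregate(mentions, key=lambda m: m.orientation)
by_outlet_entity = aggregate(mentions, key=lambda m: (m.outlet, m.entity))
```

Because the grouping key is just a function, switching from outlets to orientations or to (outlet, entity) pairs requires no change to the underlying mention-level analysis; the `support` lists are what let a user inspect why a given outlet or orientation received its score.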