We address an important gap in detecting political bias in news articles. Previous document-classification approaches can be swayed by the writing style of individual news outlets, which leads to overfitting and limits generalizability. Our approach overcomes this limitation by considering both sentence-level semantics and document-level rhetorical structure, yielding a more robust, style-agnostic detector of political bias in news articles. We introduce a novel multi-head hierarchical attention model that effectively encodes the structure of long documents through a diverse ensemble of attention heads. While journalism follows a formalized rhetorical structure, writing style varies from one news outlet to another. We demonstrate that our method overcomes this domain dependency and outperforms previous approaches in both robustness and accuracy. Further analysis and human evaluation show that our model captures discourse structures common in journalism. Our code is available at: https://github.com/xfactlab/emnlp2023-Document-Hierarchy
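The abstract does not specify the model's equations, so the following is only a minimal sketch of the general idea of multi-head attention pooling over sentences. It is not the authors' implementation; their code is in the linked repository. It assumes sentence vectors have already been produced by some sentence encoder. Each head has a hypothetical learned query vector, and the document vector is the concatenation of the heads' pooled outputs, which is fed to a linear classifier. The labels `left`/`center`/`right` and all numeric values are made up for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_head(sentences: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """One head: score each sentence against the head's (hypothetical) query
    vector, then return the softmax-weighted average of the sentence vectors."""
    weights = softmax([dot(s, query) for s in sentences])
    dim = len(sentences[0])
    pooled = [sum(w * s[d] for w, s in zip(weights, sentences)) for d in range(dim)]
    return pooled, weights


def encode_document(sentences: List[Vector],
                    queries: List[Vector]) -> Tuple[Vector, List[List[float]]]:
    """Document level: run every head and concatenate the pooled outputs,
    so different heads can focus on different parts of the article."""
    doc: Vector = []
    per_head_weights: List[List[float]] = []
    for q in queries:
        pooled, weights = attention_head(sentences, q)
        doc.extend(pooled)
        per_head_weights.append(weights)
    return doc, per_head_weights


def classify(doc: Vector, class_rows: List[Vector],
             labels: List[str]) -> Tuple[str, List[float]]:
    """Linear layer (one weight row per label) followed by softmax."""
    probs = softmax([dot(row, doc) for row in class_rows])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Toy usage: three 2-d "sentence embeddings" and two heads (all values made up).
sentences = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[4.0, 0.0], [0.0, 4.0]]
doc_vec, head_weights = encode_document(sentences, queries)
label, probs = classify(
    doc_vec,
    [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 0.5]],
    ["left", "center", "right"],
)
```

In this toy example, the first head concentrates its weight on the first sentence and the second head on the second sentence. This shows how an ensemble of heads can attend to different parts of a document, such as a lead paragraph versus the body. A trained model would learn the queries and classifier weights rather than fixing them by hand.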