Stance detection is important for understanding the range of attitudes and beliefs expressed on the Internet. However, because a passage's stance is often highly dependent on the topic it addresses, building a stance detection model that generalizes to unseen topics is difficult. In this work, we propose using contrastive learning, together with an unlabeled dataset of news articles covering a variety of topics, to train topic-agnostic (TAG) and topic-aware (TAW) embeddings for use in downstream stance detection. Combining these embeddings in our full TATA model, we achieve state-of-the-art performance across several public stance detection datasets (0.771 $F_1$-score on the zero-shot VAST dataset). We release our code and data at https://github.com/hanshanley/tata.
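
The abstract does not spell out the contrastive objective. As a rough illustration only, the sketch below shows a generic InfoNCE-style loss with in-batch negatives, which is one common way to train embeddings contrastively. The function names (`cosine`, `info_nce_loss`), the temperature value, and the way anchors are paired with positives are assumptions made for this example and are not taken from the TATA code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss with in-batch negatives (illustrative only).

    anchors[i] and positives[i] are assumed to be embeddings that should
    be close (a positive pair); every positives[j] with j != i serves as
    a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every candidate.
        logits = [cosine(anchors[i], positives[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / n


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]    # each anchor matches its positive
    shuffled = [[0.0, 1.0], [1.0, 0.0]]   # positives swapped
    print(info_nce_loss(anchors, aligned))
    print(info_nce_loss(anchors, shuffled))
```

In this toy setup the loss is low when each anchor is most similar to its own positive and high when the pairs are swapped, which is the behavior a contrastive objective uses to pull matched embeddings together and push mismatched ones apart.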