Stance detection is important for understanding the range of attitudes and beliefs expressed on the Internet. However, because a passage's stance toward a topic often depends heavily on that topic, building a stance detection model that generalizes to unseen topics is difficult. In this work, we propose using contrastive learning together with an unlabeled dataset of news articles covering a variety of topics to train topic-agnostic (TAG) and topic-aware (TAW) embeddings for use in downstream stance detection. Combining these embeddings in our full TATA model, we achieve state-of-the-art performance across several public stance detection datasets (0.771 $F_1$-score on the Zero-shot VAST dataset). We release our code and data at https://github.com/hanshanley/tata.