Stance detection plays a pivotal role in a wide range of downstream applications, from discourse parsing to tracing the spread of fake news and the denial of scientific facts. While most stance classification models rely on the textual representation of the utterance in question, prior work has demonstrated the importance of conversational context in stance detection. In this work we introduce TASTE, a multimodal architecture for stance detection that fuses Transformer-based content embeddings with unsupervised structural embeddings. By fine-tuning a pretrained Transformer and combining its output with social embeddings via a Gated Residual Network (GRN) layer, our model captures the complex interplay between content and conversational structure in determining stance. TASTE achieves state-of-the-art results on common benchmarks, significantly outperforming an array of strong baselines. Comparative evaluations underscore the benefits of social grounding, emphasizing the importance of jointly harnessing content and structure for improved stance detection.
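To make the fusion step concrete, the following is a minimal sketch of how a GRN-style layer can gate and merge a content embedding with a social (structural) embedding. It follows the standard Gated Residual Network formulation (ELU nonlinearity, gated linear unit, residual connection, layer normalization); all names, dimensions, and parameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def elu(x):
    # ELU nonlinearity used inside the GRN
    return np.where(x > 0, x, np.exp(x) - 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def grn_fuse(content, social, params):
    """GRN-style fusion of a content embedding with a social embedding.

    eta2 = ELU(content @ W2 + social @ W3 + b2)   # joint projection
    eta1 = eta2 @ W1 + b1                          # second projection
    gate = sigmoid(eta1 @ Wg + bg)                 # GLU-style gate
    out  = LayerNorm(content + gate * eta1)        # gated residual
    (Illustrative sketch only; not the published TASTE code.)
    """
    W1, b1, W2, b2, W3, Wg, bg = params
    eta2 = elu(content @ W2 + social @ W3 + b2)
    eta1 = eta2 @ W1 + b1
    gate = sigmoid(eta1 @ Wg + bg)
    return layer_norm(content + gate * eta1)

# Toy usage with random embeddings and parameters (dimension d is arbitrary)
rng = np.random.default_rng(0)
d = 8
content = rng.normal(size=d)   # e.g. a pooled Transformer embedding
social = rng.normal(size=d)    # e.g. an unsupervised structural embedding
params = (
    rng.normal(size=(d, d)) * 0.1, np.zeros(d),   # W1, b1
    rng.normal(size=(d, d)) * 0.1, np.zeros(d),   # W2, b2
    rng.normal(size=(d, d)) * 0.1,                # W3
    rng.normal(size=(d, d)) * 0.1, np.zeros(d),   # Wg, bg
)
fused = grn_fuse(content, social, params)
```

The gate lets the model learn, per dimension, how much of the jointly transformed signal to add back onto the content representation, so the layer can fall back to the content embedding alone when the structural signal is uninformative.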