Stance detection determines whether the author of a text is in favor of, against, or neutral towards a specified target, and can yield valuable insights from social media. Targets are very often referred to only indirectly, which makes the task challenging: computational solutions must model semantic features and infer the corresponding implications from a literal statement. Moreover, the limited amount of available training data leads to subpar performance in out-of-domain and cross-target scenarios, as data-driven approaches tend to rely on superficial and domain-specific features. In this work, we decompose the stance detection task from a linguistic perspective and investigate its key components and inference paths. The stance triangle is a generic linguistic framework previously proposed to describe the fundamental ways in which people express their stance. We extend it by characterizing the relationship between explicit and implicit objects. We then use the extended framework to enrich a single training corpus with additional annotation. Experimental results show that strategically enriched data can significantly improve performance on out-of-domain and cross-target evaluation.