The advent of pre-trained Language Models (LMs) has markedly advanced natural language processing, but their efficacy in out-of-distribution (OOD) scenarios remains a significant challenge. Computational argumentation (CA), the field concerned with modeling human argumentation processes, is notably affected by these challenges: complex annotation schemes and high annotation costs mean that available resources barely cover the multiplicity of text sources and topics. Due to this data scarcity, generalizing to data from uncovered covariant distributions is a common challenge for CA tasks such as stance detection or argument classification. This work systematically assesses LMs' capabilities in such OOD scenarios. While previous work targets specific OOD types, such as topic shifts, or treats OOD uniformly, we address three prevalent OOD scenarios in CA: topic shift, domain shift, and language shift. Our findings challenge the previously asserted general superiority of in-context learning (ICL) for OOD, showing instead that the efficacy of such learning paradigms varies with the type of OOD. Specifically, while ICL excels for domain shifts, prompt-based fine-tuning surpasses it for topic shifts. In sum, we navigate the heterogeneity of OOD scenarios in CA and empirically underscore the potential of base-sized LMs in overcoming these challenges.