The advent of pre-trained Language Models (LMs) has markedly advanced natural language processing, but their efficacy in out-of-distribution (OOD) scenarios remains a significant challenge. Computational argumentation (CA), the field of modeling human argumentation processes, is notably affected by these challenges because complex annotation schemes and high annotation costs naturally produce resources that barely cover the multiplicity of available text sources and topics. Due to this data scarcity, generalization to data from uncovered covariant distributions is a common challenge for CA tasks like stance detection or argument classification. This work systematically assesses LMs' capabilities in such OOD scenarios. While previous work either targets specific OOD types, like topic shifts, or treats OOD uniformly, we address three prevalent OOD scenarios in CA: topic shift, domain shift, and language shift. Our findings challenge the previously asserted general superiority of in-context learning (ICL) for OOD: the efficacy of such learning paradigms varies with the type of OOD. Specifically, while ICL excels for domain shifts, prompt-based fine-tuning surpasses it for topic shifts. In summary, we navigate the heterogeneity of OOD scenarios in CA and empirically underscore the potential of base-sized LMs in overcoming these challenges.