Humans can effortlessly understand the coordinate structure of sentences such as "Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle, respectively". In the context of natural language inference (NLI), we examine how language models (LMs) reason with respective readings (Gawron and Kehler, 2004) from two perspectives: syntactic-semantic and commonsense-world knowledge. We propose a controlled synthetic dataset, WikiResNLI, and a naturally occurring dataset, NatResNLI, to encompass various explicit and implicit realizations of "respectively". We show that fine-tuned NLI models struggle to understand such readings without explicit supervision. While few-shot learning is easy in the presence of explicit cues, longer training is required when the reading is evoked implicitly, leaving models to rely on commonsense inferences. Furthermore, our fine-grained analysis indicates that models fail to generalize across different constructions. To conclude, we demonstrate that LMs still lag behind humans in generalizing to the long tail of linguistic constructions.