Large language models (LLMs) have shown remarkable performance across various sentence-based linguistic phenomena, yet their ability to capture cross-sentence paradigmatic patterns, such as verb alternations, remains underexplored. In this work, we present curated paradigm-based datasets for four languages, designed to probe systematic cross-sentence knowledge of verb alternations (change-of-state and object-drop constructions in English, German, and Italian, and binyanim in Hebrew). The datasets comprise thousands of Blackbird Language Matrices (BLM) problems. The BLM task -- an RPM/ARC-like task devised specifically for language -- is a controlled linguistic puzzle in which models must select the sentence that completes a pattern according to syntactic and semantic rules. We introduce three types of templates varying in complexity and apply linguistically informed data augmentation strategies to both synthetic and natural data. We provide simple baseline results across English, Italian, German, and Hebrew that demonstrate the diagnostic usefulness of the datasets.