This paper evaluates the performance of transformer-based language models on split-ergative case alignment in Georgian, a particularly rare system for assigning grammatical cases to mark argument roles. We focus on subject and object marking as determined by various permutations of nominative, ergative, and dative noun forms. We implement a treebank-based approach for generating minimal pairs using the Grew query language. The resulting dataset comprises 370 syntactic tests across seven tasks of 50-70 samples each, with three noun forms tested in every sample. Five encoder-only and two decoder-only models are evaluated with word- and/or sentence-level accuracy metrics. Regardless of the specific syntactic makeup, models performed worst at assigning the ergative case and best at assigning the nominative case. Performance correlated with the overall frequency distribution of the three forms (NOM > DAT > ERG). Although data scarcity is a known issue for low-resource languages, our results suggest that the highly specific role of the ergative, together with a lack of available training data, likely contributes to poor performance on this case. The dataset is publicly available, and the methodology offers a promising avenue for future syntactic evaluations of languages with limited benchmarks.
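As an illustration of how per-case accuracy over three candidate noun forms could be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes each sample carries one model score per candidate form (NOM, ERG, DAT) and a gold case; the `Sample` class, the task names, and the score values are hypothetical placeholders, not data or results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

CASES = ("NOM", "ERG", "DAT")


@dataclass
class Sample:
    task: str                 # hypothetical task label
    gold: str                 # the case that is grammatical in this context
    scores: Dict[str, float]  # assumed model score (e.g. log-probability) per form


def predicted_case(sample: Sample) -> str:
    """Return the candidate case with the highest score."""
    return max(CASES, key=lambda case: sample.scores[case])


def accuracy_by_gold_case(samples: List[Sample]) -> Dict[str, float]:
    """Fraction of samples per gold case where the gold form scored highest."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample in samples:
        total[sample.gold] += 1
        if predicted_case(sample) == sample.gold:
            correct[sample.gold] += 1
    return {case: correct[case] / total[case] for case in total}


# Toy placeholder scores, invented purely to exercise the functions.
toy_samples = [
    Sample("task_a", "NOM", {"NOM": -1.0, "ERG": -3.0, "DAT": -2.0}),
    Sample("task_a", "ERG", {"NOM": -1.5, "ERG": -2.0, "DAT": -2.5}),
    Sample("task_b", "ERG", {"NOM": -3.0, "ERG": -1.0, "DAT": -2.0}),
    Sample("task_b", "DAT", {"NOM": -2.0, "ERG": -4.0, "DAT": -1.0}),
]

result = accuracy_by_gold_case(toy_samples)
print(result)
```

Grouping by gold case rather than by task makes a NOM/DAT/ERG comparison direct; the same loop could group on `sample.task` to obtain per-task figures.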