Whether neural networks can serve as cognitive models of morphological learning remains an open question. Recent work has shown that encoder-decoder models can acquire irregular patterns, but evidence that they generalize these patterns as humans do is mixed. We investigate this question using the Spanish \emph{L-shaped morphome}, in which only the first-person singular indicative (e.g., \textit{pongo} `I put') shares its stem with all subjunctive forms (e.g., \textit{ponga, pongas}), despite lacking any apparent phonological, semantic, or syntactic motivation. We compare five encoder-decoder transformers that vary along two dimensions: sequential vs. position-invariant positional encoding, and atomic vs. decomposed tag representations. Positional encoding proves decisive: position-invariant models recover the correct L-shaped paradigm clustering even when L-shaped verbs are scarce in training, whereas models with sequential positional encoding capture the pattern only partially. Yet none of the models productively generalizes the pattern to novel forms. Position-invariant models extend the L-shaped stem across subjunctive cells but fail to extend it to the first-person singular indicative, yielding a mood-based generalization rather than the L-shaped morphomic pattern. Humans show the opposite preference, generalizing to the first-person singular indicative over subjunctive forms. No model reproduces this human pattern, highlighting the gap between statistical pattern reproduction and morphological abstraction.
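As an illustration of the positional-encoding dimension (a minimal sketch, not the paper's implementation; tag names and dimensions are invented for the example): with standard sinusoidal positional encoding, the representation of a morphological tag token depends on where it appears in the input sequence, whereas omitting the encoding makes the tag representation position-invariant.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    # Even dimensions use sine, odd dimensions use cosine.
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# Toy embeddings for three tag tokens, e.g. [SBJV, 1, SG]
# (hypothetical tags; the real models operate on full tag sequences).
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 8))
pe = sinusoidal_pe(3, 8)

# Sequential encoding: the same tag embedding yields different
# representations at different positions.
tag = emb[0]
order_sensitive = not np.allclose(tag + pe[0], tag + pe[2])

# Position-invariant encoding: no positional term is added, so the
# tag's representation is identical wherever it occurs.
position_invariant = np.allclose(tag, tag)

print(order_sensitive, position_invariant)  # True True
```

Under this toy contrast, a position-invariant encoder treats the tag set as an unordered bundle of features, which is one way to make paradigm-cell structure easier to recover than a strictly sequential reading of the tags.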