Non-Māori-speaking New Zealanders (NMS) are able to segment Māori words in a highly similar way to fluent speakers (Panther et al., 2024). This ability is assumed to derive through the identification and extraction of statistically recurrent forms. We examine this assumption by asking how NMS segmentations compare to those produced by Morfessor, an unsupervised machine learning model that operates based on statistical recurrence, across words formed by a variety of morphological processes. Both NMS and Morfessor succeed in segmenting words formed by concatenative processes (compounding and affixation without allomorphy), but NMS also succeed for words that invoke templates (reduplication and allomorphy) and other cues to morphological structure, implying that their learning process is sensitive to more than just statistical recurrence.