Recent work on morpheme segmentation has focused primarily on word-level segmentation, often neglecting the context that a word receives from its sentence. In this study, we recast morpheme segmentation as a sequence-to-sequence problem that takes the entire sentence as input rather than isolated words. Our findings show that the multilingual model consistently outperforms its monolingual counterparts. Although our model did not surpass the current state of the art, it performed comparably on high-resource languages while showing limitations in low-resource settings.
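To make the difference between the two framings concrete, the following is a minimal sketch of how training pairs might be built at the word level and at the sentence level. It is illustrative only: the boundary marker `SEP`, the function names, and the toy English example are assumptions introduced here and do not describe the data format or preprocessing used in this study.

```python
from typing import List, Tuple

# Hypothetical morpheme boundary marker (an assumption, not taken from the paper).
SEP = " @@"


def word_level_pairs(
    words: List[str], segmentations: List[List[str]]
) -> List[Tuple[str, str]]:
    """Word-level framing: one (source, target) pair per isolated word."""
    return [(word, SEP.join(morphs)) for word, morphs in zip(words, segmentations)]


def sentence_level_pair(
    words: List[str], segmentations: List[List[str]]
) -> Tuple[str, str]:
    """Sentence-level framing: a single (source, target) pair for the whole sentence."""
    source = " ".join(words)
    target = " ".join(SEP.join(morphs) for morphs in segmentations)
    return source, target


# Toy example (hypothetical data).
words = ["the", "dogs", "barked"]
segmentations = [["the"], ["dog", "s"], ["bark", "ed"]]

print(word_level_pairs(words, segmentations))
print(sentence_level_pair(words, segmentations))
```

In the sentence-level framing the model sees every word together with its neighbours, which is the contextual information that the word-level framing discards.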