In this paper, we present the first automatic lexical simplification system for the Turkish language. Recent text simplification efforts rely on manually crafted simplified corpora and comprehensive NLP tools that can analyse the target text at both the word and sentence levels. Turkish is a morphologically rich, agglutinative language that requires unique considerations such as the proper handling of inflectional cases. It is also a low-resource language, with few available language resources and industrial-strength tools, which makes the text simplification task harder to approach. We present a new text simplification pipeline based on the pretrained representation model BERT together with morphological features to generate grammatically correct and semantically appropriate word-level simplifications.