In this paper, we present the first application of Native Language Identification (NLI) to Turkish. NLI is the task of predicting a writer's first language (L1) by analysing their writing in a language other than their first (L2). While most NLI research has focused on English, our study extends its scope to Turkish. Using the recently constructed Turkish Learner Corpus, we combine three syntactic features (CFG production rules, part-of-speech n-grams, and function words) extracted from L2 texts and demonstrate their effectiveness in this task.
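To make the feature types named above concrete, the following is a minimal sketch of how two of them, part-of-speech n-grams and function-word counts, might be extracted from a single tagged text. It is not the authors' implementation. The function names, the tag labels, and the tiny `FUNCTION_WORDS` list are hypothetical and chosen only for illustration, and the input is assumed to be already POS-tagged by an external tool. CFG production rules are omitted because they would require a constituency parser.

```python
from collections import Counter

# Hypothetical, tiny function-word list for illustration only;
# a real study would use a much larger, curated list.
FUNCTION_WORDS = {"ve", "bir", "bu", "de", "da", "için", "ile"}


def pos_ngrams(tags, n):
    """Return all contiguous n-grams over a sequence of POS tags."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def extract_features(tagged_tokens, n=2):
    """Count POS n-grams and function words in one tagged text.

    tagged_tokens: list of (word, pos_tag) pairs, assumed to come from
    an external tagger that is not shown here.
    """
    words = [word.lower() for word, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]

    features = Counter()
    # One feature per distinct POS n-gram, prefixed to avoid name clashes.
    for gram in pos_ngrams(tags, n):
        features["pos:" + "_".join(gram)] += 1
    # One feature per function word that occurs in the text.
    for word in words:
        if word in FUNCTION_WORDS:
            features["fw:" + word] += 1
    return features


# Hypothetical tagged input; the tag labels are assumptions.
sample = [
    ("Bu", "DET"),
    ("bir", "DET"),
    ("kitap", "NOUN"),
    ("ve", "CCONJ"),
    ("kalem", "NOUN"),
]
features = extract_features(sample)
```

The resulting counts could then be passed to any standard text classifier. Note that Python's `str.lower()` is not locale-aware and maps "I" to "i" rather than to the Turkish dotless "ı", so a real pipeline for Turkish would need language-specific case handling.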