Mexico is home to a large number of indigenous languages, the most widely spoken of which is Nawatl, with more than two million current speakers (mainly in North and Central America). Despite a rich cultural heritage dating back to the 15th century, Nawatl has few computational resources. The problem is compounded by its dialectal diversity: approximately 30 varieties are recognised, not counting the different spelling conventions used in the written language. In this work, we address the problem of classifying Nawatl varieties using Machine Learning and Neural Networks.