Despite being spoken by millions of people, Tigrinya remains severely underrepresented in Natural Language Processing (NLP) research. This work presents a comprehensive survey of NLP research for Tigrinya, analyzing over 50 studies published between 2011 and 2025. We systematically review the current state of computational resources, models, and applications across fifteen downstream tasks, including morphological processing, part-of-speech tagging, named entity recognition, machine translation, question answering, speech recognition, and speech synthesis. Our analysis reveals a clear trajectory from foundational rule-based systems to modern neural architectures, with progress consistently driven by milestones in resource creation. We identify key challenges rooted in Tigrinya's morphological properties and resource scarcity. We also highlight promising research directions, including morphology-aware modeling, cross-lingual transfer, and community-centered resource development. This work serves both as a reference for researchers and as a roadmap for advancing Tigrinya NLP. An anthology of the surveyed studies and resources is publicly available.