We present mahaNLP, an open-source natural language processing (NLP) library built specifically for Marathi, a low-resource Indian language, with the aim of improving its support in NLP. It is an easy-to-use, extensible, and modular toolkit for Marathi text analysis built on state-of-the-art MahaBERT-based transformer models. This work matters because existing Indic NLP libraries provide only basic Marathi processing support and rely on older models with limited performance. Our toolkit stands out by covering a comprehensive range of NLP tasks, from fundamental preprocessing to advanced tasks such as sentiment analysis, named entity recognition (NER), hate speech detection, and sentence completion. This paper gives an overview of the mahaNLP framework, its features, and its usage. This work is part of the L3Cube MahaNLP initiative; more information is available at https://github.com/l3cube-pune/MarathiNLP .