We introduce AWED-FiNER, an open-source ecosystem designed to bridge the gap in Fine-grained Named Entity Recognition (FgNER) for 36 global languages spoken by more than 6.6 billion people. While Large Language Models (LLMs) dominate general Natural Language Processing (NLP) tasks, they often struggle with low-resource languages and fine-grained NLP tasks. AWED-FiNER provides a collection of agentic toolkits, web applications, and several state-of-the-art expert models that together deliver FgNER solutions across these 36 languages. The agentic tools route multilingual text to specialized expert models and fetch FgNER annotations within seconds. The web-based platforms offer a ready-to-use FgNER annotation service for non-technical users. Moreover, the collection of extremely small, language-specific, open-source state-of-the-art expert models facilitates offline deployment in resource-constrained scenarios, including edge devices. Beyond this broad coverage, AWED-FiNER places a specific focus on vulnerable languages such as Bodo, Manipuri, Bishnupriya, and Mizo. The resources can be accessed here: Agentic Tool (https://github.com/PrachuryyaKaushik/AWED-FiNER), Web Application (https://hf.co/spaces/prachuryyaIITG/AWED-FiNER), and 49 Expert Detector Models (https://hf.co/collections/prachuryyaIITG/awed-finer).