Named Entity Recognition (NER) is a foundational task in Natural Language Processing (NLP) and Information Retrieval (IR), which facilitates semantic search and structured data extraction. We introduce \textbf{AWED-FiNER}, an open-source collection comprising an agentic tool, a web application, and 53 state-of-the-art expert models that provide Fine-grained Named Entity Recognition (FgNER) solutions across 36 languages spoken by more than 6.6 billion people. The agentic tool routes multilingual text to specialized expert models and returns FgNER annotations within seconds. The web-based platform provides a ready-to-use FgNER annotation service for non-technical users. Moreover, the extremely small, language-specific expert models enable offline deployment in resource-constrained scenarios, including edge devices. The covered languages range from global languages such as English, Chinese, Spanish, and Hindi, to low-resource languages such as Assamese, Santali, and Odia, with a specific focus on extremely low-resource, vulnerable languages such as Bodo, Manipuri, Bishnupriya, and Mizo. The resources are available at: Agentic Tool (https://github.com/PrachuryyaKaushik/AWED-FiNER), Web Application (https://hf.co/spaces/prachuryyaIITG/AWED-FiNER), and 53 Expert Detector Models (https://hf.co/collections/prachuryyaIITG/awed-finer).