Named Entity Recognition (NER) is a sequence labeling task in Natural Language Processing in which entities are identified in text and classified into predefined categories. It is the foundation of most information extraction systems. Dungeons and Dragons (D&D) is an open-ended tabletop fantasy game with its own diverse lore. D&D entities are domain-specific, so even state-of-the-art off-the-shelf NER systems fail to recognize them: those systems are trained on general-domain data for predefined categories such as person (PERS), location (LOC), organization (ORG), and miscellaneous (MISC). Meaningful information extraction from fantasy text requires both domain-specific entity categories and models fine-tuned on a domain-relevant corpus. This work uses the available lore of monsters in the D&D domain to fine-tune Trankit, a widely used NER framework built on a pre-trained model. After this training, the system can extract monster names from relevant domain documents under a novel NER tag. This work compares the fine-tuned model's monster name identification against the zero-shot Trankit model and two FLAIR models. The fine-tuned Trankit model achieves an F1 score of 87.86%, surpassing all the other models considered.
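The comparison above is reported as an F1 score. As a minimal sketch of how an entity-level F1 could be computed for a single custom tag, the following assumes BIO-encoded tag sequences, exact span matching, and a hypothetical tag name `MONSTER`. The abstract does not name the tag or state the matching criterion, so both are assumptions made for illustration, and the function names `extract_spans` and `entity_f1` are likewise hypothetical rather than part of Trankit or FLAIR.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1 with exact span and label matching (assumed criterion)."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with its sentence index so spans stay distinct.
        gold_spans |= {(idx, *s) for s in extract_spans(g)}
        pred_spans |= {(idx, *s) for s in extract_spans(p)}
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: "MONSTER" is a hypothetical tag name, not taken from the paper.
gold = [["O", "B-MONSTER", "I-MONSTER", "O"], ["B-MONSTER", "O"]]
pred = [["O", "B-MONSTER", "I-MONSTER", "O"], ["O", "O"]]
score = entity_f1(gold, pred)
```

In this toy case the prediction finds one of the two gold monster names and makes no false predictions, so precision is 1.0, recall is 0.5, and `score` is 2/3. Exact matching is the stricter choice: a prediction that covers only part of a multi-word monster name counts as both a false positive and a false negative.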