Manual data annotation is an important NLP task, but one that requires a considerable amount of resources and effort. Despite these costs, labeling and categorizing entities is essential for NLP tasks such as semantic evaluation. Although annotation can be done by non-experts in most cases, it still requires human labor, which makes the process costly. Another major challenge in data annotation is maintaining consistency. Annotation efforts are typically carried out by teams of multiple annotators, and their annotations must remain consistent with respect to both the domain truth and the annotation format while minimizing human error. Annotating a specialized domain that deviates significantly from the general domain, such as fantasy literature, is prone to frequent human error and annotator disagreement, so it is vital to enforce proper guidelines and error-reduction mechanisms. One way to enforce these constraints is to use a specialized application. Such an application can keep annotations consistent and restrict annotators to a predefined set of labels, reducing the room for error. In this paper, we present SHADE, an annotation tool for labeling entities in the high fantasy literature domain, specifically in Dungeons and Dragons lore extracted from the Forgotten Realms Fandom Wiki.
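The error-reduction idea of restricting annotators to predefined labels can be illustrated with a small validation check. The following is a minimal sketch under stated assumptions, not SHADE's implementation: the label set, the `Annotation` span format, and the `validate` helper are all hypothetical, and the example sentence is only illustrative.

```python
from dataclasses import dataclass

# Hypothetical entity label set; SHADE's actual entity types are not given here.
ALLOWED_LABELS = frozenset({"CHARACTER", "LOCATION", "DEITY", "ORGANIZATION"})


@dataclass(frozen=True)
class Annotation:
    """A hypothetical labeled character span [start, end) in a source text."""
    start: int
    end: int
    label: str


def validate(text: str, ann: Annotation) -> list[str]:
    """Return the problems found with an annotation; an empty list means valid."""
    problems = []
    # Reject any label outside the predefined set instead of accepting free text.
    if ann.label not in ALLOWED_LABELS:
        problems.append(f"unknown label {ann.label!r}")
    # The span must be non-empty and lie inside the text.
    if not (0 <= ann.start < ann.end <= len(text)):
        problems.append(f"span [{ann.start}, {ann.end}) out of bounds")
    # Enforce a consistent span format: no leading or trailing whitespace.
    elif text[ann.start:ann.end] != text[ann.start:ann.end].strip():
        problems.append("span has leading or trailing whitespace")
    return problems


if __name__ == "__main__":
    text = "Elminster lives in Shadowdale."
    print(validate(text, Annotation(0, 9, "CHARACTER")))   # []
    print(validate(text, Annotation(19, 29, "TOWN")))      # ["unknown label 'TOWN'"]
```

Rejecting an annotation at entry time, rather than cleaning it up afterwards, is one way a dedicated tool could keep labels and span boundaries uniform across a team of annotators.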