A Dataset for Named Entity Recognition and Relation Extraction from Art-historical Image Descriptions

This paper introduces FRAME (Fine-grained Recognition of Art-historical Metadata and Entities), a manually annotated dataset of art-historical image descriptions for Named Entity Recognition (NER) and Relation Extraction (RE). Descriptions were collected from museum catalogs, auction listings, open-access platforms, and scholarly databases, then filtered to ensure that each text focuses on a single artwork and contains explicit statements about its material, composition, or iconography. FRAME provides stand-off annotations in three layers: a metadata layer for object-level properties, a content layer for depicted subjects and motifs, and a co-reference layer linking repeated mentions. Across layers, entity spans are labeled with 37 types and connected by typed RE links between mentions. Entity types are aligned with Wikidata to support Named Entity Linking (NEL) and downstream knowledge-graph construction. The dataset is released as UIMA XMI Common Analysis Structure (CAS) files with accompanying images and bibliographic metadata, and can be used to benchmark and fine-tune NER and RE systems, including zero- and few-shot setups with Large Language Models (LLMs).
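To make the idea of layered stand-off annotation concrete, the following is a minimal sketch in plain Python. It does not read the released XMI files or reproduce FRAME's actual type system: the example sentence, the labels `Material`, `Person`, and `Animal`, the relation name `accompanied_by`, and the helpers `span_of` and `to_bio` are all hypothetical and chosen only for illustration. It shows how character-offset spans and typed relations between mentions can be stored separately from the text, and how such spans might be projected onto token-level BIO tags for NER benchmarking.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EntitySpan:
    start: int   # character offset (inclusive) into the description text
    end: int     # character offset (exclusive)
    label: str   # entity type (hypothetical names, not FRAME's 37 types)
    layer: str   # "metadata", "content", or "coreference"


@dataclass(frozen=True)
class Relation:
    head: EntitySpan
    tail: EntitySpan
    label: str   # relation type (hypothetical)


def span_of(text: str, surface: str, label: str, layer: str) -> EntitySpan:
    """Build a span from the first occurrence of `surface` in `text`."""
    start = text.index(surface)
    return EntitySpan(start, start + len(surface), label, layer)


def to_bio(text: str, spans: List[EntitySpan]) -> Tuple[List[str], List[str]]:
    """Project stand-off spans onto whitespace tokens as BIO tags."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for span in spans:
            # A token is tagged only if it lies entirely inside a span.
            if match.start() >= span.start and match.end() <= span.end:
                prefix = "B-" if match.start() == span.start else "I-"
                tag = prefix + span.label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Invented example description; not taken from the dataset.
text = "Oil on canvas depicting Saint Jerome with a lion"
material = span_of(text, "Oil on canvas", "Material", "metadata")
jerome = span_of(text, "Saint Jerome", "Person", "content")
lion = span_of(text, "lion", "Animal", "content")
relations = [Relation(jerome, lion, "accompanied_by")]

tokens, tags = to_bio(text, [material, jerome, lion])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Because the annotations only reference offsets, the text itself stays untouched, and spans from different layers can coexist over the same description. The whitespace tokenizer here is a simplification; a real pipeline would use the tokenization expected by the NER model being evaluated.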

