This paper introduces FRAME (Fine-grained Recognition of Art-historical Metadata and Entities), a manually annotated dataset of art-historical image descriptions for Named Entity Recognition (NER) and Relation Extraction (RE). Descriptions were collected from museum catalogs, auction listings, open-access platforms, and scholarly databases, then filtered to ensure that each text focuses on a single artwork and contains explicit statements about its material, composition, or iconography. FRAME provides stand-off annotations in three layers: a metadata layer for object-level properties, a content layer for depicted subjects and motifs, and a co-reference layer linking repeated mentions. Across layers, entity spans are labeled with 37 types, and mentions are connected by typed relations. Entity types are aligned with Wikidata to support Named Entity Linking (NEL) and downstream knowledge-graph construction. The dataset is released as UIMA XMI Common Analysis Structure (CAS) files with accompanying images and bibliographic metadata, and can be used to benchmark and fine-tune NER and RE systems, including zero- and few-shot setups with Large Language Models (LLMs).
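To make the stand-off, three-layer design concrete, the following minimal Python sketch models spans as character offsets into an unmodified text and relations as typed links between spans. It is illustrative only: the class names, the sample sentence, and the labels (`Material`, `Support`, `Person`, `Object`, `Mention`, `holds`, `coreferent_with`) are hypothetical placeholders, not FRAME's actual 37 types, relation inventory, or UIMA type system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """A stand-off entity annotation: character offsets plus a type label."""
    begin: int
    end: int
    label: str                         # hypothetical label, not a FRAME type
    layer: str                         # "metadata", "content", or "coreference"
    wikidata_id: Optional[str] = None  # slot for an aligned Wikidata identifier

    def text(self, doc_text: str) -> str:
        # The surface form is recovered from the offsets; the text is never altered.
        return doc_text[self.begin:self.end]


@dataclass(frozen=True)
class Relation:
    """A typed, directed link between two annotated mentions."""
    label: str
    source: Span
    target: Span


@dataclass
class Document:
    text: str
    spans: List[Span] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def add_span(self, surface: str, label: str, layer: str) -> Span:
        # Locate the first occurrence of the surface string and record its offsets.
        begin = self.text.index(surface)
        span = Span(begin, begin + len(surface), label, layer)
        self.spans.append(span)
        return span

    def layer(self, name: str) -> List[Span]:
        # Return all spans that belong to one annotation layer.
        return [s for s in self.spans if s.layer == name]


# Invented example description (not taken from the dataset).
doc = Document("Oil on canvas. A woman holds a lute; she looks at the viewer.")

material = doc.add_span("Oil", "Material", "metadata")
support = doc.add_span("canvas", "Support", "metadata")
woman = doc.add_span("woman", "Person", "content")
lute = doc.add_span("lute", "Object", "content")
she = doc.add_span("she", "Mention", "coreference")

doc.relations.append(Relation("holds", woman, lute))
doc.relations.append(Relation("coreferent_with", she, woman))

for rel in doc.relations:
    print(rel.source.text(doc.text), f"--{rel.label}-->", rel.target.text(doc.text))
```

Because the annotations refer to the text only by offset, the three layers can overlap freely and be filtered independently, for example with `doc.layer("content")`.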