Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but also usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.