This work bridges information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library BL19 digital collection (more than 35,000 works from 1700-1899), we construct a benchmark for studying changes in language, terminology, and retrieval in 19th-century fiction and non-fiction. Our approach combines expert-driven query design, paragraph-level relevance annotation, and large language model (LLM) assistance to create a scalable evaluation framework grounded in human expertise. We focus on knowledge transfer from fiction to non-fiction, investigating how narrative understanding and semantic richness in fiction can improve retrieval for scholarly and factual materials. This interdisciplinary framework not only improves retrieval accuracy but also fosters interpretability, transparency, and cultural inclusivity in digital archives. Our work provides both practical evaluation resources and a methodological paradigm for developing retrieval systems that support richer, historically aware engagement with digital archives, ultimately working towards more emancipatory knowledge infrastructures.