Semi-Supervised Image-Based Narrative Extraction: A Case Study with Historical Photographic Records

From arXiv: This paper has been accepted for oral presentation in the Findings track of the 47th European Conference on Information Retrieval (ECIR 2025). Source code and experiments are available at https://github.com/faustogerman/ROGER-Concept-Narratives

This paper presents a semi-supervised approach to extracting narratives from historical photographic records using an adaptation of the narrative maps algorithm. We extend the original unsupervised text-based method to work with image data, leveraging deep learning techniques for visual feature extraction and similarity computation. Our method is applied to the ROGER dataset, a collection of photographs from the 1928 Sacambaya Expedition in Bolivia captured by Robert Gerstmann. We compare our algorithmically extracted visual narratives with expert-curated timelines of varying lengths (5 to 30 images) to evaluate the effectiveness of our approach. In particular, we use the Dynamic Time Warping (DTW) algorithm to match the extracted narratives with the expert-curated baseline. In addition, we asked an expert on the topic to qualitatively evaluate a representative example of the resulting narratives. Our findings show that the narrative maps approach generally outperforms random sampling for longer timelines (10+ images, p < 0.05), with expert evaluation confirming the historical accuracy and coherence of the extracted narratives. This research contributes to the field of computational analysis of visual cultural heritage, offering new tools for historians, archivists, and digital humanities scholars to explore and understand large-scale image collections. The method's ability to generate meaningful narratives from visual data opens up new possibilities for the study and interpretation of historical events through photographic evidence.
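To make the DTW matching step concrete, here is a minimal sketch of classic Dynamic Time Warping between an extracted narrative and an expert timeline. It is not the authors' implementation. It assumes, purely for illustration, that each narrative is represented as a sequence of positions in the chronologically ordered collection and that the cost of pairing two images is the absolute difference of those positions. The names `dtw_distance`, `position_gap`, `extracted`, and `expert` are hypothetical, and the paper may use a different representation or cost (for example, a distance between image embeddings).

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def dtw_distance(a: Sequence[T], b: Sequence[T],
                 dist: Callable[[T, T], float]) -> float:
    """Classic DTW: minimum cumulative cost of a monotonic alignment of a and b."""
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of the first i items of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] is also matched to an earlier item of b
                cost[i][j - 1],      # b[j-1] is also matched to an earlier item of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def position_gap(x: int, y: int) -> float:
    # Assumed cost: how far apart two images sit in the ordered collection.
    return float(abs(x - y))


# Hypothetical toy data: positions of the selected images in the collection.
extracted = [1, 3, 4, 9]
expert = [1, 2, 4, 8, 9]
score = dtw_distance(extracted, expert, position_gap)
```

A lower score means the extracted narrative follows the expert timeline more closely. Because DTW allows one item to align with several items in the other sequence, it can compare timelines of different lengths, which suits a setting where the expert timelines range from 5 to 30 images.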
