This paper describes the use of Retrieval-Augmented Generation (RAG) for specific digital collections of cultural assets. The collections are provided by institutions operating in the cultural sector, and their topical areas are the humanities and social sciences. Most of the work presented here was enabled by the European-funded research project MuseIT, which is dedicated to fostering new technologies for Cultural Heritage. We reflect this setting by presenting a sequence of our experiments, narrated as an engineering journey centred on one data-sharing and archiving platform, Dataverse. Implementing a local chatbot for collections, an approach known in Information Retrieval as RAG, is the current culmination of this journey. The journey described in the core of the paper starts from "archives for everyone" and ends with "local chatbots for specific collections".