We propose a novel computational approach to automatically analyze the physical process behind the printing of early modern letterpress books by clustering the running titles found at the top of their pages. Specifically, we design and compare custom neural and feature-based kernels for computing the pairwise visual similarity of a scanned document's running titles, and we cluster the titles in order to track any deviations from the expected pattern of a book's printing. Unlike body text, which must be reset for every page, the running titles are among the static type elements in a skeleton forme, i.e., the frame used to print each side of a sheet of paper, and were often re-used during a book's printing. To evaluate the effectiveness of our approach, we manually annotate the running title clusters on about 1600 pages across 8 early modern books of varying sizes and formats. Our method can detect potential deviations from the expected patterns of such skeleton formes, helping bibliographers understand phenomena associated with a text's transmission, such as censorship. We also validate our results against a manual bibliographic analysis of a counterfeit early edition of Thomas Hobbes' Leviathan (1651).
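The pipeline described above (a pairwise similarity kernel followed by clustering) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: cosine similarity over raw pixel vectors stands in for the neural and feature-based kernels, thresholded connected components stand in for the clustering step, and the function names and the `threshold` value are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Stand-in kernel (assumption): cosine similarity of two flattened images."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(images, kernel=cosine_similarity):
    """Pairwise similarity of every running-title image with every other."""
    n = len(images)
    return [[kernel(images[i], images[j]) for j in range(n)] for i in range(n)]


def cluster_by_threshold(sim, threshold=0.9):
    """Group items whose similarity meets the (hypothetical) threshold.

    Clusters are the connected components of the graph that links every
    pair (i, j) with sim[i][j] >= threshold, found with union-find.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(j)] = find(i)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels, seen = [], {}
    for i in range(n):
        root = find(i)
        labels.append(seen.setdefault(root, len(seen)))
    return labels


# Toy data (invented): four "running titles" as flattened pixel vectors.
# Pages 0 and 2 share one piece of type, pages 1 and 3 another.
titles = [
    [1, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0.9],
    [0, 1, 0, 1, 0, 0.1],
]
labels = cluster_by_threshold(similarity_matrix(titles))
print(labels)  # [0, 1, 0, 1]
```

In this toy run the labels alternate, which is the kind of regular recurrence one would compare against the expected skeleton-forme pattern; a page whose label breaks the recurrence would be a candidate deviation for a bibliographer to inspect.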