We propose a novel computational approach for automatically analyzing the physical process behind the printing of early modern letterpress books by clustering the running titles found at the top of their pages. Specifically, we design and compare custom neural and feature-based kernels for computing the pairwise visual similarity of a scanned document's running titles, and we cluster the titles in order to track any deviations from the expected pattern of a book's printing. Unlike body text, which must be reset for every page, the running titles are among the static type elements in a skeleton forme, i.e., the frame used to print each side of a sheet of paper, and were often reused during a book's printing. To evaluate the effectiveness of our approach, we manually annotate the running title clusters on about 1600 pages across 8 early modern books of varying sizes and formats. Our method can detect potential deviations from the expected patterns of such skeleton formes, which helps bibliographers understand phenomena associated with a text's transmission, such as censorship. We also validate our results against a manual bibliographic analysis of a counterfeit early edition of Thomas Hobbes' Leviathan (1651).
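As a rough illustration of the pipeline described above (pairwise similarity followed by clustering), here is a minimal sketch. It is not the paper's method: the cosine kernel stands in for the custom neural and feature-based kernels, the threshold-based single-linkage grouping stands in for whatever clustering procedure is actually used, and the function names, the threshold value, and the 3-dimensional toy feature vectors are all hypothetical.

```python
import numpy as np


def cosine_kernel(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row feature vectors.

    Assumption: a plain cosine kernel stands in for the paper's kernels.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def cluster_by_threshold(similarity: np.ndarray, threshold: float) -> list[int]:
    """Single-linkage grouping: connected components of the graph whose
    edges join pairs with similarity >= threshold (hypothetical helper)."""
    n = similarity.shape[0]
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and similarity[i, j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Toy data (invented): one feature vector per running title, in page order.
features = np.array([
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.9, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
])
kernel = cosine_kernel(features)
labels = cluster_by_threshold(kernel, threshold=0.9)
print(labels)
```

Reading the resulting labels in page order gives the sequence in which each cluster of running titles recurs; a break in an otherwise regular sequence is the kind of deviation the abstract refers to.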