Compiling large datasets from published resources, such as archaeological find catalogues, presents fundamental challenges: identifying relevant content and manually recording it is a time-consuming, repetitive and error-prone task. For the data to be useful, it must be of comparable quality and adhere to the same recording standards, which is rarely the case in archaeology. Here, we present a new data collection method that exploits recent advances in Artificial Intelligence. Our software uses an object detection neural network combined with further classification networks to speed up, automate, and standardise data collection from legacy resources, such as archaeological drawings and photographs in large unsorted PDF files. The AI-assisted workflow detects common objects found in archaeological catalogues, such as graves, skeletons, ceramics, ornaments, stone tools and maps. It then spatially relates and analyses these objects on the page to extract real-life attributes, such as the size and orientation of a grave based on the north arrow and the scale. A graphical interface supports manual validation of the results. We demonstrate the benefits of this approach by collecting a range of shapes and numerical attributes from richly illustrated archaeological catalogues, and benchmark it in a real-world experiment with ten users. Moreover, we record geometric whole-outlines through contour detection, an alternative to landmark-based geometric morphometrics that is not achievable by hand.
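To illustrate how a detected scale and north arrow can turn page measurements into real-life attributes, the following is a minimal sketch and not the software's actual implementation. It assumes that the detection step has already supplied pixel coordinates for the long axis of a grave, the tail and tip of the north arrow, and the pixel length of a scale bar with a known real-world length. All names (`page_bearing`, `measure_grave`, `GraveMeasurement`) are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass

# A point on the page in pixels; x grows to the right, y grows downwards
# (the usual image convention, assumed here).
Point = tuple[float, float]


@dataclass
class GraveMeasurement:
    length_m: float         # real-world length of the grave's long axis
    orientation_deg: float  # clockwise from north, in [0, 180)


def page_bearing(tail: Point, tip: Point) -> float:
    """Bearing of the vector tail->tip, clockwise from 'page up', in degrees."""
    dx = tip[0] - tail[0]
    dy = tip[1] - tail[1]
    # y points down in image coordinates, so 'up' is the -dy direction.
    return math.degrees(math.atan2(dx, -dy)) % 360.0


def measure_grave(
    axis_start: Point,
    axis_end: Point,
    north_tail: Point,
    north_tip: Point,
    scale_bar_px: float,
    scale_bar_m: float,
) -> GraveMeasurement:
    """Convert pixel measurements into metres and an orientation relative to north."""
    if scale_bar_px <= 0:
        raise ValueError("scale bar length in pixels must be positive")
    metres_per_px = scale_bar_m / scale_bar_px
    length_px = math.dist(axis_start, axis_end)
    # The long axis has no direction, so the orientation is reduced modulo 180.
    relative = page_bearing(axis_start, axis_end) - page_bearing(north_tail, north_tip)
    return GraveMeasurement(length_px * metres_per_px, relative % 180.0)


if __name__ == "__main__":
    # Hypothetical values: a horizontal grave axis, a north arrow pointing up
    # the page, and a scale bar of 50 px representing 1 m.
    print(measure_grave((0, 0), (100, 0), (0, 10), (0, 0), 50.0, 1.0))
```

Reducing the orientation modulo 180 reflects the assumption that the long axis of a grave is undirected; if the position of the head were also detected, a full 0 to 360 degree bearing could be kept instead.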