The context of this paper is the creation of large, uniform archaeological datasets from heterogeneous published resources, such as find catalogues, with the help of AI and Big Data. The paper addresses the challenge of assembling archaeological data consistently. Existing records cannot simply be combined, as they differ in quality and recording standards. Records therefore have to be recreated from published archaeological illustrations, a path that is only viable with the help of automation. The contribution of this paper is a new workflow for collecting data from archaeological find catalogues available as legacy resources, such as archaeological drawings and photographs in large, unsorted PDF files. The workflow relies on custom software (AutArch) that supports image processing, object detection, and interactive validation and adjustment of automatically retrieved data. We integrate artificial intelligence (AI), in the form of neural networks for object detection and classification, into the workflow, thereby speeding up, automating, and standardising data collection. The workflow detects objects commonly found in archaeological catalogues, such as graves, skeletons, ceramics, ornaments, stone tools, and maps. These objects are spatially related and analysed to extract real-life attributes, such as the size and orientation of graves derived from the north arrow and the scale. We also automate the recording of geometric whole outlines through contour detection, as an alternative to landmark-based geometric morphometrics. Detected objects, contours, and other automatically retrieved data can be manually validated and adjusted. We use third-millennium BC Europe (encompassing cultures such as 'Corded Ware' and 'Bell Beaker' and their burial practices) as a 'testing ground' and for evaluation, including a user study of the workflow and the AutArch software.
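To illustrate how a grave's real size and orientation can follow from the scale and the north arrow, the Python sketch below computes them from a detected grave outline. It is a minimal sketch under stated assumptions, not AutArch's implementation: the function `measure_grave` and its inputs (outline points in image pixels, the scale bar's length in pixels and in metres, and the north arrow given as tail and tip points) are hypothetical, and the grave's long axis is estimated here by principal component analysis of the outline points.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

# (x, y) in image pixels; as usual for images, y grows downwards.
Point = Tuple[float, float]


@dataclass
class GraveMeasurement:
    length_m: float     # extent along the long axis, in metres
    width_m: float      # extent across the long axis, in metres
    bearing_deg: float  # long axis, clockwise from north, in [0, 180)


def _to_up(p: Point) -> Point:
    """Flip the y axis so that positive y points up the page."""
    return (p[0], -p[1])


def measure_grave(
    outline: Sequence[Point],
    scale_px: float,
    scale_m: float,
    north_tail: Point,
    north_tip: Point,
) -> GraveMeasurement:
    """Hypothetical helper: real size and orientation of a grave outline.

    Assumes outline, scale bar and north arrow come from the same page and
    that outline points are roughly evenly spaced along the contour.
    """
    if len(outline) < 3 or scale_px <= 0:
        raise ValueError("need >= 3 outline points and a positive scale length")
    m_per_px = scale_m / scale_px  # metres represented by one pixel

    pts = [_to_up(p) for p in outline]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n

    # Major principal axis of the 2x2 covariance matrix, counter-clockwise from +x.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector along the long axis
    vx, vy = -uy, ux                           # unit vector across it

    along = [(x - cx) * ux + (y - cy) * uy for x, y in pts]
    across = [(x - cx) * vx + (y - cy) * vy for x, y in pts]
    length_m = (max(along) - min(along)) * m_per_px
    width_m = (max(across) - min(across)) * m_per_px

    # Bearings clockwise from page-up: atan2(dx, dy) in y-up coordinates.
    tx, ty = _to_up(north_tail)
    hx, hy = _to_up(north_tip)
    north_bearing = math.degrees(math.atan2(hx - tx, hy - ty))
    axis_bearing = math.degrees(math.atan2(ux, uy))
    # A grave axis has no direction, so fold the result into [0, 180).
    bearing = (axis_bearing - north_bearing) % 180.0
    return GraveMeasurement(length_m, width_m, bearing)


# Usage: a 400 x 100 px grave drawn horizontally, 200 px = 1 m,
# north arrow pointing straight up the page.
m = measure_grave([(100, 100), (500, 100), (500, 200), (100, 200)],
                  scale_px=200, scale_m=1.0,
                  north_tail=(50, 300), north_tip=(50, 250))
print(m)  # GraveMeasurement(length_m=2.0, width_m=0.5, bearing_deg=90.0)
```

The bearing is reported modulo 180° because a grave's long axis alone does not say which end is the head. The covariance-based axis assumes evenly spaced outline points; fitting a minimum-area rotated rectangle (for example with OpenCV's `cv2.minAreaRect`) is a common alternative.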