The context of this paper is the creation of large, uniform archaeological datasets from heterogeneous published resources, such as find catalogues, with the help of AI and Big Data. The paper is concerned with the challenge of assembling archaeological data consistently. Existing records cannot simply be combined, because they differ in quality and recording standards. Records therefore have to be recreated from published archaeological illustrations, which is only viable with the help of automation. The contribution of this paper is a new workflow for collecting data from archaeological find catalogues available as legacy resources, such as archaeological drawings and photographs in large unsorted PDF files. The workflow relies on custom software (AutArch) that supports image processing, object detection, and interactive validation and adjustment of automatically retrieved data. We integrate artificial intelligence (AI), in the form of neural networks for object detection and classification, into the workflow, thereby speeding up, automating, and standardising data collection. The workflow detects objects commonly found in archaeological catalogues, such as graves, skeletons, ceramics, ornaments, stone tools and maps. The detected objects are then related to one another spatially and analysed to extract real-world attributes, such as the size and orientation of graves derived from the north arrow and the scale. We also automate the recording of geometric whole-outlines through contour detection, as an alternative to landmark-based geometric morphometrics. Detected objects, contours, and other automatically retrieved data can be manually validated and adjusted. We use third-millennium BC Europe (encompassing cultures such as 'Corded Ware' and 'Bell Beaker', and their burial practices) as a 'testing ground' and for evaluation purposes; this includes a user study of the workflow and the AutArch software.
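
To make the idea of deriving real-world attributes from the scale and the north arrow concrete, the following is a minimal sketch, not AutArch's actual implementation. It assumes that the pixel length of the scale bar, the pixel extent of the grave, and the angles of the grave's long axis and the north arrow have already been measured in the image. All names (`ScaleBar`, `metres_per_pixel`, `grave_size_m`, `grave_orientation_deg`) and the angle convention are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class ScaleBar:
    """Hypothetical container for a detected scale bar."""
    length_px: float  # measured length of the bar in the image, in pixels
    length_m: float   # real-world length the bar represents, in metres


def metres_per_pixel(scale: ScaleBar) -> float:
    """Conversion factor from image pixels to metres."""
    if scale.length_px <= 0:
        raise ValueError("scale bar length in pixels must be positive")
    return scale.length_m / scale.length_px


def grave_size_m(length_px: float, width_px: float, scale: ScaleBar) -> tuple[float, float]:
    """Convert a grave's pixel extent into metres using the scale bar."""
    factor = metres_per_pixel(scale)
    return length_px * factor, width_px * factor


def grave_orientation_deg(grave_axis_deg: float, north_arrow_deg: float) -> float:
    """Bearing of the grave's long axis relative to north.

    Assumption: both angles are measured clockwise from the image's
    'up' direction. The axis is treated as undirected, so the result
    is reduced to the range [0, 180).
    """
    return (grave_axis_deg - north_arrow_deg) % 180.0


# Example: a 100 px scale bar that represents 1 m.
scale = ScaleBar(length_px=100.0, length_m=1.0)
length_m, width_m = grave_size_m(250.0, 120.0, scale)
bearing = grave_orientation_deg(grave_axis_deg=100.0, north_arrow_deg=30.0)
```

Treating the long axis as undirected is a simplification; if the position of the skull within the grave were known, the bearing could instead be kept in the full 0 to 360 degree range.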