AutArch: An AI-assisted workflow for object detection and automated recording in archaeological catalogues

The context of this paper is the creation of large, uniform archaeological datasets from heterogeneous published resources, such as find catalogues, with the help of AI and Big Data. The paper addresses the challenge of assembling consistent archaeological data: existing records cannot simply be combined, as they differ in quality and recording standards. Records therefore have to be recreated from published archaeological illustrations, which is only viable with the help of automation. The contribution of this paper is a new workflow for collecting data from archaeological find catalogues available as legacy resources, such as archaeological drawings and photographs in large, unsorted PDF files. The workflow relies on custom software (AutArch) that supports image processing, object detection, and interactive validation and adjustment of automatically retrieved data. We integrate artificial intelligence (AI), in the form of neural networks for object detection and classification, into the workflow, thereby speeding up, automating, and standardising data collection. Objects commonly found in archaeological catalogues, such as graves, skeletons, ceramics, ornaments, stone tools and maps, are detected. These objects are spatially related to one another and analysed to extract real-life attributes, such as the size and orientation of graves based on the scale and the north arrow. We also automate the recording of geometric whole outlines through contour detection, as an alternative to landmark-based geometric morphometrics. Detected objects, contours, and other automatically retrieved data can be manually validated and adjusted. We use third-millennium BC Europe (encompassing cultures such as 'Corded Ware' and 'Bell Beaker', and their burial practices) as a testing ground and for evaluation purposes; the evaluation includes a user study of the workflow and the AutArch software.
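The abstract does not say how grave size and orientation are computed from the scale and the north arrow. Below is a minimal sketch of one plausible calculation. It is not AutArch's implementation. The sketch assumes that object detection has already produced three things: pixel coordinates for the two ends of a scale bar of known real length, a direction vector for the north arrow, and the two ends of the grave's long axis. The names `pixels_per_metre`, `bearing_from_north` and `measure_grave` are hypothetical.

```python
import math
from dataclasses import dataclass

# (x, y) in image pixels; as usual for images, y grows downwards.
Point = tuple[float, float]


@dataclass
class GraveMeasurement:
    length_m: float      # real-world length of the grave's long axis
    bearing_deg: float   # clockwise from north, in [0, 360)


def pixels_per_metre(bar_start: Point, bar_end: Point, bar_length_m: float) -> float:
    """Image resolution derived from a detected scale bar of known real length."""
    if bar_length_m <= 0:
        raise ValueError("scale bar length must be positive")
    return math.dist(bar_start, bar_end) / bar_length_m


def bearing_from_north(north_vec: Point, axis_vec: Point) -> float:
    """Compass bearing of axis_vec relative to the drawn north arrow (hypothetical helper)."""
    # Flip y so both vectors live in a y-up (mathematical) frame.
    nx, ny = north_vec[0], -north_vec[1]
    ax, ay = axis_vec[0], -axis_vec[1]
    # Signed counter-clockwise angle from the north arrow to the axis.
    ccw = math.atan2(nx * ay - ny * ax, nx * ax + ny * ay)
    # Compass bearings run clockwise, so negate and wrap into [0, 360).
    return math.degrees(-ccw) % 360.0


def measure_grave(end_a: Point, end_b: Point, north_vec: Point,
                  px_per_m: float) -> GraveMeasurement:
    """Length and bearing of the axis running from end_a to end_b."""
    axis = (end_b[0] - end_a[0], end_b[1] - end_a[1])
    return GraveMeasurement(
        length_m=math.dist(end_a, end_b) / px_per_m,
        bearing_deg=bearing_from_north(north_vec, axis),
    )


if __name__ == "__main__":
    # Hypothetical detections: a 0.5 m scale bar spanning 100 px,
    # a north arrow pointing straight up the page.
    res = pixels_per_metre((0, 0), (100, 0), 0.5)            # 200 px per metre
    m = measure_grave((50, 400), (250, 200), (0, -1), res)
    print(round(m.length_m, 3), round(m.bearing_deg, 1))     # → 1.414 45.0
```

The bearing runs from the first end point to the second, in the range [0, 360). If only the undirected axis of the grave matters, taking the result modulo 180 is enough.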
