Motivated by the challenges of the Digital Ancient Near Eastern Studies (DANES) community, we develop digital tools for processing cuneiform script, a 3D script imprinted into clay tablets that was used for more than three millennia and for at least eight major languages. It consists of thousands of characters that have changed over time and space. Photographs are the most common representation usable for machine learning, while ink drawings are prone to interpretation. Best suited are 3D datasets, which are now becoming available. We created and used the HeiCuBeDa and MaiCuBeDa datasets, which consist of around 500 annotated tablets. For our novel OCR-like approach to mixed image data, we provide an additional mapping tool for transferring annotations between 3D renderings and photographs. Our sign localization uses a RepPoints detector to predict the locations of characters as bounding boxes. We use image data from GigaMesh's MSII-based (curvature, see https://gigamesh.eu) renderings, Phong-shaded 3D models, and photographs, together with illumination augmentation. The results show that using rendered 3D images for sign detection performs better than other work on photographs. In addition, our approach gives reasonably good results for photographs only, while it is best used for mixed datasets. More importantly, the Phong renderings, and especially the MSII renderings, improve the results on photographs, which form the largest dataset on a global scale.