Scene Depth Estimation from Traditional Oriental Landscape Paintings

Scene depth estimation from paintings can streamline the creation of 3D sculptures, allowing visually impaired people to appreciate paintings through touch. However, estimating depth from images of oriental landscape paintings is extremely challenging because of their unique way of depicting depth and their often poor state of preservation. To address this problem, we propose a novel framework consisting of a two-step image-to-image translation method, with CLIP-based image matching at the front end, that predicts the real scene image best matching a given oriental landscape painting image. We then apply a pre-trained state-of-the-art (SOTA) depth estimation model to the generated real scene image. In the first step, CycleGAN converts the oriental landscape painting image into a pseudo-real scene image. To train CycleGAN in an unsupervised manner, we use CLIP to semantically match landscape photographs with oriental landscape painting images. In the second step, the pseudo-real scene image and the oriental landscape painting image are fed into DiffuseIT to predict the final real scene image. Finally, we estimate the depth of the generated real scene image using a pre-trained depth estimation model such as MiDaS. Experimental results show that our approach performs well enough to predict real scene images corresponding to oriental landscape painting images. To the best of our knowledge, this is the first study to estimate the depth of oriental landscape painting images. Our research could help visually impaired people experience paintings in diverse ways. We will release our code and the resulting dataset.
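
The CLIP-based matching step can be illustrated with a minimal sketch. The abstract does not specify how matches are scored, so this example assumes cosine similarity between image embeddings and a top-k selection. The function names (cosine_similarity, match_photos), the file names, and the 3-dimensional toy vectors are hypothetical stand-ins; in practice the embeddings would come from a CLIP image encoder and be much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_photos(painting_embedding, photo_embeddings, top_k=1):
    """Rank landscape photos by similarity to one painting embedding.

    Assumption: matching is done by cosine similarity over image
    embeddings; the source only says CLIP is used for semantic matching.
    """
    scored = [
        (name, cosine_similarity(painting_embedding, emb))
        for name, emb in photo_embeddings.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical values, not real CLIP outputs).
painting = [0.9, 0.1, 0.3]
photos = {
    "mountain_lake.jpg": [0.8, 0.2, 0.4],
    "city_street.jpg": [0.1, 0.9, 0.2],
    "misty_valley.jpg": [0.7, 0.0, 0.6],
}

best = match_photos(painting, photos, top_k=2)
for name, score in best:
    print(f"{name}: {score:.3f}")
```

Under these assumptions, the photos selected for each painting would form the photo-domain training set for CycleGAN, so that the unpaired translation is learned between semantically similar scenes rather than arbitrary photographs.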
