Simulating analogue film damage to analyse and improve artefact restoration on high-resolution scans

Digital scans of analogue photographic film typically contain artefacts such as dust and scratches. Automated removal of these artefacts is an important part of the preservation and dissemination of photographs of historical and cultural importance. While state-of-the-art deep learning models have shown impressive results in general image inpainting and denoising, film artefact removal is an understudied problem. It has particularly challenging requirements, due to the complex nature of analogue damage, the high resolution of film scans, and potential ambiguities in the restoration. There are no publicly available high-quality datasets of real-world analogue film damage for training and evaluation, making quantitative studies impossible. We address the lack of ground-truth data for evaluation by collecting a dataset of 4K damaged analogue film scans paired with manually restored versions produced by a human expert, allowing quantitative evaluation of restoration performance. We construct a larger synthetic dataset of damaged images with paired clean versions using a statistical model of artefact shape and occurrence learnt from real, heavily damaged images. We carefully validate the realism of the simulated damage via a human perceptual study, showing that even expert users find our synthetic damage indistinguishable from real damage. In addition, we demonstrate that training with our synthetically damaged dataset leads to improved artefact segmentation performance when compared to previously proposed synthetic analogue damage. Finally, we use these datasets to train and analyse the performance of eight state-of-the-art image restoration methods on high-resolution scans. We compare two families of methods: those that perform the restoration task directly on scans with artefacts, and those that require a damage mask to be provided for the inpainting of artefacts.
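To make the idea of a synthetic dataset with paired clean versions concrete, the following is a minimal sketch of how such pairs could be produced. It is not the statistical model described above: the random-walk strokes, the Poisson artefact count, and the names `sample_artefact_mask`, `apply_damage` and `make_training_pair` are all illustrative assumptions standing in for a model learnt from real damage.

```python
import numpy as np


def sample_artefact_mask(shape, rng, mean_count=20.0, max_length=40):
    """Draw a binary damage mask made of short random-walk strokes.

    Hypothetical stand-in for a learnt artefact model: the number of
    artefacts is Poisson-distributed and each one is a random walk.
    """
    height, width = shape
    mask = np.zeros(shape, dtype=bool)
    for _ in range(rng.poisson(mean_count)):
        # Start each artefact at a uniformly random pixel.
        y, x = rng.integers(0, height), rng.integers(0, width)
        for _ in range(rng.integers(1, max_length + 1)):
            mask[y, x] = True
            # Step to a neighbouring pixel, staying inside the image.
            y = int(np.clip(y + rng.integers(-1, 2), 0, height - 1))
            x = int(np.clip(x + rng.integers(-1, 2), 0, width - 1))
    return mask


def apply_damage(clean, mask, value=1.0):
    """Return a copy of `clean` with masked pixels set to `value`.

    Assumption: artefacts are rendered as flat bright pixels, which is
    a simplification of how dust and scratches appear on real scans.
    """
    damaged = clean.copy()
    damaged[mask] = value
    return damaged


def make_training_pair(clean, seed=0):
    """Build one (damaged, clean, mask) triplet from a clean image."""
    rng = np.random.default_rng(seed)
    mask = sample_artefact_mask(clean.shape[:2], rng)
    return apply_damage(clean, mask), clean, mask
```

In this sketch the mask serves two purposes: it is the target for artefact segmentation, and it is the input that mask-based inpainting methods require alongside the damaged image.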
