This paper introduces HQ-50K, a new large-scale image restoration dataset containing 50,000 high-quality images with rich texture details and semantic diversity. We analyze existing image restoration datasets from five perspectives: data scale, resolution, compression rates, texture details, and semantic coverage, and find that each of them is deficient in at least one of these aspects. In contrast, HQ-50K considers all five aspects during data curation and meets all of the corresponding requirements. We also present a new Degradation-Aware Mixture of Expert (DAMoE) model, which enables a single model to handle multiple corruption types and unknown corruption levels. Our extensive experiments demonstrate that HQ-50K consistently improves performance on various image restoration tasks, such as super-resolution, denoising, dejpeg, and deraining. Furthermore, our proposed DAMoE, trained on HQ-50K, outperforms existing state-of-the-art unified models designed for multiple restoration tasks and levels. The dataset and code are available at \url{https://github.com/littleYaang/HQ-50K}.