Deep Ensembling for Perceptual Image Quality Assessment

Blind image quality assessment is challenging, particularly because no reference information is available. Training a deep neural network requires a large amount of training data, which is not readily available for image quality. Transfer learning is usually adopted to overcome this limitation, and different deep architectures are used for this purpose because they learn features differently. After extensive experiments, we have designed a deep architecture containing two CNN architectures as its sub-units. Moreover, a self-collected image database, BIQ2021, is proposed with 12,000 images having natural distortions. The database is subjectively scored and is used for model training and validation. It is demonstrated that synthetic distortion databases cannot provide generalization beyond the distortion types they contain and are therefore not ideal candidates for general-purpose image quality assessment. Moreover, a large-scale database of 18.75 million images with synthetic distortions is used to pretrain the model, which is then retrained on benchmark databases for evaluation. Experiments are conducted on six benchmark databases, three of which are synthetic distortion databases (LIVE, CSIQ and TID2013) and three of which are natural distortion databases (LIVE Challenge Database, CID2013 and KonIQ-10k). The proposed approach achieves Pearson correlation coefficients of 0.8992, 0.8472 and 0.9452 and Spearman correlation coefficients of 0.8863, 0.8408 and 0.9421, respectively. Moreover, the performance is demonstrated using perceptually weighted rank correlation to indicate the perceptual superiority of the proposed approach. Multiple experiments are conducted to validate the generalization performance of the proposed model by training on different subsets of the databases and validating on the test subset of the BIQ2021 database.
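For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how Pearson and Spearman correlation coefficients between subjective scores and model predictions can be computed. It is not the authors' evaluation code: the function names `pearson`, `average_ranks` and `spearman` and the toy score lists are assumptions chosen for illustration, and the perceptually weighted rank correlation mentioned above is not covered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (not taken from any database in the paper):
# subjective mean opinion scores and the corresponding model predictions.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.2, 1.9, 3.5, 3.4, 4.8]

plcc = pearson(mos, pred)
srocc = spearman(mos, pred)
print(f"PLCC = {plcc:.4f}, SROCC = {srocc:.4f}")
```

The Pearson coefficient measures how linearly the predictions track the subjective scores, while the Spearman coefficient depends only on whether the predicted ordering of images matches the subjective ordering, so the two can disagree when predictions are monotonic but nonlinear.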