Image colorization is a challenging problem due to multi-modal uncertainty and severe ill-posedness. Directly training a deep neural network usually leads to incorrect semantic colors and low color richness. While transformer-based methods can deliver better results, they often rely on manually designed priors, generalize poorly, and introduce color bleeding effects. To address these issues, we propose DDColor, an end-to-end method with dual decoders for image colorization. Our approach includes a pixel decoder and a query-based color decoder. The former restores the spatial resolution of the image, while the latter utilizes rich visual features to refine color queries, thus avoiding hand-crafted priors. The two decoders work together to establish correlations between color and multi-scale semantic representations via cross-attention, significantly alleviating the color bleeding effect. Additionally, a simple yet effective colorfulness loss is introduced to enhance color richness. Extensive experiments demonstrate that DDColor outperforms existing state-of-the-art methods both quantitatively and qualitatively. The code and models are publicly available at https://github.com/piddnad/DDColor.
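The abstract does not spell out the colorfulness loss, so the following is only a minimal sketch of how such a term could look. It assumes the loss is built on the Hasler-Süsstrunk colorfulness metric (standard deviation plus 0.3 times the mean of the opponent-color channels) and rewards higher colorfulness by subtracting a scaled score from 1. The function names, the pure-Python pixel-list input, and the `scale=100.0` normalizer are illustrative assumptions, not taken from the released code.

```python
import math
from statistics import fmean, pstdev


def colorfulness(pixels):
    """Hasler-Süsstrunk colorfulness of a sequence of (R, G, B) pixels in [0, 255]."""
    # Opponent-color channels: red-green and yellow-blue.
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    # Combine the per-channel spreads and means into single magnitudes.
    sigma = math.hypot(pstdev(rg), pstdev(yb))
    mu = math.hypot(fmean(rg), fmean(yb))
    return sigma + 0.3 * mu


def colorfulness_loss(pixels, scale=100.0):
    """Hypothetical loss: lower when the predicted image is more colorful.

    `scale` is an assumed normalizer, not a value stated in the abstract.
    """
    return 1.0 - colorfulness(pixels) / scale


# A flat gray patch has zero colorfulness; a saturated patch scores lower loss.
gray = [(128, 128, 128)] * 4
vivid = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
print(colorfulness_loss(gray), colorfulness_loss(vivid))
```

In a training setup, the same computation would be expressed with differentiable tensor operations on the predicted RGB image and added to the other loss terms with a small weight.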