Image colorization is a challenging problem due to its multi-modal uncertainty and highly ill-posed nature. Directly training a deep neural network usually leads to semantically incorrect colors and low color richness. While transformer-based methods can deliver better results, they often rely on manually designed priors, generalize poorly, and introduce color bleeding. To address these issues, we propose DDColor, an end-to-end method with dual decoders for image colorization. Our approach includes a pixel decoder and a query-based color decoder. The former restores the spatial resolution of the image, while the latter uses rich visual features to refine color queries, thus avoiding hand-crafted priors. The two decoders work together to establish correlations between color and multi-scale semantic representations via cross-attention, significantly alleviating color bleeding. Additionally, we introduce a simple yet effective colorfulness loss to enhance color richness. Extensive experiments demonstrate that DDColor outperforms existing state-of-the-art methods both quantitatively and qualitatively. The code and models are publicly available at https://github.com/piddnad/DDColor.
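The abstract names a colorfulness loss but does not give its formula. As a minimal sketch, assume the loss is built on the Hasler-Süsstrunk colorfulness score, which combines the spread and the mean of two opponent-color channels, and that the loss is one minus that score divided by a constant. The function names, the `scale` parameter, and the plain-Python pixel lists are all illustrative assumptions; a training implementation would compute the same quantities on differentiable tensors.

```python
import math
from statistics import fmean, pstdev


def colorfulness(pixels):
    """Hasler-Süsstrunk colorfulness score for a list of (R, G, B) pixels in [0, 255]."""
    # Opponent-color channels: red-green and yellow-blue.
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    # Combined standard deviation and combined mean of the two channels.
    sigma = math.hypot(pstdev(rg), pstdev(yb))
    mu = math.hypot(fmean(rg), fmean(yb))
    return sigma + 0.3 * mu


def colorfulness_loss(pixels, scale=100.0):
    """Assumed loss form: lower when the image is more colorful.

    `scale` is a hypothetical normalization constant, not taken from the abstract.
    """
    return 1.0 - colorfulness(pixels) / scale


# A flat gray image has no color variation, so its score is zero.
gray = [(128, 128, 128)] * 4
vivid = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
print(colorfulness(gray))        # 0.0
print(colorfulness_loss(gray))   # 1.0
print(colorfulness_loss(vivid) < colorfulness_loss(gray))  # True
```

Under this assumed form the loss is not bounded below: a very saturated image can push it under zero, so in practice such a term would typically be weighted against the reconstruction losses rather than minimized alone.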