Exemplar-based video colorization is an essential technique for applications such as old movie restoration. Although recent methods perform well in still scenes or scenes with regular movement, they often lack robustness in moving scenes because they model long-term dependency poorly, both spatially and temporally, which leads to color fading, color discontinuity, or other artifacts. To solve this problem, we propose an exemplar-based video colorization framework with long-term spatiotemporal dependency. To enhance long-term spatial dependency, we design a parallelized CNN-Transformer block and a double-head non-local operation. The CNN-Transformer block better combines long-term spatial dependency with local texture and structural features, and the double-head non-local operation makes further use of the augmented features. To enhance long-term temporal dependency, we introduce a novel linkage subnet, which propagates motion information across adjacent frame blocks and helps to maintain temporal continuity. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively. It also generates more colorful, realistic, and stable results, especially in scenes where objects change greatly and irregularly.
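To make the idea of a non-local operation more concrete, the following is a minimal sketch of a generic, single-head non-local color warp of the kind commonly used in exemplar-based colorization. It is not the double-head design proposed here, whose details the abstract does not give. The function names, the cosine-similarity matching, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Return v scaled to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def non_local_warp(target_feats, ref_feats, ref_colors, temperature=0.01):
    """Warp exemplar colors onto target positions with softmax attention.

    target_feats: feature vectors, one per position of the grayscale frame
    ref_feats:    feature vectors, one per position of the exemplar
    ref_colors:   color vectors (e.g. chroma), aligned with ref_feats
    temperature:  hypothetical sharpness; smaller values give harder matching
    """
    ref_n = [normalize(f) for f in ref_feats]
    channels = len(ref_colors[0])
    warped = []
    for t in target_feats:
        t_n = normalize(t)
        # Cosine similarity between this target position and every exemplar position.
        sims = [sum(a * b for a, b in zip(t_n, r)) for r in ref_n]
        # Numerically stable softmax over all exemplar positions (the non-local step).
        m = max(sims)
        weights = [math.exp((s - m) / temperature) for s in sims]
        total = sum(weights)
        # Each output color is the attention-weighted average of exemplar colors.
        warped.append([
            sum(w * c[k] for w, c in zip(weights, ref_colors)) / total
            for k in range(channels)
        ])
    return warped
```

Because every target position attends to every exemplar position, a match can be found regardless of spatial distance, which is the sense in which such an operation captures long-range spatial dependency.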