Obtaining pairs of low-light and normal-light videos that contain motion is more challenging than collecting still-image pairs, which raises practical difficulties and makes unpaired learning a critical technical route. This paper makes an endeavor toward learning low-light video enhancement without paired ground truth. Compared to low-light image enhancement, enhancing low-light videos is more difficult due to the intertwined effects of noise, exposure, and contrast in the spatial domain, together with the need for temporal coherence. To address these challenges, we propose the Unrolled Decomposed Unpaired Network (UDU-Net), which enhances low-light videos by unrolling the optimization objective into a deep network that decomposes the signal into spatial and temporal factors updated iteratively. Specifically, we first formulate low-light video enhancement as a Maximum A Posteriori (MAP) estimation problem with carefully designed spatial and temporal visual regularization. By unrolling this problem, the optimization of the spatial and temporal constraints is decomposed into separate steps and updated in a stage-wise manner. From the spatial perspective, the designed Intra subnet leverages unpaired prior information from the retouching skills of expert photographers to adjust the statistical distribution of the output. In addition, we introduce a novel mechanism that incorporates human-perception feedback to guide network optimization and suppress over- and under-exposure. From the temporal perspective, the designed Inter subnet fully exploits temporal cues during progressive optimization, yielding improved temporal consistency in the enhanced results. Consequently, the proposed method outperforms state-of-the-art methods in video illumination, noise suppression, and temporal consistency across both outdoor and indoor scenes.
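As a minimal sketch of the stage-wise unrolling described above (the notation here is illustrative and not taken from the paper): let $X$ denote the observed low-light video, $Y$ the enhanced estimate, $\mathcal{D}$ a data-fidelity term, $\Phi_s$ and $\Phi_t$ the spatial and temporal regularizers, and $\lambda_s$, $\lambda_t$ their weights. The MAP formulation can then be written as

\begin{equation}
  \hat{Y} \;=\; \arg\min_{Y} \; \mathcal{D}(Y, X) \;+\; \lambda_s \, \Phi_s(Y) \;+\; \lambda_t \, \Phi_t(Y),
\end{equation}

and unrolling it into $K$ stages alternates a spatial (Intra) update with a temporal (Inter) update, each parameterized by learnable weights $\theta_k$ and $\phi_k$:

\begin{align}
  Y^{(k+\frac{1}{2})} &= \mathrm{Intra}_{\theta_k}\!\bigl(Y^{(k)}, X\bigr), \\
  Y^{(k+1)} &= \mathrm{Inter}_{\phi_k}\!\bigl(Y^{(k+\frac{1}{2})}\bigr), \qquad k = 0, \dots, K-1 .
\end{align}

This alternation reflects the decomposition into spatial and temporal factors: each stage first adjusts the per-frame statistics and exposure, then enforces coherence across neighboring frames.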