Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severe invisibility and noise. In this paper, we present an innovative video decomposition strategy that incorporates view-independent and view-dependent components to improve LLVE performance; we term the resulting framework View-aware Low-light Video Enhancement (VLLVE). We leverage dynamic cross-frame correspondences for the view-independent term (which primarily captures intrinsic appearance) and impose a scene-level continuity constraint on the view-dependent term (which mainly describes the shading condition) to achieve consistent and satisfactory decomposition results. To further enforce consistent decomposition, we introduce a dual-structure enhancement network featuring a cross-frame interaction mechanism: by supervising different frames simultaneously, the network encourages them to exhibit matching decomposition features. This mechanism integrates seamlessly with encoder-decoder single-frame networks at minimal additional parameter cost. Building upon VLLVE, we propose a more comprehensive decomposition strategy that introduces an additive residual term, yielding VLLVE++. This residual term models scene-adaptive degradations that are difficult to capture with a decomposition formulation designed for common scenes, further strengthening the framework's ability to represent overall video content. In addition, VLLVE++ enables bidirectional, end-to-end learning for both enhancement and degradation-aware correspondence refinement, effectively increasing reliable correspondences while filtering out incorrect ones. Notably, VLLVE++ demonstrates strong capability in handling challenging cases, such as real-world scenes and highly dynamic videos. Extensive experiments are conducted on widely recognized LLVE benchmarks.
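The two-stage decomposition described above can be sketched as follows. This is an illustrative formulation only; the symbols $I_t$, $L_t$, $S_t$, and $R_t$ are assumed placeholder notation (per-frame input, view-independent term, view-dependent term, and residual term), not the paper's own.

```latex
% Hedged sketch of the decomposition (notation assumed, not from the paper).
% VLLVE factors each low-light frame I_t into a view-independent term L_t
% (intrinsic appearance) and a view-dependent term S_t (shading condition);
% VLLVE++ adds a residual R_t that absorbs scene-adaptive degradations
% the multiplicative model alone cannot express.
\begin{align}
  I_t &= L_t \odot S_t        && \text{(VLLVE decomposition)} \\
  I_t &= L_t \odot S_t + R_t  && \text{(VLLVE++ with additive residual)}
\end{align}
```

Here $\odot$ denotes element-wise multiplication; the cross-frame correspondences constrain $L_t$ across frames, while the scene-level continuity constraint acts on $S_t$.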