With the continuous improvement of device imaging resolution, Ultra-High-Definition (UHD) images are becoming increasingly common. Unfortunately, existing methods for fusing multi-exposure images in dynamic scenes are designed for low-resolution inputs, making them inefficient at generating high-quality UHD images on resource-constrained devices. To alleviate the limitations of extremely long input sequences, and inspired by how Large Language Models (LLMs) process infinitely long texts, we propose a novel learning paradigm, named Infinite Pixel Learning (IPL), that achieves UHD multi-exposure dynamic scene image fusion on a single consumer-grade GPU. Our approach consists of three key components: first, we slice the input sequence to relieve the pressure the data stream places on the model; second, we develop an attention cache technique, analogous to the KV cache in LLMs, for processing infinite data streams; finally, we design an attention cache compression method to alleviate the cache's storage burden on the device. In addition, we provide a new UHD benchmark to evaluate the effectiveness of our method. Extensive experimental results show that our method maintains high-quality visual performance while fusing UHD dynamic multi-exposure images in real time (>40 fps) on a single consumer-grade GPU.
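To make the three components concrete, the following is a minimal NumPy sketch of the general pattern the abstract describes: slicing a long pixel stream, accumulating each slice's keys/values in a KV-cache-style attention cache, and compressing the cache once it exceeds a fixed budget. All class and parameter names here are hypothetical, and the compression rule (mean-pooling the oldest entries) is a stand-in assumption, not the paper's actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class StreamingAttentionCache:
    """Hypothetical sketch of the IPL-style pipeline: append each input
    slice's keys/values, then compress the oldest cache entries so that
    memory stays bounded no matter how long the stream is."""

    def __init__(self, max_entries=8, merge_size=2):
        self.max_entries = max_entries  # cache budget (number of slices kept)
        self.merge_size = merge_size    # how many old slices to pool into one
        self.keys = []                  # each entry: (slice_len, dim)
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        # Compression step (assumed here: mean-pool the oldest entries).
        while len(self.keys) > self.max_entries:
            merged_k = np.mean(np.stack(self.keys[: self.merge_size]), axis=0)
            merged_v = np.mean(np.stack(self.values[: self.merge_size]), axis=0)
            self.keys = [merged_k] + self.keys[self.merge_size :]
            self.values = [merged_v] + self.values[self.merge_size :]

    def attend(self, q):
        # Standard scaled dot-product attention over the whole cache.
        K = np.concatenate(self.keys, axis=0)
        V = np.concatenate(self.values, axis=0)
        w = softmax(q @ K.T / np.sqrt(q.shape[-1]))
        return w @ V

# Usage: feed 16 equal-length slices of a simulated pixel stream.
rng = np.random.default_rng(0)
cache = StreamingAttentionCache(max_entries=4, merge_size=2)
for _ in range(16):
    feats = rng.standard_normal((32, 16))  # 32 tokens per slice, dim 16
    cache.append(feats, feats)
out = cache.attend(rng.standard_normal((32, 16)))
```

The point of the sketch is only the control flow: the cache size is capped at `max_entries` regardless of stream length, which is what lets an infinite stream be processed with constant memory.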