High dynamic range (HDR) imaging aims to retrieve information from multiple low dynamic range (LDR) inputs to generate a realistic output. The key is to leverage contextual information, including both dynamic and static semantics, for better image generation. Existing methods often focus on the spatial misalignment across input frames caused by foreground and/or camera motion. However, no existing work leverages the dynamic and static context jointly. To address this problem, we propose a novel alignment-free network, the Semantics Consistent Transformer (SCTNet), which incorporates both spatial and channel attention modules. The spatial attention models intra-image correlation to handle dynamic motion, while the channel attention enables inter-image interaction to enhance semantic consistency across frames. In addition, we introduce a new realistic HDR dataset with more variation in foreground objects and environmental factors, as well as larger motions. Extensive comparisons on both conventional datasets and ours validate the effectiveness of our method, which achieves the best trade-off between performance and computational cost.
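The abstract does not give the exact attention formulation, so the following is only a minimal sketch of the spatial versus channel distinction it describes, not SCTNet itself. It assumes plain scaled dot-product self-attention with identity query/key/value projections (no learned weights, heads, or windows), and the function names `spatial_attention` and `channel_attention` are hypothetical. The only point illustrated is which axis serves as the tokens: pixels of one frame (intra-image) or channels stacked from all input frames (inter-image).

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(tokens: Matrix) -> Matrix:
    """Scaled dot-product self-attention with identity Q/K/V (assumption).
    tokens: one row per token, one column per feature."""
    dim = len(tokens[0])
    scores = matmul(tokens, transpose(tokens))
    scores = [[v / math.sqrt(dim) for v in row] for row in scores]
    return matmul(softmax_rows(scores), tokens)


def spatial_attention(frame: Matrix) -> Matrix:
    """frame: C x N (channels x pixels) of ONE exposure.
    Tokens are pixels, so attention only relates positions inside
    this single image (intra-image correlation)."""
    return transpose(self_attention(transpose(frame)))


def channel_attention(frames: List[Matrix]) -> Matrix:
    """frames: K feature maps of shape C x N, one per exposure.
    Channels of all frames are stacked into (K*C) tokens, so every
    channel can attend to channels of the other exposures (inter-image)."""
    stacked = [row for frame in frames for row in frame]
    return self_attention(stacked)
```

With two 2-channel, 3-pixel feature maps, `spatial_attention(f1)` returns a 2 x 3 map computed from `f1` alone, whereas the first rows of `channel_attention([f1, f2])` change when `f2` changes, which is the cross-frame mixing the channel branch is meant to provide.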