High dynamic range (HDR) imaging aims to recover information from multiple low dynamic range (LDR) inputs to generate a realistic output. The key is to exploit contextual information, including both dynamic and static semantics, to produce better images. Existing methods often focus on the spatial misalignment across input frames caused by foreground and/or camera motion. However, no prior work jointly exploits dynamic and static context simultaneously. To address this problem, we propose a novel alignment-free network, the Semantics Consistent Transformer (SCTNet), which contains both spatial and channel attention modules. Spatial attention models intra-image correlations to handle dynamic motion, while channel attention enables inter-image interaction to enhance semantic consistency across frames. In addition, we introduce a new realistic HDR dataset with greater variation in foreground objects and environmental factors, as well as larger motions. Extensive comparisons on both conventional datasets and ours validate the effectiveness of our method, which achieves the best trade-off between performance and computational cost.
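The abstract contrasts spatial attention, which relates pixels within one image, with channel attention, which mixes information across images. The NumPy sketch below illustrates only that distinction and is not SCTNet's implementation. It rests on several assumptions: single-head attention, random matrices standing in for learned projections, no windowing, normalization layers, or feed-forward blocks, and channel attention realized as a "transposed" attention over the concatenated channels of all frames, with queries and keys L2-normalized along the spatial axis (a learnable temperature is omitted). The function names `spatial_attention` and `channel_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat, wq, wk, wv):
    """Hypothetical self-attention over pixel positions of ONE frame.

    feat: (H, W, C) features of a single frame.
    The (H*W, H*W) attention map lets every pixel attend to every
    other pixel of the same image (intra-image correlation).
    """
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)              # tokens = pixels
    q, k, v = x @ wq, x @ wk, x @ wv        # (HW, C) each
    attn = softmax(q @ k.T / np.sqrt(c))    # (HW, HW), rows sum to 1
    return (attn @ v).reshape(h, w, c)

def channel_attention(frames, wq, wk, wv, eps=1e-6):
    """Hypothetical transposed attention over channels of ALL frames.

    frames: (N, H, W, C) features of N (unaligned) LDR frames.
    Channels of all frames are concatenated, so the (N*C, N*C)
    attention map lets each channel of one frame attend to the
    channels of every other frame (inter-image interaction).
    """
    n, h, w, c = frames.shape
    x = frames.transpose(1, 2, 0, 3).reshape(h * w, n * c)  # (HW, N*C)
    q, k, v = x @ wq, x @ wk, x @ wv                         # (HW, N*C)
    # L2-normalize each channel over space (assumed, not from the paper).
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + eps)
    attn = softmax(q.T @ k)                                  # (N*C, N*C)
    out = v @ attn.T                                         # (HW, N*C)
    return out.reshape(h, w, n, c).transpose(2, 0, 1, 3)     # (N, H, W, C)

# Usage with hypothetical sizes: three LDR exposures, 8x8 pixels, 4 channels.
rng = np.random.default_rng(0)
N, H, W, C = 3, 8, 8, 4
frames = rng.standard_normal((N, H, W, C))

# Random matrices stand in for learned projection weights.
ws = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
wc = [rng.standard_normal((N * C, N * C)) * 0.1 for _ in range(3)]

spatial_out = np.stack([spatial_attention(f, *ws) for f in frames])
channel_out = channel_attention(frames, *wc)
print(spatial_out.shape, channel_out.shape)  # (3, 8, 8, 4) (3, 8, 8, 4)
```

The design point the sketch makes concrete: changing frame 1 cannot affect frame 0's spatial-attention output, because that map only spans pixels of frame 0, whereas frame 0's channel-attention output does change, because its channels attend to frame 1's channels.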