Fusing a sequence of perfectly aligned images captured at different exposures has shown great potential for approaching High Dynamic Range (HDR) imaging with sensors of limited dynamic range. However, when scene objects or the camera undergo large motion, misalignment is almost inevitable and leads to the notorious ``ghost'' artifacts. In addition, factors such as noise in dark regions or color saturation in over-bright regions may prevent local image details from being recovered in the HDR image. This paper proposes a novel multi-exposure fusion model based on the Swin Transformer. In particular, we design feature selection gates, integrated with the feature extraction layers, that detect outliers and block them from HDR image synthesis. To reconstruct missing local details from well-aligned and properly exposed regions, we exploit long-range contextual dependencies in the exposure-space pyramid through the self-attention mechanism. Extensive numerical and visual evaluations have been conducted on a variety of benchmark datasets. The experiments show that our model achieves accuracy on par with current top-performing multi-exposure HDR imaging models while being more efficient.