Underwater images often exhibit poor quality, distorted color balance, and low contrast due to the complex interaction of light, water, and objects. Despite the significant contributions of previous underwater enhancement techniques, several problems remain: (i) Current deep learning methods rely on Convolutional Neural Networks (CNNs), which lack multi-scale enhancement and have a limited global receptive field. (ii) Paired real-world underwater datasets are scarce, and training on synthetic image pairs risks overfitting. To address these issues, this paper presents UWFormer, a multi-scale Transformer-based network that enhances images at multiple frequencies via semi-supervised learning. Within it, we propose a Nonlinear Frequency-aware Attention mechanism and a Multi-Scale Fusion Feed-forward Network for low-frequency enhancement. Additionally, we introduce a semi-supervised training strategy tailored to underwater images, with a Subaqueous Perceptual Loss function used to generate reliable pseudo labels. Experiments on full-reference and no-reference underwater benchmarks demonstrate that our method outperforms state-of-the-art methods in both quantitative metrics and visual quality.
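To make the phrase "enhancing images at multiple frequencies" concrete, the following minimal sketch splits a grayscale image into a low-frequency component (a blurred copy) and a high-frequency residual, then recombines them. It is an illustration only and not UWFormer's actual mechanism: the box blur, the function names (`box_blur`, `split_frequencies`, `merge_frequencies`), and the `radius` parameter are all assumptions introduced here.

```python
# Illustrative frequency split; NOT the attention mechanism proposed in the paper.
# All names and the choice of a box blur are hypothetical.

def box_blur(img, radius=1):
    """Mean filter with edge clamping; img is a list of rows of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels reuse the nearest valid pixel.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def split_frequencies(img, radius=1):
    """Return (low, high): low is the blurred image, high is the residual detail."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, lrow)] for row, lrow in zip(img, low)]
    return low, high


def merge_frequencies(low, high):
    """Recombine the two bands; with unmodified bands this restores the input."""
    return [[l + h for l, h in zip(lrow, hrow)] for lrow, hrow in zip(low, high)]


if __name__ == "__main__":
    image = [[0.0, 0.0, 0.0],
             [0.0, 9.0, 0.0],
             [0.0, 0.0, 0.0]]
    low, high = split_frequencies(image)
    restored = merge_frequencies(low, high)
    print(low[1][1], high[1][1], restored[1][1])
```

In a decomposition like this, the low-frequency band carries the smooth color and illumination content, while the high-frequency band carries edges and texture, so an enhancement method can in principle treat the two differently before merging them.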