We present the Multi-Scale Spatial Channel Attention Network (MS-SCANet), a transformer-based architecture for no-reference image quality assessment (IQA). MS-SCANet features a dual-branch structure that processes images at multiple scales, capturing both fine and coarse details more effectively than traditional single-scale methods. By integrating tailored spatial and channel attention mechanisms, the model emphasizes essential features while keeping computational complexity low. A key component of MS-SCANet is its cross-branch attention mechanism, which improves the integration of features across scales and addresses limitations of previous approaches. We also introduce two new consistency loss functions, the Cross-Branch Consistency Loss and the Adaptive Pooling Consistency Loss, which preserve spatial integrity during feature scaling and outperform conventional linear and bilinear techniques. Extensive evaluations on the KonIQ-10k, LIVE, LIVE Challenge, and CSIQ datasets show that MS-SCANet consistently surpasses state-of-the-art methods, offering a robust framework that achieves stronger correlations with subjective human scores.