We present the Multi-Scale Spatial Channel Attention Network (MS-SCANet), a transformer-based architecture for no-reference image quality assessment (IQA). MS-SCANet features a dual-branch structure that processes images at multiple scales, capturing both fine and coarse details more effectively than traditional single-scale methods. By integrating tailored spatial and channel attention mechanisms, the model emphasizes essential features while keeping computational complexity low. A key component of MS-SCANet is its cross-branch attention mechanism, which improves the integration of features across scales and addresses limitations of previous approaches. We also introduce two new consistency loss functions, the Cross-Branch Consistency Loss and the Adaptive Pooling Consistency Loss, which preserve spatial integrity during feature scaling and outperform conventional linear and bilinear techniques. Extensive evaluations on the KonIQ-10k, LIVE, LIVE Challenge, and CSIQ datasets show that MS-SCANet consistently surpasses state-of-the-art methods, offering a robust framework that achieves stronger correlations with subjective human scores.