Stereo image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing SIQA methods based on visual properties have achieved promising performance. However, these approaches ignore the top-down philosophy and therefore lack a comprehensive grasp of both the human visual system (HVS) and SIQA. This paper presents a novel Stereo AttenTion Network (SATNet), which employs a top-down perspective to guide the quality assessment process. Specifically, our generalized Stereo AttenTion (SAT) structure adapts its components and input/output to stereo scenarios. It uses the fusion-generated attention map as a higher-level binocular modulator that influences the two lower-level monocular features, allowing both to be progressively recalibrated throughout the pipeline. Additionally, we introduce an Energy Coefficient (EC) to flexibly tune the magnitude of the binocular response, accounting for the fact that binocular responses in the primate primary visual cortex are smaller than the sum of the monocular responses. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we use a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in advancing the state of the art in SIQA. The code is available at https://github.com/Fanning-Zhang/SATNet.
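The top-down modulation, Energy Coefficient, and dual-pooling ideas described above can be illustrated with a minimal NumPy sketch. This is not the SATNet implementation from the linked repository. The function names, the channel-attention form (global average pooling followed by a sigmoid), the default `energy_coeff` value, the use of global rather than local pooling, and the pairing of min-pooling with the summation branch and max-pooling with the subtraction branch are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def stereo_attention(left, right, energy_coeff=0.8):
    """Recalibrate two monocular feature maps with one binocular attention map.

    left, right: arrays of shape (C, H, W), the lower-level monocular features.
    energy_coeff: hypothetical scalar in (0, 1) that scales the fused response
        so that it stays below the plain sum of the monocular responses.
    """
    # Binocular fusion (assumed form): a scaled sum of the two views.
    fused = energy_coeff * (left + right)
    # Higher-level modulator (assumed form): one attention weight per channel.
    attn = sigmoid(fused.mean(axis=(1, 2), keepdims=True))  # shape (C, 1, 1)
    # Top-down step: the same binocular map modulates both monocular features.
    return left * attn, right * attn, attn


def dual_pooling(left, right):
    """Pool the summation and subtraction branches into one descriptor.

    Assumed pairing: min-pooling on the sum, max-pooling on the difference,
    each taken globally over the spatial dimensions.
    """
    summation = left + right
    subtraction = left - right
    sum_desc = summation.min(axis=(1, 2))     # shape (C,)
    diff_desc = subtraction.max(axis=(1, 2))  # shape (C,)
    return np.concatenate([sum_desc, diff_desc])  # shape (2C,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal((4, 8, 8))
    right = rng.standard_normal((4, 8, 8))
    left_mod, right_mod, attn = stereo_attention(left, right)
    descriptor = dual_pooling(left_mod, right_mod)
    print(attn.shape, descriptor.shape)
```

In a full network, a block like `stereo_attention` would presumably be applied at several stages so that both monocular streams are recalibrated progressively, with the pooled descriptor fed to a quality regressor.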