Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention

Stereoscopic image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing SIQA methods based on binocular properties and attention have achieved promising performance. However, these bottom-up approaches do not adequately exploit the inherent characteristics of the human visual system (HVS). This paper presents a novel network for SIQA via stereo attention, which adopts a top-down perspective to guide the quality assessment process. The proposed method lets high-level binocular signals guide low-level monocular signals, so that binocular and monocular information is calibrated progressively throughout the processing pipeline. We design a generalized Stereo AttenTion (SAT) block to implement the top-down philosophy in stereo perception. This block uses the fusion-generated attention map as a high-level binocular modulator that influences the representation of the two low-level monocular features. Additionally, we introduce an Energy Coefficient (EC) to account for recent findings indicating that binocular responses in the primate primary visual cortex are less than the sum of monocular responses. The adaptive EC flexibly tunes the magnitude of the binocular response, thus enhancing the formation of robust binocular features within our framework. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we use a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in simulating the properties of visual perception and in advancing the state of the art in SIQA. The code for this work is available at https://github.com/Fanning-Zhang/SATNet.
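The top-down flow described above can be illustrated with a toy sketch in plain Python. This is not the SATNet implementation: the function names, the use of flat lists in place of feature maps, the sigmoid attention, the fixed (rather than learned) energy coefficient restricted to (0, 1), and the pairing of min-pooling with the summation branch and max-pooling with the subtraction branch are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a stand-in attention activation."""
    return 1.0 / (1.0 + math.exp(-x))


def stereo_attention(left, right, ec):
    """Toy top-down modulation of two monocular feature vectors.

    left, right: monocular features as equal-length lists of floats.
    ec: energy coefficient. Assumption: a fixed scalar in (0, 1), so that
        for positive features the fused response stays below left + right.
    """
    if not 0.0 < ec < 1.0:
        raise ValueError("ec is assumed to lie in (0, 1) in this sketch")
    # High-level binocular signal: scaled fusion of the two views.
    fused = [ec * (l + r) for l, r in zip(left, right)]
    # Attention map derived from the fused (binocular) signal.
    attention = [sigmoid(f) for f in fused]
    # Top-down step: the binocular attention modulates both monocular inputs.
    left_mod = [l * a for l, a in zip(left, attention)]
    right_mod = [r * a for r, a in zip(right, attention)]
    return fused, left_mod, right_mod


def dual_pooling(left, right):
    """Min-pool the summation branch and max-pool the subtraction branch.

    Assumption: this pairing follows the order in which the abstract lists
    the operations; the actual assignment may differ.
    """
    summation = [l + r for l, r in zip(left, right)]
    difference = [l - r for l, r in zip(left, right)]
    return min(summation), max(difference)


if __name__ == "__main__":
    left = [1.0, 2.0, 0.5]
    right = [0.5, 1.0, 1.5]
    fused, left_mod, right_mod = stereo_attention(left, right, ec=0.8)
    print(fused, dual_pooling(left_mod, right_mod))
```

In a real network the lists would be convolutional feature maps and the energy coefficient would be adapted during training; the sketch only shows the direction of information flow, from the fused binocular signal down to the two monocular branches.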
