Stereoscopic image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing SIQA methods based on binocular properties and attention have achieved promising performance. However, these bottom-up approaches do not fully exploit the inherent characteristics of the human visual system (HVS). This paper presents a novel stereo-attention network for SIQA that takes a top-down perspective to guide the quality assessment process. The proposed method passes guidance from high-level binocular signals down to low-level monocular signals, while binocular and monocular information is calibrated progressively throughout the processing pipeline. We design a generalized Stereo AttenTion (SAT) block to implement this top-down philosophy in stereo perception. The block uses the fusion-generated attention map as a high-level binocular modulator that influences the representation of the two low-level monocular features. Additionally, we introduce an Energy Coefficient (EC) to account for recent findings that binocular responses in the primate primary visual cortex are smaller than the sum of the monocular responses. The adaptive EC flexibly tunes the magnitude of the binocular response, which helps form robust binocular features within our framework. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we use a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results show that our top-down method better simulates the properties of visual perception and advances the state of the art in SIQA. The code for this work is available at https://github.com/Fanning-Zhang/SATNet.
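The top-down flow described in the abstract (fuse the two views, scale the fused response by the EC, turn it into an attention map, modulate both monocular features, then pool their summation and subtraction) can be illustrated with a minimal, framework-free sketch. This is not the SATNet implementation: the function names `stereo_attention` and `dual_pooling`, the sigmoid gating, the fixed scalar `ec` restricted to (0, 1) (the paper's EC is adaptive), and the pairing of min-pooling with the summation branch and max-pooling with the subtraction branch are all assumptions made for illustration. Features are flat lists of floats rather than convolutional feature maps.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function, used here as a hypothetical attention gate."""
    return 1.0 / (1.0 + math.exp(-x))


def stereo_attention(
    left: Vector, right: Vector, ec: float
) -> Tuple[Vector, Vector, Vector]:
    """Toy top-down modulation of two monocular feature vectors.

    Assumption: `ec` is a fixed scalar in (0, 1); the paper describes an
    adaptive Energy Coefficient instead.
    """
    if len(left) != len(right):
        raise ValueError("left and right features must have the same length")
    if not 0.0 < ec < 1.0:
        raise ValueError("ec is assumed to lie strictly between 0 and 1")

    # Binocular fusion scaled by the EC, so that for positive responses the
    # binocular response stays below the sum of the monocular responses.
    binocular = [ec * (l + r) for l, r in zip(left, right)]

    # The fused signal acts as a high-level modulator (attention map).
    attention = [sigmoid(b) for b in binocular]

    # The same attention map modulates both low-level monocular features.
    left_mod = [l * a for l, a in zip(left, attention)]
    right_mod = [r * a for r, a in zip(right, attention)]
    return left_mod, right_mod, binocular


def dual_pooling(left: Vector, right: Vector) -> Tuple[float, float]:
    """Pool the summation and subtraction branches.

    Assumption: min-pooling on the summation, max-pooling on the subtraction.
    """
    summation = [l + r for l, r in zip(left, right)]
    subtraction = [l - r for l, r in zip(left, right)]
    return min(summation), max(subtraction)


# Example with made-up feature values.
left = [0.2, 1.0, 0.5]
right = [0.1, 0.8, 0.9]
left_mod, right_mod, binocular = stereo_attention(left, right, ec=0.7)
pooled = dual_pooling(left_mod, right_mod)
```

In a real network the attention map would typically come from learned layers and the pooling would run over spatial dimensions; the sketch only shows the order of operations and the role of each quantity.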