Multi-Scale Estimation for Omni-Directional Saliency Maps Using Learnable Equator Bias

From arXiv. Accepted for publication in IEICE Transactions on Information and Systems, Vol. E106-D, No. 10, 2023 (https://www.jstage.jst.go.jp/browse/transinf). The code is available at https://github.com/islab-sophia/odisal

Omni-directional images have been used in a wide range of applications. For these applications, it is useful to estimate saliency maps, which represent the probability distribution of gaze points when an image is viewed with a head-mounted display, in order to detect important regions in omni-directional images. This paper proposes a novel saliency-map estimation model for omni-directional images that extracts overlapping 2-dimensional (2D) plane images from an omni-directional image at various directions and angles of view. While 2D saliency maps tend to have high probability at the center of the image (center bias), the high-probability region in omni-directional saliency maps appears along the horizontal directions when a head-mounted display is used (equator bias). Therefore, a 2D saliency model with a center-bias layer was fine-tuned on an omni-directional dataset after replacing the center-bias layer with an equator-bias layer conditioned on the elevation angle at which the 2D plane image is extracted. The limited availability of omni-directional images in saliency datasets can be compensated for by using a well-established 2D saliency model pretrained on a large number of training images with ground-truth 2D saliency maps. In addition, this paper proposes a multi-scale estimation method that extracts 2D images at multiple angles of view to detect objects of various sizes with variable receptive fields. The saliency maps estimated from the multiple angles of view were integrated using pixel-wise attention weights, calculated in an integration layer, that weight the optimal scale for each object. The proposed method was evaluated on a publicly available dataset using evaluation metrics for omni-directional saliency maps. The results confirmed that the proposed method improved the accuracy of the saliency maps.
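The pixel-wise integration of saliency maps estimated at multiple angles of view can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes the integration layer outputs one unnormalized score per scale and per pixel, and that the attention weights are obtained with a softmax over the scale axis. The function name `fuse_multiscale` and the argument `attention_logits` are hypothetical. The final normalization also ignores the latitude-dependent pixel area of an equirectangular image.

```python
import numpy as np


def fuse_multiscale(saliency_maps, attention_logits):
    """Fuse per-scale saliency maps with pixel-wise attention weights.

    saliency_maps:    shape (S, H, W), one map per angle of view (scale).
    attention_logits: shape (S, H, W), unnormalized per-pixel scores;
                      a hypothetical stand-in for an integration layer output.
    Returns the fused map of shape (H, W) and the weights of shape (S, H, W).
    """
    saliency_maps = np.asarray(saliency_maps, dtype=np.float64)
    attention_logits = np.asarray(attention_logits, dtype=np.float64)
    if saliency_maps.shape != attention_logits.shape:
        raise ValueError("saliency_maps and attention_logits must share a shape")

    # Assumption: softmax over the scale axis, so the weights at each pixel
    # are positive and sum to 1. Subtracting the max keeps exp() stable.
    shifted = attention_logits - attention_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted sum over scales: each pixel favors its highest-scoring scale.
    fused = (weights * saliency_maps).sum(axis=0)

    # Rescale so the fused map sums to 1 over all pixels.
    total = fused.sum()
    if total > 0:
        fused = fused / total
    return fused, weights
```

With all logits equal, the weights are uniform and the result reduces to the normalized average of the per-scale maps. Larger logits at a pixel shift the fused value toward the corresponding angle of view.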
