Thanks to their ability to provide an immersive and interactive experience, 360-degree image content has been adopted rapidly in consumer and industrial applications. Compared with planar 2D images, saliency prediction for 360-degree images is more challenging because of their high resolution and spherical viewing range. Currently, most high-performance saliency prediction models for omnidirectional images (ODIs) rely on deeper or wider convolutional neural networks (CNNs). These models benefit from the superior feature representation capability of CNNs but suffer from high computational costs. In this paper, we draw inspiration from the human visual cognitive process, in which the perception of a visual scene is accomplished through multiple stages of analysis. On this basis we propose a novel multi-stage recurrent generative adversarial network for ODIs, dubbed MRGAN360, which predicts saliency maps stage by stage. At each stage, the prediction model takes the original image and the output of the previous stage as input and produces a more accurate saliency map. We employ a recurrent neural network between adjacent prediction stages to model their correlations, and we place a discriminator at the end of each stage to supervise the output saliency map. In addition, we share weights across all stages, which yields a lightweight and computationally cheap architecture. Extensive experiments demonstrate that the proposed model outperforms the state-of-the-art model in terms of both prediction accuracy and model size.
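The stage-by-stage refinement described in the abstract can be illustrated with a toy sketch. This is a minimal NumPy illustration under stated assumptions, not the MRGAN360 implementation. Per-pixel linear layers stand in for the CNN prediction model, and a single tanh recurrence stands in for the recurrent network between stages. A logistic scorer stands in for the per-stage discriminator. All names (`predict_stages`, `W_in`, `W_h`, `W_out`, `w_disc`), the sizes, and the constant initial map are hypothetical. No training or adversarial loss is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: an H x W equirectangular image with C channels,
# and a HID-dimensional recurrent state per pixel.
H, W, C, HID = 8, 16, 3, 4

# Shared predictor weights: reused at every stage, so the parameter count
# does not grow with the number of stages.
W_in = rng.normal(scale=0.1, size=(C + 1, HID))  # (image channels + previous map) -> hidden
W_h = rng.normal(scale=0.1, size=(HID, HID))     # recurrent link between adjacent stages
W_out = rng.normal(scale=0.1, size=(HID, 1))     # hidden state -> saliency logit

# Toy discriminator weights (hypothetical); in a GAN it would only be used for training.
w_disc = rng.normal(scale=0.1, size=(C + 1,))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_stages(image, num_stages=3):
    """Refine a saliency map stage by stage with shared weights (toy sketch)."""
    # Assumed initial map: the source does not say how stage 1 is initialized.
    saliency = np.full((H, W, 1), 0.5)
    hidden = np.zeros((H, W, HID))  # recurrent state carried from stage to stage
    maps, disc_scores = [], []
    for _ in range(num_stages):
        # Each stage sees the original image plus the previous stage's output.
        x = np.concatenate([image, saliency], axis=-1)   # (H, W, C + 1)
        hidden = np.tanh(x @ W_in + hidden @ W_h)        # per-pixel RNN update
        saliency = sigmoid(hidden @ W_out)               # new map, values in (0, 1)
        maps.append(saliency)
        # Per-stage discriminator: mean "realness" score of the (image, map) pair.
        pair = np.concatenate([image, saliency], axis=-1)
        disc_scores.append(float(sigmoid(pair @ w_disc).mean()))
    return maps, disc_scores


image = rng.random((H, W, C))
maps, scores = predict_stages(image, num_stages=3)
print([round(float(m.mean()), 3) for m in maps], [round(s, 3) for s in scores])
```

Because the same `W_in`, `W_h`, and `W_out` are reused at every stage, the predictor has 36 parameters whether three or ten stages are run. This is the intuition behind the lightweight claim: only the per-stage computation grows with depth, not the model size.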