This paper proposes a novel approach to generating omni-directional images from a single snapshot picture. The previous method relied on a generative adversarial network (GAN) built on convolutional neural networks (CNNs). Although this method successfully generated omni-directional images, CNNs have two drawbacks for this task. First, because a convolutional layer processes only a local area, it is difficult to propagate the information of the input snapshot, which is embedded in the center of the omni-directional image, to the edges of the image. As a result, the omni-directional images created by the CNN-based generator tend to have less diversity at their edges, producing similar scene images. Second, the CNN-based model requires a large amount of GPU memory because it must be deep: a shallow convolutional network receives signals only from a limited receptive field. To solve these problems, this paper proposes an MLPMixer-based method. MLPMixer was proposed as an alternative to self-attention in the Transformer and, like self-attention, captures long-range dependencies and contextual information. This allows information to propagate efficiently across the image in the omni-directional image generation task. As a result, the proposed method achieves competitive performance with reduced memory consumption and computational cost, while also increasing the diversity of the generated omni-directional images.
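The contrast between local convolution and global token mixing can be illustrated with a small NumPy sketch. It is not the generator described in this paper: it assumes a generic MLP-Mixer block (token-mixing MLP followed by channel-mixing MLP, each with a residual connection), LayerNorm without learned scale or shift, random weights, and image patches flattened into a 1-D sequence so that a 3-tap convolution along that sequence stands in for a convolutional layer. The helper names (`mixer_block`, `conv3_along_patches`, `influenced_rows`) are hypothetical. Perturbing one "center" patch shows which output patches can change after a single layer.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each patch (row) over its channels; no learned affine params (assumption).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mixer_block(x, w1, w2, w3, w4):
    """One generic MLP-Mixer block; x has shape (num_patches, channels).

    w1: (hidden_tokens, P), w2: (P, hidden_tokens)
    w3: (C, hidden_channels), w4: (hidden_channels, C)
    """
    # Token mixing: an MLP across all patches, shared over channels.
    x = x + w2 @ gelu(w1 @ layer_norm(x))
    # Channel mixing: an MLP across channels, shared over patches.
    x = x + gelu(layer_norm(x) @ w3) @ w4
    return x

def conv3_along_patches(x, k):
    # 3-tap convolution along the patch axis with zero padding, same kernel per channel.
    padded = np.pad(x, ((1, 1), (0, 0)))
    return k[0] * padded[:-2] + k[1] * padded[1:-1] + k[2] * padded[2:]

def influenced_rows(f, x, src):
    # Indices of output patches that change when input patch `src` is perturbed.
    x2 = x.copy()
    x2[src, 0] += 1.0  # perturb one channel so LayerNorm does not cancel the change
    diff = np.abs(f(x2) - f(x)).max(axis=1)
    return np.flatnonzero(diff > 1e-9)

rng = np.random.default_rng(0)
P, C, Ht, Hc = 16, 8, 32, 32  # illustrative sizes, not taken from the paper
x = rng.standard_normal((P, C))
w1 = 0.1 * rng.standard_normal((Ht, P))
w2 = 0.1 * rng.standard_normal((P, Ht))
w3 = 0.1 * rng.standard_normal((C, Hc))
w4 = 0.1 * rng.standard_normal((Hc, C))

center = P // 2
kernel = np.array([0.25, 0.5, 0.25])
conv_rows = influenced_rows(lambda z: conv3_along_patches(z, kernel), x, center)
mixer_rows = influenced_rows(lambda z: mixer_block(z, w1, w2, w3, w4), x, center)
print(conv_rows)   # [7 8 9]: only the immediate neighbours
print(mixer_rows)  # for generic random weights, every patch index 0..15
```

In this toy setting a single convolutional layer moves information from the center patch only one position in each direction, so many stacked layers would be needed to reach the edges, whereas one token-mixing MLP already connects the center patch to every patch.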