Fisheye cameras offer a large field of view (LFOV) but suffer from image distortion, which leads to poor performance on some fisheye vision tasks. One solution is to optimize existing vision algorithms for fisheye images. However, most CNN-based and Transformer-based methods cannot leverage distortion information efficiently. In this work, we propose a novel patch embedding method, Sector Patch Embedding (SPE), that conforms to the distortion pattern of fisheye images. Furthermore, we present a synthetic fisheye dataset based on ImageNet-1K and evaluate several Transformer models on it. With SPE, the top-1 classification accuracy of ViT and PVT improves by 0.75% and 2.8%, respectively. The experiments show that the proposed sector patch embedding method better perceives distortion and extracts features from fisheye images. Our method can be easily adapted to other Transformer-based models. Source code is at https://github.com/IN2-ViAUn/Sector-Patch-Embedding.
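The abstract does not say how the sector patches are laid out. The sketch below is one possible reading, not the authors' implementation: it assumes concentric rings around the image centre, each split into equal angular sectors, and samples a fixed polar grid inside every sector so that all patches flatten to vectors of the same length before a shared linear projection (as in ViT's patch embedding). The names `sector_patches`, `embed`, `num_rings`, `num_sectors`, `samples_r` and `samples_theta` are hypothetical and are not taken from the linked repository.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def sector_patches(
    image: Image,
    num_rings: int,
    num_sectors: int,
    samples_r: int = 4,
    samples_theta: int = 4,
) -> List[List[float]]:
    """Split an image into annular-sector patches around its centre.

    Hypothetical layout (assumption): equal-width rings, each divided into
    num_sectors equal angular sectors. Each patch is sampled on a fixed
    samples_r x samples_theta polar grid with nearest-neighbour lookup, so
    every patch yields a vector of the same length; outer patches, which
    cover more pixels, are therefore sampled more sparsely.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    max_radius = min(cy, cx)  # inscribed circle of the (assumed centred) fisheye image
    patches = []
    for ring in range(num_rings):
        r0 = ring * max_radius / num_rings
        r1 = (ring + 1) * max_radius / num_rings
        for sector in range(num_sectors):
            t0 = sector * 2 * math.pi / num_sectors
            t1 = (sector + 1) * 2 * math.pi / num_sectors
            values = []
            for i in range(samples_r):
                # centre of the i-th radial sub-interval of this ring
                r = r0 + (i + 0.5) * (r1 - r0) / samples_r
                for j in range(samples_theta):
                    # centre of the j-th angular sub-interval of this sector
                    t = t0 + (j + 0.5) * (t1 - t0) / samples_theta
                    y = round(cy + r * math.sin(t))
                    x = round(cx + r * math.cos(t))
                    values.append(image[y][x])
            patches.append(values)
    return patches


def embed(patches: List[List[float]], weight: List[List[float]]) -> List[List[float]]:
    """Project each flattened patch with a shared linear map.

    weight has embed_dim rows and patch_len columns (hypothetical learned parameters).
    """
    return [[sum(w * v for w, v in zip(row, p)) for row in weight] for p in patches]


if __name__ == "__main__":
    img = [[float(x + y) for x in range(32)] for y in range(32)]
    patches = sector_patches(img, num_rings=3, num_sectors=8)
    print(len(patches), len(patches[0]))  # 24 16
```

Fixed polar sampling is chosen here only because it gives equal-length patch vectors, which lets one projection matrix serve every sector; a practical version would more likely operate on tensors and use bilinear interpolation instead of nearest-neighbour lookup.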