Convolution kernels are the basic structural component of convolutional neural networks (CNNs). In recent years, there has been growing interest in fisheye cameras for many applications. However, the radially symmetric projection model of these cameras produces strong distortion that degrades the performance of CNNs, especially when the field of view is very large. In this work, we tackle this problem by proposing a method that leverages the camera calibration to deform the convolution kernel so that it adapts to the distortion. That way, the receptive field of the convolution is similar to that of standard convolutions on perspective images, allowing us to take advantage of networks pre-trained on large perspective datasets. We show that, with just a brief fine-tuning stage on a small dataset, our method improves the performance of the network on the calibrated fisheye camera with respect to standard convolutions in depth estimation and semantic segmentation.