Wide-angle lenses are commonly used in perception tasks that require a large field of view. Unfortunately, these lenses produce significant distortions, so conventional models that ignore distortion effects fail to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. We leverage the physical characteristics of such lenses, which are analytically defined by the radial distortion profile (assumed to be known), to develop a distortion-aware radial Swin transformer (DarSwin). In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and a polar position encoding for radial patch merging. We validate our method on classification tasks using synthetically distorted ImageNet data and show through extensive experiments that DarSwin can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. Compared to the baselines, DarSwin achieves the best results in terms of Top-1 and Top-5 accuracy when tested on in-distribution data, with a gain of almost 2% (6%) in Top-1 accuracy under medium (high) distortion levels, and performs comparably to the state of the art under low and very low distortion levels (perspective-like images).
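To make the idea of radial patch partitioning with distortion-based sampling more concrete, the following is a minimal sketch and not the DarSwin implementation. It assumes an equidistant fisheye profile (r = f·θ) as a stand-in for the known radial distortion profile. The function names (`equidistant_profile`, `radial_patch_samples`) and all parameters are hypothetical and chosen only for illustration. Patches are defined on a polar grid that is uniform in incident angle θ and azimuth φ, and the profile maps each θ to an image radius, so the sampling locations follow the lens distortion.

```python
import math


def equidistant_profile(theta, focal=1.0):
    """Assumed lens model (illustrative only): image radius r = focal * theta."""
    return focal * theta


def radial_patch_samples(n_radial, n_angular, max_theta,
                         profile=equidistant_profile, samples_per_patch=2):
    """Return {(i, j): [(x, y), ...]} with sample points for each polar patch.

    Patch (i, j) covers incident angles in [i, i + 1] * max_theta / n_radial
    and azimuths in [j, j + 1] * 2 * pi / n_angular. Points are placed at
    cell centers of a samples_per_patch x samples_per_patch sub-grid, and the
    distortion profile converts each incident angle to an image-plane radius.
    """
    d_theta = max_theta / n_radial
    d_phi = 2.0 * math.pi / n_angular
    patches = {}
    for i in range(n_radial):
        for j in range(n_angular):
            points = []
            for a in range(samples_per_patch):
                theta = (i + (a + 0.5) / samples_per_patch) * d_theta
                r = profile(theta)  # distortion-dependent radius
                for b in range(samples_per_patch):
                    phi = (j + (b + 0.5) / samples_per_patch) * d_phi
                    points.append((r * math.cos(phi), r * math.sin(phi)))
            patches[(i, j)] = points
    return patches


# Example: 4 radial x 8 angular patches over a 180-degree field of view.
patches = radial_patch_samples(n_radial=4, n_angular=8, max_theta=math.pi / 2)
```

Under this sketch, passing a different `profile` function moves the sample points while the number and ordering of patches stay the same. That is one plausible way to read how a model built on such a partition could be applied to a lens it was not trained on.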