The growing use of Unmanned Aerial Vehicles (UAVs) in many fields calls for effective UAV image segmentation, a task made difficult by the constantly changing perspectives of UAV-captured images. Traditional segmentation algorithms struggle because they cannot accurately model the complexity of UAV perspectives, and collecting labeled multi-perspective datasets is prohibitively expensive. To address these issues, we introduce PPTFormer, a novel \textbf{P}seudo Multi-\textbf{P}erspective \textbf{T}rans\textbf{former} network for UAV image segmentation. Our approach removes the need for real multi-perspective data by creating pseudo perspectives that enable multi-perspective learning. PPTFormer combines Perspective Decomposition, novel Perspective Prototypes, and a specialized encoder and decoder, which together improve segmentation through Pseudo Multi-Perspective Attention (PMP Attention) and fusion. Experiments show that PPTFormer achieves state-of-the-art performance on five UAV segmentation datasets, confirming that it can effectively simulate UAV flight perspectives and significantly improve segmentation precision. This work advances UAV scene understanding and sets a new benchmark for future work in semantic segmentation.