pCTFusion: Point Convolution-Transformer Fusion with Semantic Aware Loss for Outdoor LiDAR Point Cloud Segmentation

LiDAR-generated point clouds are crucial for perceiving outdoor environments, and their segmentation is essential for many applications. Previous research has focused on using self-attention and convolution (local attention) mechanisms individually in semantic segmentation architectures, but there is limited work on combining the representations learned by these attention mechanisms to improve performance. Moreover, existing work that does combine convolution with self-attention relies on global attention, which is impractical for large point clouds. To address these challenges, this study proposes a new architecture, pCTFusion, which combines kernel-based convolutions with self-attention to improve feature learning and capture both local and global dependencies in segmentation. The architecture employs two types of self-attention, local and global, chosen according to the hierarchical position of each encoder block. Furthermore, existing loss functions do not account for the semantic and position-wise importance of points, which reduces accuracy, particularly at sharp class boundaries. To overcome this, the study formulates a novel attention-based loss function called Pointwise Geometric Anisotropy (PGA), which assigns weights based on the semantic distribution of points in a neighborhood. Evaluated on the SemanticKITTI outdoor dataset, the proposed architecture shows a 5-7% improvement in performance over state-of-the-art architectures. The results are particularly encouraging for minor classes, which are often misclassified because of class imbalance and a lack of space- and neighbor-aware feature encoding. The developed methods can be leveraged for the segmentation of complex datasets and can drive real-world applications of LiDAR point clouds.
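The abstract does not state the PGA formula, so the following is only a minimal sketch of the general idea of weighting a loss by the semantic distribution of a point's neighborhood. It assumes that a point's weight grows with the fraction of its k nearest neighbors that carry a different ground-truth label, so that points near class boundaries contribute more to a weighted cross-entropy. The function names (`knn_indices`, `boundary_weights`, `weighted_cross_entropy`) and the parameters `k` and `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_indices(points, k):
    """Brute-force k nearest neighbours of every point, excluding the point itself."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean distances
    np.fill_diagonal(dist2, np.inf)               # a point is not its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]


def boundary_weights(points, labels, k=4, lam=1.0):
    """Assumed weighting: 1 + lam * (fraction of neighbours with a different label)."""
    nbrs = knn_indices(points, k)
    differs = labels[nbrs] != labels[:, None]     # shape (N, k), True where labels disagree
    return 1.0 + lam * differs.mean(axis=1)


def weighted_cross_entropy(logits, labels, weights):
    """Cross-entropy averaged with per-point weights (normalised by the weight sum)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float((weights * nll).sum() / weights.sum())


# Toy example: six points on a line, with a class boundary between points 2 and 3.
points = np.array([[i, 0.0, 0.0] for i in range(6)])
labels = np.array([0, 0, 0, 1, 1, 1])
weights = boundary_weights(points, labels, k=2, lam=1.0)
# The two boundary points (indices 2 and 3) get weight 1.5, the rest 1.0.
loss = weighted_cross_entropy(np.zeros((6, 2)), labels, weights)
```

A brute-force neighbor search is used here only to keep the sketch short; for full LiDAR scans a spatial index such as a k-d tree or a voxel grid would be needed.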
