Point cloud processing methods exploit local point features and global context through aggregation, which does not explicitly model the internal correlations between local and global features. To address this problem, we propose full point encoding, which is applicable to both convolution and transformer architectures. Specifically, we propose the Full Point Convolution (FPConv) and Full Point Transformer (FPTransformer) architectures. The key idea is to adaptively learn the weights from local and global geometric connections, where the connections are established through local and global correlation functions, respectively. FPConv and FPTransformer simultaneously model the local and global geometric relationships as well as their internal correlations, demonstrating strong generalization ability and high performance. FPConv is incorporated into classical hierarchical network architectures to achieve local and global shape-aware learning. In FPTransformer, we introduce full point position encoding in self-attention that hierarchically encodes each point position in the global and local receptive fields. We also propose a shape-aware downsampling block that takes into account the local shape and the global context. Experimental comparisons with existing methods on benchmark datasets show the efficacy of FPConv and FPTransformer for semantic segmentation, object detection, classification, and normal estimation tasks. In particular, we achieve state-of-the-art semantic segmentation results of 76% mIoU on S3DIS 6-fold and 72.2% mIoU on S3DIS Area5.