Despite significant advancements in pre-training methods for point cloud understanding, directly capturing intricate shape information from irregular point clouds without reliance on external data remains a formidable challenge. To address this problem, we propose GPSFormer, an innovative Global Perception and Local Structure Fitting-based Transformer, which learns detailed shape information from point clouds with remarkable precision. At the core of GPSFormer are the Global Perception Module (GPM) and the Local Structure Fitting Convolution (LSFConv). Specifically, GPM utilizes Adaptive Deformable Graph Convolution (ADGConv) to identify short-range dependencies among similar features in the feature space and employs Multi-Head Attention (MHA) to learn long-range dependencies across all positions within the feature space, ultimately enabling flexible learning of contextual representations. Inspired by the Taylor series, we design LSFConv, which learns both low-order fundamental and high-order refinement information from explicitly encoded local geometric structures. Integrating GPM and LSFConv as fundamental components, we construct GPSFormer, a cutting-edge Transformer that effectively captures the global and local structures of point clouds. Extensive experiments validate GPSFormer's effectiveness on three point cloud tasks: shape classification, part segmentation, and few-shot learning. The code of GPSFormer is available at \url{https://github.com/changshuowang/GPSFormer}.
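To make the Taylor-series intuition behind local structure fitting concrete, the following is a minimal, hypothetical NumPy sketch (not the paper's actual LSFConv): for each point it gathers k nearest neighbors, then summarizes the relative offsets with a first-order (linear) term and a second-order (quadratic) refinement term, mirroring the low-order/high-order decomposition described above. The function names and the choice of mean aggregation are illustrative assumptions.

```python
import numpy as np

def knn(points, k):
    # Pairwise squared distances, then indices of the k nearest
    # neighbors per point (excluding the point itself at index 0).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, 1:k + 1]

def taylor_local_features(points, k=8):
    """Toy Taylor-style local descriptor (illustrative, not LSFConv):
    low-order term  = mean of relative offsets   (first-order geometry),
    high-order term = mean of squared offsets    (second-order refinement)."""
    idx = knn(points, k)                         # (N, k) neighbor indices
    rel = points[idx] - points[:, None, :]       # (N, k, 3) relative coords
    low = rel.mean(axis=1)                       # (N, 3) linear summary
    high = (rel ** 2).mean(axis=1)               # (N, 3) quadratic summary
    return np.concatenate([low, high], axis=1)   # (N, 6) per-point descriptor

rng = np.random.default_rng(0)
pts = rng.standard_normal((32, 3))               # a toy 32-point cloud
feat = taylor_local_features(pts, k=8)           # shape (32, 6)
```

In the actual model these handcrafted summaries would be replaced by learned weights over the encoded neighborhood geometry; the sketch only shows how low- and high-order terms of a local expansion can be read off relative coordinates.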