OctFormer: Octree-based Transformers for 3D Point Clouds

We propose octree-based transformers, named OctFormer, for 3D point cloud learning. OctFormer serves as a general and effective backbone for 3D point cloud segmentation and object detection, has linear complexity, and scales to large point clouds. The key challenge in applying transformers to point clouds is reducing the quadratic, and thus prohibitive, computational complexity of attention. To address this issue, several works divide point clouds into non-overlapping windows and constrain attention to each local window. However, the number of points in each window varies greatly, which impedes efficient execution on GPUs. Observing that attention is robust to the shapes of local windows, we propose a novel octree attention, which leverages the sorted shuffled keys of octrees to partition point clouds into local windows that contain a fixed number of points while allowing the window shapes to change freely. We also introduce dilated octree attention to further expand the receptive field. Our octree attention can be implemented in 10 lines of code with open-source libraries and runs 17 times faster than other point cloud attentions when the number of points exceeds 200k. Built upon the octree attention, OctFormer can be easily scaled up and achieves state-of-the-art performance on a series of 3D segmentation and detection benchmarks, surpassing previous sparse-voxel-based CNNs and point cloud transformers in both efficiency and effectiveness. Notably, on the challenging ScanNet200 dataset, OctFormer outperforms sparse-voxel-based CNNs by 7.3 in mIoU. Our code and trained models are available at https://wang-ps.github.io/octformer.
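The window partition described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the "shuffled keys" behave like z-order (Morton) codes obtained by interleaving the bits of quantized coordinates, and the names `morton_key`, `window_partition`, and `window_attention` are hypothetical. The attention shown is single-head and has no learned projections; it only demonstrates that every window holds the same number of points, so the batched computation has a fixed shape.

```python
import numpy as np

def morton_key(ijk, depth):
    """Interleave the bits of integer (x, y, z) coordinates into one key."""
    ijk = ijk.astype(np.int64)
    key = np.zeros(len(ijk), dtype=np.int64)
    for b in range(depth):
        for axis in range(3):
            key |= ((ijk[:, axis] >> b) & 1) << (3 * b + axis)
    return key

def window_partition(points, depth=6, window=4, dilation=1):
    """Return point indices of shape (num_windows, window); -1 marks padding."""
    lo, hi = points.min(0), points.max(0)
    scale = (2 ** depth - 1) / np.maximum(hi - lo, 1e-9)
    ijk = np.floor((points - lo) * scale).astype(np.int64)
    # Sorting by key places spatially nearby points next to each other.
    order = np.argsort(morton_key(ijk, depth), kind="stable")
    group = window * dilation
    pad = (-len(order)) % group
    order = np.concatenate([order, np.full(pad, -1, dtype=np.int64)])
    if dilation > 1:
        # Each window takes every `dilation`-th point of a larger group.
        order = order.reshape(-1, window, dilation).transpose(0, 2, 1)
    return order.reshape(-1, window)

def window_attention(feats, windows):
    """Self-attention restricted to each window (float features assumed)."""
    mask = windows >= 0
    x = feats[np.where(mask, windows, 0)]            # (W, K, C)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores = np.where(mask[:, None, :], scores, -1e9)  # ignore padded keys
    scores = scores - scores.max(-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(-1, keepdims=True)
    out = np.zeros_like(feats)
    out[windows[mask]] = (weights @ x)[mask]         # scatter back to points
    return out
```

Because the windows are cut from a sorted one-dimensional sequence rather than from a spatial grid, their shapes vary while their sizes stay constant, and the cost grows linearly with the number of points for a fixed window size. Calling `window_attention(feats, window_partition(points, window=32, dilation=4))` would run the dilated variant under the same assumptions.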

