This paper presents a novel Fully Binary Point Cloud Transformer (FBPT) model with the potential for wide application in robotics and on mobile devices. By compressing the weights and activations of a 32-bit full-precision network to 1-bit binary values, the proposed binary point cloud Transformer significantly reduces the storage footprint and computational resource requirements of neural network models for point cloud processing tasks, compared to full-precision point cloud networks. However, achieving a fully binary point cloud Transformer, in which all parts except task-specific modules are binary, poses challenges in quantizing the activations of Q, K, V and of self-attention in the attention module, since these activations do not follow simple probability distributions and vary with the input data. Moreover, in our network the binarized attention module suffers a degradation of self-attention because the softmax operation drives its outputs toward a uniform distribution. The primary focus of this paper is addressing the performance degradation caused by binary point cloud Transformer modules. We propose a novel binarization mechanism called dynamic-static hybridization: static binarization of the overall network model is combined with fine-grained dynamic binarization of data-sensitive components. In addition, we employ a novel hierarchical training scheme to obtain the optimal model and binarization parameters. These improvements allow the proposed binarization method to outperform binarization methods designed for convolutional neural networks when applied to point cloud Transformer structures. To demonstrate the superiority of our algorithm, we conduct experiments on two different tasks: point cloud classification and place recognition.
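To make the dynamic-static distinction concrete, the following is a minimal sketch (not the authors' implementation) of the two binarization modes the abstract contrasts: a static scheme whose scale is fixed after training, and a dynamic scheme whose scale is recomputed from each input's statistics, as would suit data-sensitive activations such as Q, K, and V. The function names and the mean-of-absolute-values scale are illustrative assumptions.

```python
import numpy as np

def static_binarize(w, alpha):
    """Static binarization: a fixed (learned) scale alpha, shared across
    all inputs, maps values to {-alpha, +alpha} via the sign function.
    Suited to weights, which do not change at inference time."""
    return alpha * np.sign(w)

def dynamic_binarize(x):
    """Dynamic binarization: the scale is recomputed per input from its
    own statistics (here, the mean absolute value), so the quantizer
    tracks activations whose range varies with each point cloud."""
    alpha = np.mean(np.abs(x))
    return alpha * np.sign(x)
```

In this sketch, `static_binarize` costs nothing extra at inference, while `dynamic_binarize` pays a small per-input reduction to adapt its scale; restricting the dynamic mode to data-sensitive components is what keeps the hybrid scheme cheap overall.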