Various techniques in 3D deep learning optimize how 3D data is stored and processed, using representations such as point clouds, meshes, textures, and voxels. These techniques rely on feed-forward networks, 3D convolutions, graph neural networks, transformers, and sparse tensors. However, 3D is one of the most computationally expensive domains, and these methods have yet to reach their full potential because of their large capacity, high complexity, and computational limits. This paper proposes applying knowledge distillation, specifically to sparse tensors in 3D deep learning, to reduce model size while maintaining performance. We analyze and propose different loss functions, including standard methods and combinations of several losses, to approximate the performance of state-of-the-art sparse convolutional neural networks. Our experiments on the standard ScanNet V2 dataset show an mIoU gap of about 2.6% with a 4 times smaller model and about 8% with a 16 times smaller model, relative to the latest state-of-the-art spatio-temporal ConvNet-based models.
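As a rough illustration of what a "standard" distillation loss combined with a supervised loss can look like, here is a minimal sketch. It assumes the common logit-matching formulation of Hinton et al. (a temperature-softened KL divergence between teacher and student, scaled by T², mixed with cross-entropy on ground-truth labels). This abstract does not state which losses the paper actually uses, so treat it as an illustration, not the authors' method. The names `distillation_loss`, `IGNORE_LABEL`, `temperature`, and `alpha` are hypothetical. The sketch works on plain per-point logit lists. In a sparse-ConvNet pipeline these would presumably be the output feature rows of the student and teacher at the same voxel coordinates, which is also an assumption.

```python
import math
from typing import List, Sequence

# Hypothetical marker for unlabeled points (assumption; datasets choose their own value).
IGNORE_LABEL = -100


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def kl_from_logits(teacher: Sequence[float], student: Sequence[float],
                   temperature: float) -> float:
    """KL(teacher_T || student_T) between temperature-softened class distributions."""
    log_p = log_softmax(teacher, temperature)
    log_q = log_softmax(student, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def distillation_loss(student_logits: Sequence[Sequence[float]],
                      teacher_logits: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """Mean per-point loss: alpha * CE(student, labels) + (1 - alpha) * T^2 * KL.

    The KL (soft-target) term is computed for every point, including unlabeled
    ones, because the teacher provides a target everywhere. The cross-entropy
    term skips points marked with IGNORE_LABEL.
    """
    ce_terms: List[float] = []
    kd_terms: List[float] = []
    for s, t, y in zip(student_logits, teacher_logits, labels):
        # Soft-target term: the student mimics the teacher's tempered distribution;
        # T^2 keeps its gradient scale comparable to the hard-label term.
        kd_terms.append(kl_from_logits(t, s, temperature) * temperature ** 2)
        if y != IGNORE_LABEL:
            # Hard-label term: standard cross-entropy at temperature 1.
            ce_terms.append(-log_softmax(s)[y])
    ce = sum(ce_terms) / len(ce_terms) if ce_terms else 0.0
    kd = sum(kd_terms) / len(kd_terms) if kd_terms else 0.0
    return alpha * ce + (1.0 - alpha) * kd
```

Setting `alpha=1.0` recovers plain supervised training, and lowering it shifts weight toward imitating the teacher. Combinations of other losses (for example, feature-level matching on intermediate sparse tensors) could be added as further weighted terms in the same way.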