We propose an end-to-end attribute compression method for dense point clouds. The method combines a frequency sampling module, a geometry-assisted adaptive-scale feature extraction module, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract the high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds, which are then partitioned with an octree to provide a structured input for feature extraction. The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. A geometry-assisted attribute feature refinement module then refines the extracted attribute features. Finally, a global hyperprior model is introduced for entropy coding. This model propagates hyperprior parameters from the deepest (base) layer to the other layers, further improving coding efficiency. At the decoder, a mirrored network progressively restores the features and reconstructs the color attributes through transposed convolutional layers. The method encodes the base layer at a low bitrate and progressively adds enhancement-layer information to improve reconstruction accuracy. Compared to the latest G-PCC test model (TMC13v23) under the MPEG common test conditions (CTCs), the proposed method achieves an average Bjøntegaard delta bitrate (BD-rate) reduction of 24.58% for the Y component (21.23% for YUV combined) on the MPEG Category Solid dataset and 22.48% for the Y component (17.19% for YUV combined) on the MPEG Category Dense dataset. This is the first instance of a learning-based codec outperforming the G-PCC standard on these datasets under the MPEG CTCs.
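The reported gains are Bjøntegaard delta bitrate figures, that is, the average bitrate difference between two codecs at equal quality. The abstract does not describe how the metric was evaluated, so the following is only a minimal sketch of the commonly used cubic-fit formulation, not the authors' evaluation script. The function name `bd_rate` and the rate/PSNR values are hypothetical and purely illustrative; they are not results from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec relative to the anchor
    at equal PSNR. Negative values mean the test codec needs fewer bits.

    Assumption: the classic formulation that fits a cubic polynomial to
    log10(rate) as a function of PSNR for each codec.
    """
    log_ra = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log10(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each codec.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Integrate only over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference over the interval, converted to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Synthetic example (hypothetical numbers): the test codec uses 20% fewer
# bits than the anchor at every quality level.
anchor_rate = [0.1, 0.2, 0.4, 0.8]          # bits per point
anchor_psnr = [30.0, 33.0, 36.0, 39.0]      # dB (e.g. Y-component PSNR)
test_rate = [0.8 * r for r in anchor_rate]
test_psnr = anchor_psnr

result = bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr)
print(f"BD-rate: {result:.2f}%")
```

With this synthetic input the result is approximately -20%, since the test curve is the anchor curve shifted by a constant factor of 0.8 in rate. A reduction such as the 24.58% quoted for the Y component would correspond to a value of about -24.58 under this sign convention.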