Deep Learning-based Compressed Domain Multimedia for Man and Machine: A Taxonomy and Application to Point Cloud Classification

In the current golden age of multimedia, human visualization is no longer the single main target, with the final consumer often being a machine that performs processing or computer vision tasks. Whether the consumer is a human or a machine, deep learning plays a fundamental role in extracting features from the multimedia representation data, usually producing a compressed representation referred to as the latent representation. The increasing development and adoption of deep learning-based solutions in a wide range of multimedia applications have opened an exciting new vision where a common compressed multimedia representation is used for both man and machine. The main benefits of this vision are two-fold: i) improved performance for the computer vision tasks, since the effects of coding artifacts are mitigated; and ii) reduced computational complexity, since prior decoding is not required. This paper proposes the first taxonomy for designing compressed domain computer vision solutions, driven by the architecture and weights compatibility with an available spatio-temporal computer vision processor. The potential of the proposed taxonomy is demonstrated for the specific case of point cloud classification by designing novel compressed domain processors using the JPEG Pleno Point Cloud Coding standard, currently under development, and adaptations of the PointGrid classifier. Experimental results show that the designed compressed domain point cloud classification solutions can significantly outperform the spatio-temporal domain classification benchmarks when these are applied to the decompressed data, which contain coding artifacts, and can even surpass the benchmarks applied to the original uncompressed data.