Visual data compression is shifting from human-centered reconstruction to machine-oriented representation coding. In this setting, an image is often mapped to a compact semantic embedding, which is then compressed and transmitted for downstream inference. We propose an adaptive transform-coding method for semantic-feature compression motivated by the conditional rate-distortion function of a Gaussian mixture model. The scheme uses mode-dependent transforms and quantizers selected according to the inferred source component, enabling more efficient coding of heterogeneous feature distributions. Evaluations on features from widely used vision backbones and foundation models show that the proposed method outperforms or is competitive with state-of-the-art neural compression methods while preserving flexibility and interpretability.
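The mode-dependent coding idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a toy two-component Gaussian mixture with hand-picked parameters, uses a per-component Karhunen-Loève transform (eigenbasis of the component covariance) as the mode-dependent transform, and a per-component uniform scalar quantizer. All names, parameters, and step sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-component Gaussian mixture standing in for a heterogeneous
# feature distribution (parameters are illustrative, not from the paper).
means = [np.zeros(4), np.full(4, 5.0)]
covs = [np.diag([4.0, 1.0, 0.5, 0.1]), np.diag([0.1, 0.5, 1.0, 4.0])]

# Mode-dependent transforms: one KLT per component, i.e. the eigenbasis
# of that component's covariance matrix.
transforms = [np.linalg.eigh(C)[1].T for C in covs]
steps = [0.5, 0.25]  # assumed per-component uniform quantizer step sizes

def log_gauss(x, mu, C):
    # Unnormalized Gaussian log-density, used to infer the source component.
    d = x - mu
    return -0.5 * (d @ np.linalg.solve(C, d) + np.log(np.linalg.det(C)))

def encode_decode(x):
    # 1) Infer the mixture component the feature most likely came from.
    k = int(np.argmax([log_gauss(x, m, C) for m, C in zip(means, covs)]))
    # 2) Apply that component's decorrelating transform and quantizer.
    y = transforms[k] @ (x - means[k])
    q = np.round(y / steps[k])          # uniform scalar quantization
    # 3) Reconstruct: dequantize, invert the (orthogonal) transform.
    x_hat = transforms[k].T @ (q * steps[k]) + means[k]
    return k, q, x_hat

x = rng.multivariate_normal(means[1], covs[1])
k, q, x_hat = encode_decode(x)
print("component:", k, "max abs error:", np.abs(x - x_hat).max())
```

Because each transform is orthogonal, the reconstruction error is controlled by the selected component's step size, so sharper (lower-variance) modes can be coded with finer quantizers while diffuse modes use coarser ones.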