Multi-agent collaborative perception (CP) improves scene understanding by sharing information among connected agents such as autonomous vehicles, unmanned aerial vehicles, and robots. Communication bandwidth, however, limits scalability. We present ReVQom, a learned feature codec that compresses intermediate features while preserving their spatial identity. ReVQom is trained end to end: a simple bottleneck network first reduces the feature dimension, and multi-stage residual vector quantization (RVQ) then quantizes the result, so only per-pixel code indices need to be transmitted. This reduces the payload from 8192 bits per pixel (bpp) for uncompressed 32-bit floating-point features to 6-30 bpp per agent with minimal accuracy loss. On the real-world DAIR-V2X CP dataset, ReVQom achieves compression ratios ranging from 273x at 30 bpp to 1365x at 6 bpp. At 18 bpp (455x), it matches or outperforms CP with raw features, and at 6-12 bpp it enables ultra-low-bandwidth operation with graceful degradation. ReVQom thus enables efficient and accurate multi-agent collaborative perception and is a step toward practical V2X deployment.
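The bit accounting above and the bottleneck-plus-RVQ pipeline can be illustrated with a minimal Python sketch. It is not the ReVQom implementation. The bottleneck is reduced to a single per-pixel linear projection, and the codebooks are random rather than learned end to end. The sizes are hypothetical: a 256-channel input (inferred from 8192 bpp / 32 bits), an 8-dimensional bottleneck, and 3 stages of 64 codes. The names `bottleneck`, `rvq_encode`, `rvq_decode`, `bits_per_pixel`, and `compression_ratio` are illustrative only.

```python
import math
import random

# 8192 raw bits per pixel = 256 float32 channels (256 is inferred from 8192 / 32).
RAW_BPP = 256 * 32


def bottleneck(feature, weights):
    """Hypothetical stand-in for the bottleneck network: one per-pixel linear map C -> D."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(vec, codebooks):
    """Greedy residual VQ: each stage quantizes what the earlier stages missed."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices  # only these integers would be transmitted per pixel


def rvq_decode(indices, codebooks):
    """Receiver side: sum the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def bits_per_pixel(num_stages, codebook_size):
    """Each stage sends one index of log2(codebook_size) bits per pixel."""
    bits = math.log2(codebook_size)
    if not bits.is_integer():
        raise ValueError("codebook_size must be a power of two")
    return num_stages * int(bits)


def compression_ratio(bpp, raw_bpp=RAW_BPP):
    """Ratio of raw float32 bits to transmitted index bits per pixel."""
    return raw_bpp / bpp


if __name__ == "__main__":
    rng = random.Random(0)
    C, D, STAGES, K = 256, 8, 3, 64  # hypothetical sizes
    weights = [[rng.gauss(0, 1 / math.sqrt(C)) for _ in range(C)] for _ in range(D)]
    # Untrained codebooks with shrinking scale per stage (real ones would be learned).
    codebooks = [[[rng.gauss(0, 1.0 / (s + 1)) for _ in range(D)] for _ in range(K)]
                 for s in range(STAGES)]
    feature = [rng.gauss(0, 1) for _ in range(C)]
    z = bottleneck(feature, weights)
    idx = rvq_encode(z, codebooks)
    z_hat = rvq_decode(idx, codebooks)
    print("indices:", idx, "sq. error:", round(sq_dist(z, z_hat), 4))
    bpp = bits_per_pixel(STAGES, K)
    print(bpp, round(compression_ratio(bpp)))  # prints: 18 455
```

With 3 stages of 64-entry codebooks, each pixel costs 3 × 6 = 18 bits, and 8192 / 18 ≈ 455, consistent with the ratio quoted for 18 bpp. The same formula gives about 1365x at 6 bpp and 273x at 30 bpp. The encoder shown is the usual greedy stage-by-stage RVQ search. Assuming the agents share the codebooks in advance, a receiver needs only the index tuple per pixel to reconstruct the bottleneck feature.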