SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

In robotics systems, low-cost, low-power hardware can easily capture vast amounts of high-resolution visual data. Yet limited bandwidth and on-device compute prevent this data from being fully utilized when it is transmitted with conventional codecs such as JPEG/MPEG. Newer codecs such as AV1/AVIF improve the rate-distortion trade-off but demand far more encoding resources, which makes them impractical without custom ASICs. Recent asymmetric autoencoders deliver high quality under extreme power and bandwidth constraints, but they add prohibitive decoding cost and use bespoke formats that ignore decades of infrastructure built around standards like JPEG. To address these limitations, we introduce a compression framework for cloud robotics based on a Sensor Embedded Autoencoder paired with a One-Time Transcode for Efficient Reconstruction (SEAOTTER). Because the sensor, cloud, and consumer stages face very different power and bandwidth budgets, SEAOTTER combines the compactness of a learned latent with the broad usability of a standard JPEG file. Since naive transcoding degrades performance, we propose a learnable JPEG color and quantization transform that improves accuracy for global, dense, and vision-language-based perception. Using SEAOTTER, we train both general-purpose and task-aware transcoding pipelines for a pre-trained, frozen encoder. At a compression ratio of 200:1, compared to AVIF, we observe 7 times faster encoding, 3.5 times faster decoding, and 8% higher ImageNet top-1 accuracy, while retaining compatibility with JPEG infrastructure. Our code is available at https://github.com/UT-SysML/seaotter .
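For readers unfamiliar with the two JPEG stages named above, the following is a minimal NumPy sketch of a JPEG-like color transform and quantization step, written so that both are plain arrays that could in principle be replaced by learned values. It is an illustration only and not the SEAOTTER implementation: the function names `dct_matrix` and `jpeg_like_roundtrip` are hypothetical, the standard JFIF matrix and a flat quantization table stand in for whatever the trained transform produces, and level shifting, chroma subsampling, and entropy coding are omitted.

```python
import numpy as np

# Standard JFIF RGB -> YCbCr matrix (offsets omitted). In a learnable
# variant this fixed matrix is the kind of parameter that could be trained.
RGB_TO_YCBCR = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])


def dct_matrix(n=8):
    """Return the orthonormal n x n DCT-II basis matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def jpeg_like_roundtrip(rgb, color_matrix, qtable):
    """Color transform, 8x8 block DCT, quantize, then invert each step.

    rgb: float array of shape (H, W, 3) with H and W multiples of 8.
    color_matrix: invertible 3x3 matrix applied to each pixel.
    qtable: 8x8 array of quantization step sizes (broadcast over blocks).
    Returns the reconstructed image and the integer-valued coefficients.
    """
    h, w, _ = rgb.shape
    d = dct_matrix(8)
    ycc = rgb @ color_matrix.T
    # Rearrange into blocks of shape (H/8, W/8, channel, 8, 8).
    blocks = ycc.reshape(h // 8, 8, w // 8, 8, 3).transpose(0, 2, 4, 1, 3)
    coeffs = d @ blocks @ d.T
    quantized = np.round(coeffs / qtable)  # the only lossy step
    recon_blocks = d.T @ (quantized * qtable) @ d
    recon = recon_blocks.transpose(0, 3, 1, 4, 2).reshape(h, w, 3)
    return recon @ np.linalg.inv(color_matrix).T, quantized


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(16, 16, 3))
fine, q_fine = jpeg_like_roundtrip(image, RGB_TO_YCBCR, np.full((8, 8), 1.0))
coarse, q_coarse = jpeg_like_roundtrip(image, RGB_TO_YCBCR, np.full((8, 8), 50.0))
print("nonzero coefficients:", np.count_nonzero(q_fine), "vs", np.count_nonzero(q_coarse))
```

Larger quantization steps zero out more coefficients and so shrink the file at the cost of reconstruction error. The abstract describes learning the color and quantization transform so that the resulting standard JPEG serves downstream perception better than a naive transcode, but it does not say how that training is done.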
