While Learned Data Compression (LDC) has achieved superior compression ratios, balancing precise probability modeling with system efficiency remains challenging. Crucially, uniform single-stream architectures struggle to simultaneously capture micro-syntactic and macro-semantic features, necessitating deep serial stacking that exacerbates latency. Compounding this, heterogeneous systems are constrained by device speed mismatches, where throughput is capped by Amdahl's Law due to serial processing. To this end, we propose a Dual-Stream Multi-Scale Decoupler that disentangles local and global contexts to replace deep serial processing with shallow parallel streams, and incorporate a Hierarchical Gated Refiner for adaptive feature refinement and precise probability modeling. Furthermore, we design a Concurrent Stream-Parallel Pipeline, which overcomes systemic bottlenecks to achieve full-pipeline parallelism. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both compression ratio and throughput, while maintaining the lowest latency and memory usage. The code is available at https://github.com/huidong-ma/FADE.
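To make the Amdahl's Law constraint mentioned above concrete, the following is a minimal sketch of the standard speedup bound, not code from the FADE repository. The function names and the example fractions are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work
    can be spread across `workers` parallel units (Amdahl's Law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_ceiling(parallel_fraction: float) -> float:
    """Limit of the speedup as the number of workers grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Illustrative values only: a pipeline whose stages are 75% parallelizable.
    for n in (1, 2, 8, 64):
        print(n, round(amdahl_speedup(0.75, n), 3))
    print("ceiling:", speedup_ceiling(0.75))
```

Under this bound, any stage that stays serial fixes a ceiling on the achievable speedup no matter how many parallel units are added, which is why shrinking the serial portion of a pipeline matters more than adding devices.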