This paper presents PhoenixCodec, a comprehensive neural speech encoding and decoding framework designed for extremely low-resource conditions. The proposed system integrates an optimized asymmetric frequency-time architecture, a Cyclical Calibration and Refinement (CCR) training strategy, and a noise-invariant fine-tuning procedure. Under stringent constraints (computation below 700 MFLOPs, latency under 30 ms, and dual-rate support at 1 kbps and 6 kbps), existing methods face a trade-off between efficiency and quality. PhoenixCodec addresses these challenges by alleviating the resource scattering of conventional decoders, employing CCR to escape local optima, and enhancing robustness through fine-tuning on noisy samples. In the LRAC 2025 Challenge Track 1, the proposed system ranked third overall, achieving the best 1 kbps performance in real-world noise-and-reverberation tests and the best intelligibility in clean-speech tests, confirming its effectiveness.