This paper presents PhoenixCodec, a comprehensive neural speech coding and decoding framework designed for extremely low-resource conditions. The proposed system integrates an optimized asymmetric frequency-time architecture, a Cyclical Calibration and Refinement (CCR) training strategy, and a noise-invariant fine-tuning procedure. Under stringent constraints (computation below 700 MFLOPs, latency under 30 ms, and dual-rate support at 1 kbps and 6 kbps), existing methods face a trade-off between efficiency and quality. PhoenixCodec addresses these challenges by alleviating the resource scattering of conventional decoders, employing CCR to improve optimization stability, and strengthening robustness through fine-tuning on noisy samples. In Track 1 of the LRAC 2025 Challenge, the proposed system ranked third overall; at 1 kbps it achieved the best performance under real-world noise and reverberation and the best intelligibility on clean speech, confirming its effectiveness.