In recent years, several audio coding standards have been built on frameworks that incorporate linear predictive coding (LPC). However, coding transient signals using frequency-domain linear prediction (LP) residual signals remains a challenge. Temporal noise shaping (TNS) can be applied to address this, but it does not operate effectively because the temporal envelope estimated in the modified discrete cosine transform (MDCT) domain is contaminated by time-domain aliasing (TDA) terms. In this study, we propose a coding framework based on the modulated complex lapped transform (MCLT) that integrates transform coded excitation (TCX) with complex LPC-based TNS (CTNS). Our approach uses a 50% overlap window and a switching scheme for CTNS to improve coding efficiency. We also propose an adaptive calculation of the target bits for each sub-band, using frequency envelope information derived from the quantized LPC coefficients. To minimize the quantization mismatch between the two modes, we propose an integrated quantization scheme for real and complex values and a TDA augmentation method that compensates for the TDA components artificially generated during switching. The proposed coding framework shows superior performance in both objective metrics and subjective listening tests, demonstrating its effectiveness for low bit-rate audio coding.
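The TDA issue mentioned above can be illustrated with a minimal sketch. It is not the proposed framework: it uses textbook, unwindowed MDCT and MDST definitions on a single frame, and the function names, the frame size `N = 8`, and the `1/N` inverse scaling are assumptions chosen for illustration. Under these assumptions, inverting the MDCT alone returns a transient at half amplitude together with a mirrored, sign-inverted alias inside the same half-frame, whereas adding the MDST branch (the imaginary part of a complex lapped transform, up to sign convention) cancels the alias.

```python
import math

def mdct(frame):
    """Unwindowed MDCT: 2N real samples -> N real coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2.0  # standard MDCT phase offset
    return [sum(frame[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]

def mdst(frame):
    """Unwindowed MDST: same kernel as the MDCT with sine instead of cosine."""
    two_n = len(frame)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2.0
    return [sum(frame[n] * math.sin(math.pi / n_half * (n + n0) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_half)]

def inverse(coeffs, basis):
    """Inverse transform with 1/N scaling; basis is math.cos or math.sin."""
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2.0
    return [sum(coeffs[k] * basis(math.pi / n_half * (n + n0) * (k + 0.5))
                for k in range(n_half)) / n_half
            for n in range(2 * n_half)]

N = 8                      # hypothetical half-frame length
frame = [0.0] * (2 * N)
frame[3] = 1.0             # a single-sample transient

y_cos = inverse(mdct(frame), math.cos)   # MDCT-only reconstruction
y_sin = inverse(mdst(frame), math.sin)   # MDST-only reconstruction
y_complex = [a + b for a, b in zip(y_cos, y_sin)]  # both branches combined

# MDCT alone: half of the transient at n = 3, plus an alias at n = N - 1 - 3 = 4.
print([round(v, 3) for v in y_cos[:N]])
# Both branches together: the alias cancels and the frame is recovered.
print([round(v, 3) for v in y_complex[:N]])
```

In a real-valued MDCT codec the alias is removed only after windowing and overlap-add with the neighbouring frames, so any temporal envelope estimated from the MDCT coefficients of a single frame still reflects the aliased signal. This is one way to read the motivation for estimating the envelope from complex-valued coefficients instead.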