Signal prediction is widely used in areas such as economic forecasting, echo cancellation, and data compression, particularly in the predictive coding of speech and music. Predictive coding algorithms use signal prediction to reduce the bit rate required for data transmission or storage. The prediction gain is a classic measure of predictor quality in applied signal coding, as it links the mean-squared prediction error to the signal-to-quantization-noise ratio of predictive coders. To evaluate predictor models, it is desirable to know the maximum achievable prediction gain independent of any particular predictor model. In this manuscript, Nadaraya-Watson kernel regression (NWKR) and an information-theoretic upper bound are applied to analyze the upper bound of the prediction gain on a newly recorded dataset of sustained speech/phonemes. For unvoiced speech, a linear predictor was found to always come within 0.3 dB of the maximum prediction gain. For voiced speech, the optimum one-tap predictor was found to be linear, but with two or more taps the maximum achievable prediction gain was found to be about 2 dB to 6 dB above that of the linear predictor. Significant differences between speakers/subjects were observed. The dataset and the code can be obtained for research purposes upon request.
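To make the comparison described above concrete, the following is a minimal sketch of how the prediction gain of a least-squares linear predictor could be compared with that of an NWKR predictor. It is not the authors' code. The function names, the Gaussian kernel, the hand-picked bandwidth, and the synthetic test signal (a deterministic quadratic map, not speech) are all assumptions made for illustration, and the toy signal is not meant to reproduce the results reported here.

```python
import numpy as np


def prediction_gain_db(x, x_hat):
    """Prediction gain in dB: signal power over mean-squared prediction error."""
    x = np.asarray(x, dtype=float)
    err = x - np.asarray(x_hat, dtype=float)
    return 10.0 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))


def lagged(x, taps):
    """Build rows [x[n-1], ..., x[n-taps]] and the matching targets x[n]."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[taps - k: len(x) - k] for k in range(1, taps + 1)])
    return X, x[taps:]


def linear_predict(X_train, y_train, X_test):
    """Least-squares linear predictor (no intercept, zero-mean signal assumed)."""
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_test @ w


def nwkr_predict(X_train, y_train, X_test, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (kernel choice is an assumption)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    # Kernel-weighted average of the training targets; guard against zero weights.
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-300)


def toy_signal(n, x0=0.3):
    """Hypothetical test signal: x[n] = 2*x[n-1]**2 - 1 (nonlinear, not speech)."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = 2.0 * x[i - 1] ** 2 - 1.0
    return x


def compare(taps=1, n_train=2000, n_test=1000, bandwidth=0.05):
    """Return (linear gain, NWKR gain) in dB on held-out samples of the toy signal."""
    X, y = lagged(toy_signal(n_train + n_test + taps), taps)
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    g_lin = prediction_gain_db(y_te, linear_predict(X_tr, y_tr, X_te))
    g_nw = prediction_gain_db(y_te, nwkr_predict(X_tr, y_tr, X_te, bandwidth))
    return g_lin, g_nw


if __name__ == "__main__":
    g_lin, g_nw = compare()
    print(f"linear: {g_lin:.2f} dB, NWKR: {g_nw:.2f} dB")
```

In this sketch the gap between the two gains indicates how much a nonlinear predictor could add over a linear one for the given number of taps. On real recordings the bandwidth would need to be selected carefully (for example by cross-validation), since a poorly chosen bandwidth can make the NWKR estimate under- or overstate the achievable gain.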