Phase vocoder-based time-stretching is a widely used technique for the time-scale modification of audio signals. However, conventional implementations suffer from ``percussion smearing,'' a well-known artifact that significantly degrades the quality of percussive components. We attribute this artifact to a fundamental time-scale mismatch between the temporally smeared magnitude spectrogram and the localized, newly generated phase. To address this, we propose SELEBI, a signal-adaptive phase vocoder algorithm that significantly reduces percussion smearing while preserving stability and the perfect reconstruction property. Unlike conventional methods that rely on heuristic processing or component separation, our approach leverages the nonstationary Gabor transform. By dynamically adapting analysis window lengths to assign short windows to intervals containing significant energy associated with percussive components, we directly compute a temporally localized magnitude spectrogram from the time-domain signal. This approach ensures greater consistency between the temporal structures of the magnitude and phase. Furthermore, the perfect reconstruction property of the nonstationary Gabor transform guarantees stable, high-fidelity signal synthesis, in contrast to previous heuristic approaches. Experimental results demonstrate that the proposed method effectively mitigates percussion smearing and yields natural sound quality.
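To make the window-adaptation idea concrete, the following is a minimal sketch of how short analysis windows might be assigned to intervals with sudden energy increases. It is not the SELEBI algorithm: the function name `choose_window_lengths`, the energy-ratio criterion, and all parameter values (`hop`, `long_win`, `short_win`, `ratio`) are assumptions made for illustration, and the sketch only selects window lengths without computing a nonstationary Gabor transform.

```python
import numpy as np


def choose_window_lengths(x, hop=256, long_win=2048, short_win=256, ratio=4.0):
    """Pick one analysis window length per hop-sized block of x.

    Hypothetical heuristic (not the paper's criterion): a block whose energy
    exceeds `ratio` times the energy of the previous block is treated as
    percussive and gets the short window; all other blocks get the long one.
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // hop
    # Split the signal into non-overlapping blocks (trailing samples dropped).
    blocks = x[: n_blocks * hop].reshape(n_blocks, hop)
    # Short-time energy per block; the small constant avoids a zero baseline.
    energy = np.sum(blocks ** 2, axis=1) + 1e-12
    # Energy of the preceding block (the first block is compared with itself).
    prev = np.concatenate(([energy[0]], energy[:-1]))
    percussive = energy > ratio * prev
    window_lengths = np.where(percussive, short_win, long_win)
    return window_lengths, percussive
```

For a quiet sustained tone with a single click, this marks only the block containing the click for a short window and leaves every other block with the long window. A practical system would likely need a more robust onset measure and smooth transitions between window lengths so that the resulting frame remains invertible.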