Humans can estimate a speaker's size from speech sounds alone. To explain this observation, we previously proposed an auditory computational theory of the stabilised wavelet-Mellin transform (SWMT), which segregates information about the size and shape of the vocal tract and about glottal vibration. The auditory representation, or excitation pattern (EP), combined with a weighting function based on SWMT, referred to as the "SSI weight", was shown to explain the psychometric functions of size perception. In this study, we investigated whether the EP with the SSI weight can precisely estimate vocal tract lengths (VTLs) measured from MRI data of male and female speakers. We found that the SSI weight significantly improved the VTL estimation. Moreover, the estimation errors were significantly smaller for the EP with the SSI weight than for commonly used spectra derived from the Fourier transform, the Mel filterbank, and the WORLD vocoder. We also show that the SSI weight can be easily introduced into these spectra to improve their performance.