Listeners can estimate a speaker's size from speech sounds alone. To explain this observation, we previously proposed an auditory computational theory, the Stabilised Wavelet-Mellin Transform (SWMT), which segregates information about vocal tract size, vocal tract shape, and glottal vibration. It has been shown that the auditory representation or excitation pattern (EP), combined with an SWMT-based weighting function termed the ``SSI weight,'' can account for the psychometric functions of size perception. In this study, we investigated whether the EP with the SSI weight can accurately estimate vocal tract lengths (VTLs) measured by magnetic resonance imaging (MRI) in male and female subjects. We found that the SSI weight significantly improved VTL estimation. Furthermore, the estimation errors for the EP with the SSI weight were significantly smaller than those for commonly used spectra derived from the Fourier transform, the Mel filterbank, and the WORLD vocoder. We also showed that the SSI weight can easily be introduced into these spectra to improve their performance.