Grid sentences are commonly used to study the Lombard effect and Normal-to-Lombard conversion. However, it is unclear whether Normal-to-Lombard models trained on grid sentences are sufficient to improve the intelligibility of natural speech in real-world applications. This paper presents the recording of a parallel Lombard corpus, called Lombard Chinese TIMIT (LCT), consisting of natural sentences extracted from Chinese TIMIT. We then compare natural and grid sentences in terms of the Lombard effect and Normal-to-Lombard conversion, using LCT and the Enhanced MAndarin Lombard Grid corpus (EMALG). A parametric analysis of the Lombard effect shows that, as the noise level increases, natural and grid sentences exhibit similar parameter changes, except that grid sentences show a greater increase in the alpha ratio. In a subjective intelligibility assessment across genders and signal-to-noise ratios (SNRs), the StarGAN model trained on EMALG consistently outperforms the model trained on LCT in improving intelligibility. This superior performance may be attributed to the larger increase in alpha ratio from normal to Lombard speech in EMALG.
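The abstract attributes the intelligibility gain to the alpha ratio but does not define it. A common definition in the Lombard-speech literature compares spectral energy above and below 1 kHz, so that a flatter spectral tilt (more high-frequency energy, as is typical of Lombard speech) yields a larger value. The minimal sketch below assumes the ratio of summed power in the 1 to 5 kHz band to that in the 50 to 1000 Hz band, expressed in dB and averaged over Hann-windowed frames. The band edges, frame settings, and the function name `alpha_ratio_db` are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np


def alpha_ratio_db(signal, fs, low_band=(50.0, 1000.0), high_band=(1000.0, 5000.0),
                   frame_len=1024, hop=512):
    """Hypothetical helper: alpha ratio = 10*log10(E_high / E_low) in dB.

    Assumption: E_low and E_high are the summed power-spectrum energies in
    low_band and high_band, accumulated over Hann-windowed frames.
    """
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short inputs so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))

    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop

    # Accumulate the power spectrum over all frames.
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2

    # Select the frequency bins belonging to each band.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    low = (freqs >= low_band[0]) & (freqs < low_band[1])
    high = (freqs >= high_band[0]) & (freqs < high_band[1])

    e_low = power[low].sum()
    e_high = power[high].sum()
    if e_low <= 0.0 or e_high <= 0.0:
        raise ValueError("band energy is zero; alpha ratio is undefined")
    return 10.0 * np.log10(e_high / e_low)
```

As a synthetic illustration (not data from LCT or EMALG), a 300 Hz tone plus a weak 2 kHz tone stands in for "normal" speech, and raising the 2 kHz amplitude mimics the flatter Lombard spectrum. With amplitudes 0.1 and 0.5 the alpha ratio rises from about -20 dB to about -6 dB. In this reading, a corpus whose normal-to-Lombard pairs show a larger alpha-ratio increase teaches the conversion model to shift more energy above 1 kHz.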