Recognizing whispered speech and converting it to normal speech opens up many possibilities for speech interaction. Because the sound pressure of whispered speech is significantly lower than that of normal speech, whispering can serve as a semi-silent form of speech interaction in public places without being audible to others. Converting whispers to normal speech can also improve speech quality for people with speech or hearing impairments. However, conventional speech conversion techniques either do not provide sufficient conversion quality or require speaker-dependent datasets consisting of paired whispered and normal utterances. To address these problems, we propose WESPER, a zero-shot, real-time whisper-to-normal speech conversion mechanism based on self-supervised learning. WESPER consists of a speech-to-unit (STU) encoder, which generates hidden speech units common to both whispered and normal speech, and a unit-to-speech (UTS) decoder, which reconstructs speech from the encoded speech units. Unlike existing methods, this conversion is user-independent and does not require a paired dataset of whispered and normal speech. The UTS decoder can reconstruct speech in any target speaker's voice from the speech units, and it requires only unlabeled speech data from the target speaker. We confirmed that the quality of speech converted from a whisper was improved while its natural prosody was preserved. Additionally, we confirmed the effectiveness of the proposed approach in reconstructing speech for people with speech or hearing disabilities. (Project page: http://lab.rekimoto.org/projects/wesper)
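The two-stage structure described above (an STU encoder producing speech units, followed by a UTS decoder tied to a target speaker) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the class names `ToySTUEncoder` and `ToyUTSDecoder`, the list-of-floats frame representation, and the nearest-codebook lookup are hypothetical stand-ins, not WESPER's models. In particular, the toy encoder ignores loudness by normalizing each frame, which only mimics the idea of units shared by whispered and normal speech; the abstract attributes that property to self-supervised learning.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Frame = Sequence[float]  # one acoustic feature vector (hypothetical representation)


def _unit_norm(frame: Frame) -> List[float]:
    """Scale a frame to unit length so that loudness does not affect matching."""
    norm = math.sqrt(sum(x * x for x in frame))
    return [x / norm for x in frame] if norm > 0 else list(frame)


def _sq_dist(a: Frame, b: Frame) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class ToySTUEncoder:
    """Toy speech-to-unit encoder: maps each frame to the ID of its nearest codebook entry."""
    codebook: List[Frame]

    def encode(self, frames: Sequence[Frame]) -> List[int]:
        entries = [_unit_norm(c) for c in self.codebook]
        units = []
        for frame in frames:
            f = _unit_norm(frame)
            units.append(min(range(len(entries)), key=lambda k: _sq_dist(f, entries[k])))
        return units


@dataclass
class ToyUTSDecoder:
    """Toy unit-to-speech decoder for one target speaker: one output frame per unit ID."""
    speaker_frames: List[Frame]

    def decode(self, units: Sequence[int]) -> List[Frame]:
        return [list(self.speaker_frames[u]) for u in units]


def convert(frames: Sequence[Frame], encoder: ToySTUEncoder, decoder: ToyUTSDecoder) -> List[Frame]:
    """Whisper-to-normal conversion as a composition: frames -> units -> target-speaker frames."""
    return decoder.decode(encoder.encode(frames))


encoder = ToySTUEncoder(codebook=[[1.0, 0.0], [0.0, 1.0]])
decoder = ToyUTSDecoder(speaker_frames=[[5.0, 0.0], [0.0, 5.0]])

normal = [[0.9, 0.1], [0.1, 0.8]]
whisper = [[0.09, 0.01], [0.01, 0.08]]  # same shapes, much lower amplitude

converted = convert(whisper, encoder, decoder)
```

In this sketch the quiet and loud inputs map to the same unit sequence, and swapping in a different `ToyUTSDecoder` changes the output voice without touching the encoder, which mirrors the separation between speaker-independent units and a per-speaker decoder described in the abstract.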