Speaker individuality information is among the most critical elements of speech signals. Modeled thoroughly and accurately, this information can be used in a range of intelligent speech applications, such as speaker recognition, speaker diarization, speech synthesis, and target speaker extraction. In this article, we present, from a unique perspective, the developmental history, paradigm shifts, and application domains of speaker modeling technologies within the framework of deep representation learning. This review is intended as a clear reference for researchers in the speaker modeling field, as well as for those who wish to apply speaker modeling techniques to specific downstream tasks.