Brain-to-speech technology brings together artificial intelligence, brain-computer interfaces, and speech synthesis. Intention decoding and speech synthesis based on neural representation learning connect neural activity directly to the means of human linguistic communication, which may greatly enhance the naturalness of communication. With recent advances in representation learning and speech synthesis, direct translation of brain signals into speech has shown great promise. In particular, when deep generative models are used to generate speech from brain signals, the processed input features and neural speech embeddings fed to the network play a significant role in overall performance. In this paper, we introduce current brain-to-speech technology and the possibility of synthesizing speech from brain signals, which may ultimately facilitate innovation in non-verbal communication. We also present a comprehensive analysis of the neural features and neural speech embeddings underlying neurophysiological activation during speech, which may play a significant role in speech synthesis.