Research interest in speech-to-speech translation (S2ST), the task of translating utterances from one language to another, has grown alongside recent advances in the field. This work proposes the Multitask Speech Language Model (MSLM), a decoder-only speech language model trained in a multitask setting. Without relying on text training data, our model supports multilingual S2ST while preserving speaker style.