Multilingual speech processing requires understanding emotions, a task made difficult by limited labelled data. CLARA minimizes reliance on labelled data and thereby improves generalization across languages. It fosters shared representations that aid cross-lingual transfer of speech and emotion, even when little data is available. Our approach captures emotional nuances in speech, overcoming the issues of subjective assessment. Using a large multilingual audio corpus and self-supervised learning, CLARA develops emotion-enriched speech representations, advancing emotion-aware multilingual speech processing. Our method expands the data range through data augmentation, uses textual embedding for visual understanding, and transfers knowledge from high- to low-resource languages. CLARA performs strongly on emotion recognition, language comprehension, and audio benchmarks, including in zero-shot and few-shot settings. It adapts to low-resource languages, marking progress in multilingual speech representation learning.