As large-scale speech-to-speech models achieve high fidelity, distinguishing between synthetic voices in structured environments has become an important area of study. This paper introduces Advosynth-500, a specialized dataset of 100 synthetic speech files featuring 10 unique advocate identities. Using the Speech Llama Omni model, we simulate five distinct pairs of advocates engaged in courtroom arguments. We define specific vocal characteristics for each advocate and present a speaker identification challenge that evaluates how well modern systems map each audio file to its synthetic source. The dataset is available at https://github.com/naturenurtureelite/ADVOSYNTH-500.
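The scoring rule for the speaker identification challenge is not specified in this abstract. As one illustration, a closed-set evaluation could compare each file's predicted advocate against its ground-truth identity. The sketch below assumes hypothetical file names and advocate labels (for example `adv01` through `adv10`) and reports top-1 accuracy along with a tally of misidentifications. It is not the dataset's official protocol.

```python
from collections import Counter


def identification_accuracy(truth: dict[str, str], predicted: dict[str, str]) -> float:
    """Fraction of files whose predicted advocate matches the ground truth.

    `truth` and `predicted` map a (hypothetical) file name to an advocate label.
    Files missing from `predicted` are counted as errors.
    """
    if not truth:
        raise ValueError("truth must not be empty")
    correct = sum(1 for f, spk in truth.items() if predicted.get(f) == spk)
    return correct / len(truth)


def confusions(truth: dict[str, str], predicted: dict[str, str]) -> Counter:
    """Count (true label, predicted label) pairs for misidentified files.

    A missing prediction appears with predicted label None.
    """
    return Counter(
        (spk, predicted.get(f))
        for f, spk in truth.items()
        if predicted.get(f) != spk
    )


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    truth = {"a.wav": "adv01", "b.wav": "adv02", "c.wav": "adv03", "d.wav": "adv04"}
    pred = {"a.wav": "adv01", "b.wav": "adv02", "c.wav": "adv03", "d.wav": "adv03"}
    print(identification_accuracy(truth, pred))
    print(confusions(truth, pred))
```

Because the advocates are recorded in opposing pairs, the confusion tally is a natural place to check whether errors cluster within a pair (one advocate mistaken for their opponent) rather than across pairs.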