Morphing techniques generate artificial biometric samples that combine features from multiple individuals, allowing each contributor to be verified against a single enrolled template. While extensively studied in face recognition, this vulnerability remains largely unexplored in voice biometrics. Prior voice-morphing methods are computationally expensive, scale poorly, and are limited to acoustically similar identity pairs, which constrains practical deployment. Moreover, existing sound-morphing methods target audio textures, music, or environmental sounds and do not transfer to voice identity manipulation. We propose VoxMorph, a zero-shot framework that produces high-fidelity voice morphs from as little as five seconds of audio per subject without model retraining. Our method disentangles vocal traits into prosody and timbre embeddings, enabling fine-grained interpolation of speaking style and identity. These embeddings are fused via Spherical Linear Interpolation (Slerp) and synthesized using an autoregressive language model coupled with a Conditional Flow Matching network. VoxMorph achieves state-of-the-art performance, delivering a 2.6x gain in audio quality, a 73% reduction in intelligibility errors, and a 67.8% morphing attack success rate against automatic speaker verification systems under strict security thresholds. This work establishes a practical and scalable paradigm for voice morphing with significant implications for biometric security. The code and dataset are available on our project page: https://vcbsl.github.io/VoxMorph/
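The fusion step named in the abstract, Spherical Linear Interpolation (Slerp) of two subjects' embeddings, can be illustrated with a minimal NumPy sketch. This is not the VoxMorph implementation: the function names `slerp` and `morph_embeddings`, the separate weights `alpha_prosody` and `alpha_timbre`, and the toy 4-dimensional vectors are all assumptions made for illustration. The abstract does not state the embedding dimensions, whether embeddings are normalized, or which interpolation weights are used.

```python
import numpy as np


def slerp(a, b, alpha, eps=1e-8):
    """Spherical linear interpolation between vectors a and b.

    alpha = 0 returns a, alpha = 1 returns b. The angle is measured
    between the normalized directions of a and b.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)

    # Angle between the two directions.
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    cos_theta = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)

    # Nearly parallel (or antiparallel) vectors: the Slerp weights are
    # numerically unstable, so fall back to plain linear interpolation.
    if sin_theta < eps:
        return (1.0 - alpha) * a + alpha * b

    w_a = np.sin((1.0 - alpha) * theta) / sin_theta
    w_b = np.sin(alpha * theta) / sin_theta
    return w_a * a + w_b * b


def morph_embeddings(prosody_a, prosody_b, timbre_a, timbre_b,
                     alpha_prosody=0.5, alpha_timbre=0.5):
    """Hypothetical helper: fuse prosody and timbre embeddings separately.

    Using independent weights for the two embedding types is an
    assumption of this sketch, not a detail taken from the abstract.
    """
    prosody = slerp(prosody_a, prosody_b, alpha_prosody)
    timbre = slerp(timbre_a, timbre_b, alpha_timbre)
    return prosody, timbre


if __name__ == "__main__":
    # Toy embeddings standing in for the outputs of real encoders.
    rng = np.random.default_rng(0)
    prosody_a, prosody_b = rng.normal(size=4), rng.normal(size=4)
    timbre_a, timbre_b = rng.normal(size=4), rng.normal(size=4)

    prosody, timbre = morph_embeddings(prosody_a, prosody_b,
                                       timbre_a, timbre_b)
    print("morphed prosody:", prosody)
    print("morphed timbre:", timbre)
```

For unit-norm inputs, Slerp keeps the interpolated vector on the unit hypersphere, whereas a linear average of the same two vectors has a smaller norm. That property is the usual reason for choosing Slerp when a downstream model expects embeddings of a consistent scale.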