Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu

Source: arXiv. Amphion website: https://github.com/open-mmlab/Amphion

Amphion is an open-source toolkit for audio, music, and speech generation that aims to ease the way for junior researchers and engineers into these fields. It presents a unified framework that covers diverse generation tasks and models and is easy to extend with new ones. The toolkit is designed with beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. Additionally, it provides interactive visualizations and demonstrations of classic models for educational purposes. The initial release, Amphion v0.1, supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components such as data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion.

