This paper presents a user-driven approach for synthesizing specific target voices based on user feedback rather than reference recordings, which is particularly beneficial for speech-impaired individuals who want to recreate their lost voices but lack prior recordings. Our method leverages the neural analysis and synthesis framework to construct a latent speaker embedding space. Within this latent space, a human-in-the-loop search algorithm guides the voice generation process. Users participate in a series of straightforward listening-and-comparison tasks, providing feedback that iteratively refines the synthesized voice to match their desired target. Both computer simulations and real-world user studies demonstrate that the proposed approach can effectively approximate target voices. Moreover, by analyzing the mel-spectrogram generator's Jacobians, we identify a set of meaningful voice editing directions within the latent space. These directions enable users to further fine-tune specific attributes of the generated voice, including pitch level, pitch range, volume, vocal tension, nasality, and tone color.
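The listening-and-comparison loop can be sketched as a comparison-driven hill climb in the embedding space. Everything below is illustrative, not the authors' implementation: the embedding dimensionality, the identity `synthesize` stand-in for the generator, and the distance-based `simulated_user` oracle (playing the role of the computer-simulated listener) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # hypothetical speaker-embedding dimensionality

def synthesize(z):
    """Placeholder for the mel-spectrogram generator; identity keeps the sketch runnable."""
    return z

def simulated_user(a, b, target):
    """Oracle standing in for a listening test: True if sample `a` is closer to the target."""
    return np.linalg.norm(a - target) < np.linalg.norm(b - target)

def hitl_search(target, z0, n_rounds=200, step=0.5):
    """Keep whichever of (current voice, candidate voice) the user prefers each round."""
    z = z0.copy()
    for _ in range(n_rounds):
        candidate = z + step * rng.standard_normal(DIM)
        if simulated_user(synthesize(candidate), synthesize(z), target):
            z = candidate          # user preferred the candidate voice
        step *= 0.995              # gently anneal the search radius
    return z

target = rng.standard_normal(DIM)  # embedding of the (unavailable) target voice
z0 = rng.standard_normal(DIM)      # random starting voice
z_found = hitl_search(target, z0)
```

Because a candidate is accepted only when the simulated listener prefers it, the distance to the target decreases monotonically over the accepted steps, mirroring the iterative refinement described above.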
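The Jacobian analysis that yields editing directions can be illustrated with a toy generator: the right singular vectors of the generator's Jacobian at a point in the latent space are the directions ranked by how strongly they perturb the output spectrogram, which makes them natural candidates for attribute-editing axes. The linear-tanh generator and the dimensions below are placeholders under that assumption, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, OUT = 8, 80  # hypothetical embedding dim and mel-spectrogram feature dim
W = rng.standard_normal((OUT, DIM))

def generator(z):
    """Toy stand-in for the mel-spectrogram generator (fixed linear map + tanh)."""
    return np.tanh(W @ z)

def numerical_jacobian(f, z, eps=1e-5):
    """Central-difference Jacobian of f at z."""
    J = np.zeros((len(f(z)), len(z)))
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f(z - dz)) / (2 * eps)
    return J

z = rng.standard_normal(DIM)
J = numerical_jacobian(generator, z)
# Right singular vectors of J, ordered by singular value, are latent directions
# sorted by how strongly they change the generated spectrogram.
_, s, Vt = np.linalg.svd(J, full_matrices=False)
edit_direction = Vt[0]  # direction of maximal local change
```

Moving the embedding a small step along `Vt[0]` changes the output far more than a step along the last singular vector, which is the property that makes such directions useful as editing handles.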