Text-to-Speech Papers - 专知

Text-to-Speech

DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Discrete Flow Matching

Arxiv

0+ reads · June 16

Dynamic Prosody Prediction in LLM-based TTS for Improving Speaker Similarity

Arxiv

0+ reads · June 13

BareWave: Waveform-Native Flow-Matching Text-to-Speech

Arxiv

0+ reads · June 8

FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Arxiv

0+ reads · June 9

PashtoTTS-Bench: automated screening for low-resource non-Latin-script text-to-speech

Arxiv

0+ reads · May 26

CosyEdit2: Speech-Editing-Oriented Reinforcement Learning Unlocks Better Zero-Shot TTS

Arxiv

0+ reads · May 26

Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability

Arxiv

0+ reads · May 27

DUET: Unified Dual-Space Emotion Control for Diffusion and Flow-Matching Driven Text-to-Speech

Arxiv

0+ reads · May 20

The Binding Effect: Analyzing How Multi-Dimensional Cues Form Gender Bias in Instruction TTS

Arxiv

0+ reads · March 21

VoXtream2: Full-stream TTS with dynamic speaking rate control

Arxiv

0+ reads · March 13

CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment

Arxiv

0+ reads · February 23

TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems

Arxiv

0+ reads · March 2

LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models

Arxiv

0+ reads · February 17

Emotion-Aligned Generation in Diffusion Text to Speech Models via Preference-Guided Optimization

Arxiv

0+ reads · February 6

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions

Arxiv

0+ reads · January 19
