Data availability is crucial for advancing artificial intelligence applications, including voice-based technologies. As demand for content creation grows, particularly on social media, translation and text-to-speech (TTS) technologies have become essential tools. The performance of TTS systems depends heavily on the quality of their training data, so progress in the technology is tied to the availability of suitable data. To address this need, this paper introduces an end-to-end tool for generating high-quality datasets for TTS models. Its contributions are: integration of language-specific phoneme distribution into sample selection, automation of the recording process, automated and human-in-the-loop quality assurance of recordings, and processing of recordings to meet specified formats. Through these features, the proposed application aims to streamline dataset creation for TTS models and thereby facilitate advances in voice-based technologies.
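To make the first contribution concrete, the following is a minimal sketch of how a language-specific phoneme distribution could guide sample selection. It is an illustration under stated assumptions, not the tool's actual method: the abstract does not describe the selection algorithm, so the greedy strategy, the KL-divergence criterion, and the names `kl_divergence` and `select_samples` are all hypothetical. The sketch also assumes each candidate sentence has already been converted to a list of phonemes by some external phonemizer.

```python
import math
from collections import Counter


def kl_divergence(target, counts, eps=1e-9):
    """KL(target || empirical), where empirical is derived from phoneme counts.

    `target` maps phoneme -> probability (assumed to sum to 1).
    `eps` avoids division by zero for phonemes not yet covered.
    """
    total = sum(counts.values())
    kl = 0.0
    for phoneme, p in target.items():
        if p <= 0:
            continue
        q = counts.get(phoneme, 0) / total if total else 0.0
        kl += p * math.log(p / (q + eps))
    return kl


def select_samples(candidates, target, k):
    """Greedily pick up to k sentences whose pooled phonemes best match `target`.

    `candidates` maps a sentence id -> list of phonemes (hypothetical input
    format). At each step the sentence that minimizes the divergence of the
    pooled counts from the target distribution is added to the selection.
    """
    remaining = dict(candidates)
    selected = []
    counts = Counter()
    for _ in range(min(k, len(remaining))):
        best_id = min(
            remaining,
            key=lambda sid: kl_divergence(target, counts + Counter(remaining[sid])),
        )
        selected.append(best_id)
        counts.update(remaining.pop(best_id))
    return selected, counts


# Toy example: two phonemes with a uniform target distribution.
target = {"a": 0.5, "b": 0.5}
candidates = {
    "s1": ["a", "a", "a", "a"],
    "s2": ["b", "b", "b", "b"],
    "s3": ["a", "a", "a", "b"],
}
selected, counts = select_samples(candidates, target, k=2)
print(selected)  # ['s3', 's2']
```

In the toy run, the sentence covering both phonemes is chosen first, and the second pick compensates for the phoneme that is still underrepresented. A real pipeline would estimate the target distribution from a large corpus of the language in question and would likely need a faster selection strategy than this quadratic greedy loop.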