An audiobook can dramatically improve a work of literature's accessibility and reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. Our method can identify the proper subset of e-book content to read for a wide collection of diversely structured books and can operate on hundreds of books in parallel. Our system allows users to customize an audiobook's speaking speed, speaking style, and emotional intonation, and can even match a desired voice using a small amount of sample audio. This work contributes over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection, visit \url{https://aka.ms/audiobook}.