Phonetically-Grounded Language Generation: The Case of Tongue Twisters

Previous work in phonetically-grounded language generation has mainly focused on domains such as lyrics and poetry. In this paper, we present work on the generation of tongue twisters, a form of language that must be phonetically conditioned to maximise sound overlap whilst remaining semantically consistent with an input topic and grammatically correct. We present TwistList, a large annotated dataset of tongue twisters consisting of 2.1K+ human-authored examples. We additionally present several benchmark systems (referred to as TwisterMisters) for the proposed task of tongue twister generation, including models that do and models that do not require training on in-domain data. We present the results of automatic and human evaluation to demonstrate the performance of existing mainstream pre-trained models on this task with limited (or no) task-specific training and data, and no explicit phonetic knowledge. We find that tongue twister generation is challenging for models under these conditions, yet some models are still capable of generating acceptable examples of this language type.
