TwistList: Resources and Baselines for Tongue Twister Generation

Previous work in phonetically grounded language generation has mainly focused on domains such as lyrics and poetry. In this paper, we present work on the generation of tongue twisters, a form of language that must be phonetically conditioned to maximise sound overlap while remaining semantically consistent with an input topic and grammatically correct. We present \textbf{TwistList}, a large annotated dataset of tongue twisters consisting of 2.1K+ human-authored examples. We additionally present several benchmark systems (referred to as TwisterMisters) for the proposed task of tongue twister generation, including models that do and models that do not require training on in-domain data. We present the results of automatic and human evaluation to demonstrate the performance of existing mainstream pre-trained models on this task with limited (or no) task-specific training and data, and no explicit phonetic knowledge. We find that the task of tongue twister generation is challenging for models under these conditions, yet some models are still capable of generating acceptable examples of this language type.

