Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation

February 7, 2023

Jinhui Ye, Wenxiang Jiao, Xing Wang, Zhaopeng Tu

From arXiv. Accepted at EACL 2023 (main conference).

Sign language gloss translation aims to translate sign glosses into spoken language texts, which is challenging because labeled gloss-text parallel data is scarce. Back-translation (BT), which generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses, has been applied to alleviate this data scarcity. However, the lack of large-scale, high-quality in-domain spoken language text limits the effect of BT. To overcome this limitation, we propose a Prompt-based domain text Generation (PGEN) approach that produces large-scale in-domain spoken language text data. Specifically, PGEN randomly concatenates sentences from the original in-domain spoken language text data as prompts to induce a pre-trained language model (i.e., GPT-2) to generate spoken language texts in a similar style. Experimental results on three sign language gloss translation benchmarks in different languages demonstrate that BT with spoken language texts generated by PGEN significantly outperforms the compared methods. In addition, as the scale of spoken language texts generated by PGEN increases, BT achieves further improvements, demonstrating the effectiveness of our approach. We release the code and data to facilitate future research in this field.
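As a rough illustration of the pipeline the abstract describes, the sketch below builds prompts by randomly concatenating in-domain sentences, passes them to a text generator, and pairs each generated text with a back-translated gloss sequence. It is a minimal sketch under stated assumptions, not the authors' released code: the function names (`build_prompts`, `generate_domain_text`, `back_translate`), the parameter `k` (sentences per prompt), the example sentences, and both toy stand-in models are hypothetical. In practice `generate_fn` would wrap a pre-trained language model such as GPT-2 and `text_to_gloss` would wrap a trained text-to-gloss translation model.

```python
import random
from typing import Callable, List, Tuple


def build_prompts(sentences: List[str], k: int, n_prompts: int,
                  seed: int = 0, sep: str = " ") -> List[str]:
    """Build prompts by joining k randomly sampled in-domain sentences."""
    if not 1 <= k <= len(sentences):
        raise ValueError("k must be between 1 and len(sentences)")
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    return [sep.join(rng.sample(sentences, k)) for _ in range(n_prompts)]


def generate_domain_text(prompts: List[str],
                         generate_fn: Callable[[str], str]) -> List[str]:
    """Run a generator on each prompt and keep the non-empty continuation."""
    outputs = []
    for prompt in prompts:
        full = generate_fn(prompt)
        # Many generators echo the prompt; strip it if it is present.
        continuation = full[len(prompt):] if full.startswith(prompt) else full
        continuation = continuation.strip()
        if continuation:
            outputs.append(continuation)
    return outputs


def back_translate(texts: List[str],
                   text_to_gloss: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Create pseudo-parallel (gloss, text) pairs from monolingual text."""
    return [(text_to_gloss(text), text) for text in texts]


# Toy stand-ins (assumptions): a real setup would call a language model and
# a text-to-gloss translation model here.
def toy_generator(prompt: str) -> str:
    return prompt + " tomorrow rain is expected in the north"


def toy_text_to_gloss(text: str) -> str:
    return " ".join(word.upper() for word in text.split())


# Made-up in-domain sentences, used only for illustration.
in_domain = [
    "the weather stays mild today",
    "strong wind blows along the coast",
    "clouds move in from the west",
    "snow falls in the mountains tonight",
]

prompts = build_prompts(in_domain, k=2, n_prompts=3, seed=42)
synthetic = generate_domain_text(prompts, toy_generator)
pseudo_parallel = back_translate(synthetic, toy_text_to_gloss)
```

The resulting `pseudo_parallel` pairs would be mixed with the original gloss-text data to train the gloss-to-text model. Raising `n_prompts` is the knob that scales up the amount of synthetic text in this sketch.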
