AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis - 专知论文

会员服务 ·

0

向量化 · Learning · 表示 · Performer · 多峰值 ·

2023 年 5 月 8 日

AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

翻译：AQ-GT：一种时间对齐与量化的GRU-Transformer共语手势合成方法

Hendric Voß,Stefan Kopp

The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, which created seemingly natural but often unconvincing gestures during human assessment. We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning the mapping of a latent space representation as opposed to directly mapping it to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior. We also perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available.

翻译：生成真实且语境相关的共语手势是多模态人工智能体创作中一项具有挑战性且日益重要的任务。先前的方法侧重于学习共语手势表征与生成运动之间的直接对应关系，这虽然产生了看似自然的手势，但在人工评估中往往缺乏令人信服的效果。我们提出一种基于生成对抗网络与量化流水线的预训练部分手势序列方法。由此产生的码本向量同时作为框架的输入与输出，构成手势生成与重建的基础。通过学习潜在空间表征的映射（而非直接映射至向量表征），该框架能够生成高度真实且富有表现力的手势，这些手势紧密复现人类运动与行为，同时避免生成过程中的伪影。我们将所提方法与现有共语手势生成方法及人类行为数据集进行对比评估，并通过消融研究验证实验结果。结果表明，我们的方法显著超越当前最优水平，且部分手势结果与人类手势难以区分。我们将数据流水线与生成框架公开提供。

0

相关内容

向量化

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

专知会员服务

86+阅读 · 2023年6月19日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

可解释的CNN

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

mPFC神经环路中突触结构重塑与慢性应激大鼠抑郁样行为的关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

暖白光LED用低光衰高显色性Lu3Al5-x(Si/B)xO12-yNy:Ce荧光粉的研究

国家自然科学基金

0+阅读 · 2014年12月31日

骨髓微环境调控骨髓瘤细胞RANKL表达的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Nrf2-Keap1-ARE信号通路在脊髓损伤后胶质瘢痕形成中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

MTA1抑制纺锤体组装检验点功能的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Internet环境下组合式软件的时空进程代数刻画及模型检测

国家自然科学基金

0+阅读 · 2012年12月31日

锰基层状固溶体富锂化合物的结构与功能

国家自然科学基金

0+阅读 · 2011年12月31日

Dyrk1A调控CaMKⅡ#948;的可变剪接及其在心脏重构过程中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

镁合金锰系磷化膜的沉积机制、界面结构和自愈合行为研究

国家自然科学基金

0+阅读 · 2009年12月31日

Mixture Encoder for Joint Speech Separation and Recognition

Arxiv

0+阅读 · 2023年6月21日

Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

Arxiv

0+阅读 · 2023年6月21日

NeRF synthesis with shading guidance

Arxiv

0+阅读 · 2023年6月20日

EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model

Arxiv

0+阅读 · 2023年6月20日

Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging

Arxiv

0+阅读 · 2023年6月20日

Using Motif Transitions for Temporal Graph Generation

Arxiv

0+阅读 · 2023年6月19日

Forward LTLf Synthesis: DPLL At Work

Arxiv

0+阅读 · 2023年6月19日

EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors

Arxiv

0+阅读 · 2023年6月16日

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Arxiv

0+阅读 · 2023年6月15日

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Arxiv

11+阅读 · 2017年12月27日

VIP会员

文章信息

相关主题

最新内容

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

专知会员服务

3+阅读 · 6月14日

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

专知会员服务

3+阅读 · 6月14日

俄乌战场地面机器人如何改写战争规则

俄乌战场地面机器人如何改写战争规则

专知会员服务

8+阅读 · 6月14日

美国海军研究生院第23届年度采购研究研讨会与创新峰会：主题“加速作战能力”，附会议报告论文集1300页

美国海军研究生院第23届年度采购研究研讨会与创新峰会：主题“加速作战能力”，附会议报告论文集1300页

专知会员服务

7+阅读 · 6月14日

《新空中力量概念：来自敏捷战斗运用的启示》2026最新50页报告

《新空中力量概念：来自敏捷战斗运用的启示》2026最新50页报告

专知会员服务

9+阅读 · 6月14日

《无人水面艇文献综述与结构设计》135页

《无人水面艇文献综述与结构设计》135页

专知会员服务

12+阅读 · 6月13日

《自主蜂群系统的战略架构：多域一体化、抗毁韧性及海上作战框架（2025—2035）》46页报告

《自主蜂群系统的战略架构：多域一体化、抗毁韧性及海上作战框架（2025—2035）》46页报告

专知会员服务

10+阅读 · 6月13日

ICML 2026｜MEMOPILOT：用强化学习训练会进化的智能体记忆

ICML 2026｜MEMOPILOT：用强化学习训练会进化的智能体记忆

专知会员服务

2+阅读 · 6月13日

智能体时间序列系统全景综述：架构、可靠性与研究前沿

智能体时间序列系统全景综述：架构、可靠性与研究前沿

专知会员服务

11+阅读 · 6月13日

AUTOLAB：86亿Token实测前沿模型的长程自动科研能力

AUTOLAB：86亿Token实测前沿模型的长程自动科研能力

专知会员服务

10+阅读 · 6月12日

CVPR 2026趋势报告：视觉AI正在走向世界模型与物理智能，165页ppt

CVPR 2026趋势报告：视觉AI正在走向世界模型与物理智能，165页ppt

专知会员服务

28+阅读 · 6月12日

乌克兰战场背后的新武器

乌克兰战场背后的新武器

专知会员服务

8+阅读 · 6月12日

《信任但需验证：军事决策背景下的大型语言模型品格、能力与控制》2026最新59页报告

《信任但需验证：军事决策背景下的大型语言模型品格、能力与控制》2026最新59页报告

专知会员服务

13+阅读 · 6月12日

未来战争：乌克兰2026年反攻中的作战经验教训 - 新军事战略之“后勤封锁”（中文下载）

未来战争：乌克兰2026年反攻中的作战经验教训 - 新军事战略之“后勤封锁”（中文下载）

专知会员服务

10+阅读 · 6月12日

基于博弈论的陆军人机协同（长文报告）

基于博弈论的陆军人机协同（长文报告）

专知会员服务

13+阅读 · 6月12日

相关VIP内容

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

专知会员服务

86+阅读 · 2023年6月19日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

84+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

美国海军研究生院第23届年度采购研究研讨会与创新峰会：主题“加速作战能力”，附会议报告论文集1300页

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

俄乌战场地面机器人如何改写战争规则

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

可解释的CNN

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

相关论文

Mixture Encoder for Joint Speech Separation and Recognition

Arxiv

0+阅读 · 2023年6月21日

Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

Arxiv

0+阅读 · 2023年6月21日

NeRF synthesis with shading guidance

Arxiv

0+阅读 · 2023年6月20日

EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model

Arxiv

0+阅读 · 2023年6月20日

Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging

Arxiv

0+阅读 · 2023年6月20日

Using Motif Transitions for Temporal Graph Generation

Arxiv

0+阅读 · 2023年6月19日

Forward LTLf Synthesis: DPLL At Work

Arxiv

0+阅读 · 2023年6月19日

EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors

Arxiv

0+阅读 · 2023年6月16日

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Arxiv

0+阅读 · 2023年6月15日

Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

Arxiv

11+阅读 · 2017年12月27日

相关基金

mPFC神经环路中突触结构重塑与慢性应激大鼠抑郁样行为的关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

暖白光LED用低光衰高显色性Lu3Al5-x(Si/B)xO12-yNy:Ce荧光粉的研究

国家自然科学基金

0+阅读 · 2014年12月31日

骨髓微环境调控骨髓瘤细胞RANKL表达的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

Nrf2-Keap1-ARE信号通路在脊髓损伤后胶质瘢痕形成中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

MTA1抑制纺锤体组装检验点功能的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Internet环境下组合式软件的时空进程代数刻画及模型检测

国家自然科学基金

0+阅读 · 2012年12月31日

锰基层状固溶体富锂化合物的结构与功能

国家自然科学基金

0+阅读 · 2011年12月31日

Dyrk1A调控CaMKⅡ#948;的可变剪接及其在心脏重构过程中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

镁合金锰系磷化膜的沉积机制、界面结构和自愈合行为研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员