Is the text-to-motion model robust? Recent advances in text-to-motion models stem primarily from more accurate predictions of specific actions. However, the text modality typically relies solely on pre-trained Contrastive Language-Image Pretraining (CLIP) models. Our research uncovers a significant issue with text-to-motion models: their predictions are often inconsistent, producing vastly different or even incorrect poses when presented with semantically similar or identical text inputs. In this paper, we analyze the underlying causes of this instability and establish a clear link between the unpredictability of model outputs and the erratic attention patterns of the text encoder module. We then introduce a formal framework to address this issue, which we term the Stable Text-to-Motion Framework (SATO). SATO consists of three modules, dedicated respectively to stable attention, stable prediction, and balancing the trade-off between accuracy and robustness. We present a methodology for constructing a SATO that satisfies the stability requirements on both attention and prediction. To verify the model's stability, we introduce a new textual synonym perturbation dataset based on HumanML3D and KIT-ML. Results show that SATO is significantly more stable under synonym substitutions and other slight perturbations while maintaining high accuracy.
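The kind of perturbation the evaluation targets can be illustrated with a minimal sketch. The abstract does not specify how the synonym perturbation dataset is built from HumanML3D and KIT-ML captions, so the synonym table and `perturb` helper below are purely hypothetical, showing only the general idea of swapping words for semantically equivalent ones while leaving the rest of the caption intact:

```python
import random

# Hypothetical synonym table for illustration only; the actual SATO
# perturbation dataset is derived from HumanML3D / KIT-ML captions.
SYNONYMS = {
    "walks": ["strolls", "paces"],
    "quickly": ["rapidly", "briskly"],
    "person": ["man", "individual"],
}


def perturb(caption: str, rng: random.Random) -> str:
    """Replace each word that has an entry in SYNONYMS with a random synonym;
    all other words are kept unchanged, so the caption's meaning is preserved."""
    words = caption.split()
    out = [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words]
    return " ".join(out)


rng = random.Random(0)
print(perturb("a person walks quickly forward", rng))
```

A robust model should map the original and perturbed captions to near-identical motions; the instability described above means current models often do not.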
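The link between output instability and erratic text-encoder attention also suggests the shape of a remedy. The abstract does not describe SATO's stable-attention module concretely, so the following is only a minimal sketch of one plausible formulation, assuming access to per-token attention weights: penalize the distance between the attention distributions the encoder assigns to an original caption and to its perturbed counterpart.

```python
import numpy as np


def attention_stability_loss(attn_orig: np.ndarray, attn_pert: np.ndarray) -> float:
    """Mean absolute difference between two attention distributions over
    tokens (each a 1-D array of non-negative weights summing to 1).
    Returns 0.0 when the encoder attends identically to both captions;
    larger values indicate the erratic attention shifts described above."""
    return float(np.abs(attn_orig - attn_pert).mean())
```

Adding such a term to the training objective would push the encoder toward consistent attention on semantically equivalent inputs; the accuracy-robustness trade-off module would then keep this regularization from degrading prediction quality.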