Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

In-context learning provides a new perspective for multi-task modeling in vision and NLP. Under this setting, the model perceives tasks from prompts and accomplishes them without any extra task-specific prediction heads or model fine-tuning. However, skeleton sequence modeling via in-context learning remains unexplored. Directly applying existing in-context models from other areas to skeleton sequences fails, because poses are highly similar across frames and across tasks, which makes it exceptionally hard to perceive the task correctly from such a subtle context. To address this challenge, we propose Skeleton-in-Context (SiC), an effective framework for in-context skeleton sequence modeling. SiC handles multiple skeleton-based tasks simultaneously after a single training process and accomplishes each task from context according to the given prompt. It can further generalize to new, unseen tasks according to customized prompts. To facilitate context perception, we additionally propose a task-unified prompt, which adaptively learns tasks of different natures, such as partial joint-level generation, sequence-level prediction, or 2D-to-3D motion prediction. We conduct extensive experiments to evaluate the effectiveness of SiC on multiple tasks, including motion prediction, pose estimation, joint completion, and future pose estimation. We also evaluate its generalization capability on unseen tasks such as motion in-between. These experiments show that our model achieves state-of-the-art multi-task performance and even outperforms single-task methods on certain tasks.
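The in-context setting described in the abstract can be pictured as pairing a prompt (an input/target example that demonstrates the task) with a query input whose target the model must produce. The sketch below is only an illustration of that data layout, not the SiC implementation: the function names `apply_task` and `build_sample`, the sequence shape of 16 frames by 17 joints, the zero-masking of joints, and the depth-zeroing stand-in for 2D input are all assumptions introduced here.

```python
import numpy as np


def apply_task(task, seq, masked_joints=None):
    """Split one 3D skeleton sequence of shape (T, J, 3) into (input, target).

    The task definitions here are illustrative assumptions, not the paper's.
    """
    half = seq.shape[0] // 2
    if task == "motion_prediction":
        # Past frames are the input, the following frames are the target.
        return seq[:half].copy(), seq[half:2 * half].copy()
    if task == "joint_completion":
        if masked_joints is None:
            raise ValueError("joint_completion needs masked_joints")
        # Zero out the chosen joints; the target is the complete sequence.
        corrupted = seq.copy()
        corrupted[:, masked_joints, :] = 0.0
        return corrupted, seq.copy()
    if task == "pose_estimation":
        # Assumed stand-in for a 2D input: discard depth by zeroing z.
        flat = seq.copy()
        flat[..., 2] = 0.0
        return flat, seq.copy()
    raise ValueError(f"unknown task: {task}")


def build_sample(task, prompt_seq, query_seq, masked_joints=None):
    """Pair a prompt (input, target) with a query that follows the same task."""
    prompt_in, prompt_tgt = apply_task(task, prompt_seq, masked_joints)
    query_in, query_tgt = apply_task(task, query_seq, masked_joints)
    return {
        "prompt": (prompt_in, prompt_tgt),  # demonstrates the task
        "query_input": query_in,            # what the model receives
        "query_target": query_tgt,          # what the model should output
    }


# Random stand-in data: 16 frames, 17 joints, 3D coordinates (hypothetical sizes).
rng = np.random.default_rng(0)
prompt_seq = rng.normal(size=(16, 17, 3))
query_seq = rng.normal(size=(16, 17, 3))

sample = build_sample("joint_completion", prompt_seq, query_seq, masked_joints=[3, 7])
pred_sample = build_sample("motion_prediction", prompt_seq, query_seq)
```

Because the task is carried only by the prompt pair, the same model input format covers all three cases; nothing in the sample names the task explicitly, which is the property the abstract attributes to in-context learning.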

