Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

In-context learning provides a new perspective on multi-task modeling in vision and NLP. Under this setting, a model perceives the task from a prompt and accomplishes it without any extra task-specific prediction heads or model fine-tuning. However, skeleton sequence modeling via in-context learning remains unexplored. Directly applying existing in-context models from other areas to skeleton sequences fails because of inter-frame and cross-task pose similarity, which makes it exceptionally hard to perceive the task correctly from a subtle context. To address this challenge, we propose Skeleton-in-Context (SiC), an effective framework for in-context skeleton sequence modeling. SiC handles multiple skeleton-based tasks simultaneously after a single training process and accomplishes each task from context according to the given prompt. It can further generalize to new, unseen tasks given customized prompts. To facilitate context perception, we additionally propose a task-unified prompt, which adaptively learns tasks of different natures, such as partial joint-level generation, sequence-level prediction, and 2D-to-3D motion prediction. We conduct extensive experiments to evaluate SiC on multiple tasks, including motion prediction, pose estimation, joint completion, and future pose estimation, and we evaluate its generalization to unseen tasks such as motion-in-between. These experiments show that our model achieves state-of-the-art multi-task performance and even outperforms single-task methods on certain tasks.
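
To make "perceiving the task from the prompt" concrete at the data level, here is a minimal Python sketch of how an in-context sample could be assembled: a prompt (input, target) pair and a query input, all produced by the same task. This is an illustrative assumption, not SiC's actual data pipeline. The list-of-frames representation, storing 2D poses as 3D with zero depth, the frame splits, and every function name (`to_2d`, `motion_prediction`, `pose_estimation`, `joint_completion`, `future_pose_estimation`, `build_in_context_sample`, `toy_sequence`) are hypothetical. The task-unified prompt and the network itself are not modeled.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical representation: a skeleton sequence is a list of frames,
# each frame a list of (x, y, z) joint coordinates.
Joint = Tuple[float, float, float]
Frame = List[Joint]
Skeleton = List[Frame]
Pair = Tuple[Skeleton, Skeleton]  # (task input, task target)

ZERO: Joint = (0.0, 0.0, 0.0)


def to_2d(seq: Skeleton) -> Skeleton:
    """Assumption: a 2D pose is stored as 3D with depth zeroed, so shapes stay uniform."""
    return [[(x, y, 0.0) for (x, y, _z) in frame] for frame in seq]


def motion_prediction(seq: Skeleton, n_past: int) -> Pair:
    """Observe the first n_past frames in 3D; predict the remaining frames."""
    return seq[:n_past], seq[n_past:]


def pose_estimation(seq: Skeleton) -> Pair:
    """Lift a 2D sequence to 3D, frame by frame."""
    return to_2d(seq), seq


def joint_completion(seq: Skeleton, masked: Set[int]) -> Pair:
    """Zero out the masked joint indices in every frame; recover the full sequence."""
    inp = [[ZERO if j in masked else p for j, p in enumerate(frame)] for frame in seq]
    return inp, seq


def future_pose_estimation(seq: Skeleton, n_past: int) -> Pair:
    """Observe past frames in 2D only; predict the future frames in 3D."""
    return to_2d(seq[:n_past]), seq[n_past:]


def build_in_context_sample(
    prompt_seq: Skeleton, query_seq: Skeleton, task: Callable[[Skeleton], Pair]
) -> Dict[str, object]:
    """Apply the same task to both sequences. The task name is never stored in the
    sample, so a model must infer the task from the prompt's (input, target) pair."""
    prompt_in, prompt_out = task(prompt_seq)
    query_in, query_out = task(query_seq)
    return {"prompt": (prompt_in, prompt_out), "query": query_in, "target": query_out}


def toy_sequence(offset: float) -> Skeleton:
    """Toy data: 4 frames with 2 joints each (hypothetical, for illustration only)."""
    return [[(float(t + offset), 1.0, 2.0), (0.0, float(t), 3.0)] for t in range(4)]


# Usage: a joint-completion sample whose prompt masks joint 1 in every frame.
sample = build_in_context_sample(
    toy_sequence(0), toy_sequence(10), lambda s: joint_completion(s, {1})
)
print(sample["query"][0])
```

Note that in this sketch pose estimation and joint completion share the same kind of target (the full 3D sequence) and differ only in how the input is degraded, which is one way to picture why similar-looking poses across tasks make the context hard to read.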

