The acquisition cost of large, annotated motion datasets remains a critical bottleneck for skeleton-based Human Activity Recognition (HAR). Although Text-to-Motion (T2M) generative models offer a compelling, scalable source of synthetic data, their training objectives (which emphasize general, artistic motion) and their dataset structures differ fundamentally from HAR's requirements for kinematically precise, class-discriminative actions. This disparity creates a significant domain gap, leaving generalist T2M models ill-equipped to generate motions suitable for HAR classifiers. To address this challenge, we propose KineMIC (Kinetic Mining In Context), a transfer learning framework for few-shot action synthesis. KineMIC adapts a T2M diffusion model to an HAR domain, hypothesizing that semantic correspondences in the text encoding space can provide soft supervision for kinematic distillation. We operationalize this via a kinetic mining strategy that leverages CLIP text embeddings to establish correspondences between sparse HAR labels and T2M source data. This process guides fine-tuning, transforming the generalist T2M backbone into a specialized few-shot Action-to-Motion generator. We validate KineMIC using HumanML3D as the source T2M dataset and a subset of NTU RGB+D 120 as the target HAR domain, randomly selecting just 10 samples per action class. Our approach generates significantly more coherent motions, providing a robust data-augmentation source that yields a +23.1 percentage point improvement in accuracy. Animated illustrations and supplementary materials are available at https://lucazzola.github.io/publications/kinemic.
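The kinetic mining step described above can be sketched as a nearest-neighbor search in a shared text embedding space. The snippet below is a minimal illustration, not the paper's implementation: it assumes CLIP text embeddings for HAR class labels and T2M source captions have already been computed, and the function name and top-k retrieval policy are illustrative assumptions.

```python
import numpy as np

def mine_correspondences(label_emb: np.ndarray,
                         caption_embs: np.ndarray,
                         top_k: int = 3):
    """Rank T2M source captions by cosine similarity to one HAR label embedding.

    label_emb:    (d,)   text embedding of an HAR class label
    caption_embs: (n, d) text embeddings of T2M source captions
    Returns indices and similarities of the top_k closest captions.
    """
    label = label_emb / np.linalg.norm(label_emb)
    caps = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    sims = caps @ label              # cosine similarities, shape (n,)
    top = np.argsort(-sims)[:top_k]  # best-matching caption indices
    return top, sims[top]

# Toy demo with made-up 3-d "embeddings": caption 0 matches the label exactly,
# caption 2 is a near match, caption 1 is orthogonal.
label = np.array([1.0, 0.0, 0.0])
caps = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0]])
idx, sims = mine_correspondences(label, caps, top_k=2)
print(idx)  # caption 0 ranked first, then caption 2
```

In the full framework, the mined captions (and their paired motions) would supply the soft supervision signal that guides fine-tuning of the T2M backbone toward the sparse HAR classes.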