SkeFi: Cross-Modal Knowledge Transfer for Wireless Skeleton-Based Action Recognition

Skeleton-based action recognition leverages human pose keypoints to categorize human actions and shows superior generalization and interoperability compared to conventional end-to-end action recognition. Existing solutions use RGB cameras to annotate skeletal keypoints. However, RGB cameras perform poorly in dark environments and raise privacy concerns, which limits their use in smart homes and hospitals. As a feasible alternative, this paper explores non-invasive wireless sensors, i.e., LiDAR and mmWave radar, to mitigate these challenges. We address two problems: (1) there is too little wireless-sensor data to train an accurate skeleton estimation model, and (2) skeletal keypoints derived from wireless sensors are noisier than those derived from RGB, which makes subsequent action recognition considerably harder. Our work, SkeFi, closes these gaps with a novel cross-modal knowledge transfer method that draws on the data-rich RGB modality. We propose an enhanced Temporal Correlation Adaptive Graph Convolution (TC-AGC) with frame interactive enhancement to overcome noise caused by missing or non-consecutive frames. We further show that dual temporal convolution effectively strengthens multiscale temporal modeling. By integrating TC-AGC with this temporal modeling for cross-modal transfer, our framework extracts accurate poses and actions from noisy wireless sensor data. Experiments demonstrate that SkeFi achieves state-of-the-art performance on both mmWave and LiDAR. The code is available at https://github.com/Huang0035/Skefi.
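The abstract does not describe how SkeFi's dual temporal convolution is built. The sketch below illustrates only the general idea of multiscale temporal modeling and is not the authors' design. It runs two temporal convolutions with different dilations (1 and 2, an assumption) over a skeleton sequence laid out as frames × joints × channels and then averages the two branches. The function names, the fixed smoothing kernel, and the averaging fusion are all hypothetical. In a trained network the kernels would be learned and would typically differ per channel.

```python
from typing import List, Sequence, Tuple

# Skeleton sequence as a nested list indexed seq[t][v][c]
# (frame, joint, coordinate channel).
Seq = List[List[List[float]]]


def temporal_conv(seq: Seq, kernel: Sequence[float], dilation: int) -> Seq:
    """Hypothetical helper: 1-D convolution along the time axis only.

    The same kernel is shared by every joint and channel. Out-of-range
    frames are treated as zeros ("same" padding), so the output keeps T frames.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    T, V, C = len(seq), len(seq[0]), len(seq[0][0])
    half = (len(kernel) - 1) // 2 * dilation  # offset of the kernel centre
    out = [[[0.0] * C for _ in range(V)] for _ in range(T)]
    for t in range(T):
        for i, w in enumerate(kernel):
            src = t + i * dilation - half  # frame this kernel tap reads from
            if 0 <= src < T:
                for v in range(V):
                    for c in range(C):
                        out[t][v][c] += w * seq[src][v][c]
    return out


def dual_temporal_conv(
    seq: Seq, kernel: Sequence[float], dilations: Tuple[int, int] = (1, 2)
) -> Seq:
    """Hypothetical two-branch block: a short-range and a longer-range
    temporal branch, fused by simple averaging (an assumed fusion rule)."""
    a = temporal_conv(seq, kernel, dilations[0])
    b = temporal_conv(seq, kernel, dilations[1])
    return [
        [[(x + y) / 2 for x, y in zip(ja, jb)] for ja, jb in zip(fa, fb)]
        for fa, fb in zip(a, b)
    ]


if __name__ == "__main__":
    T, V, C = 7, 2, 1
    seq = [[[1.0] * C for _ in range(V)] for _ in range(T)]
    seq[3] = [[0.0] * C for _ in range(V)]  # simulate a dropped frame
    out = dual_temporal_conv(seq, kernel=(1 / 3, 1 / 3, 1 / 3))
    print(out[3][0][0])  # about 0.667: the gap is partly filled by neighbours
```

In this toy setting, the zeroed frame receives contributions from its neighbours at two temporal scales. This roughly shows why aggregating over several temporal ranges can soften missing or non-consecutive frames.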