In this work, we propose a position- and orientation-aware one-shot learning framework for medical action recognition from signal data. The framework comprises two stages. Each stage includes signal-level image generation (SIG), cross-attention (CsA), and dynamic time warping (DTW) modules, together with a fusion of the proposed privacy-preserving position and orientation features. The SIG method transforms raw skeleton data into privacy-preserving features for training. The CsA module reduces recognition bias by guiding the network to focus on the body parts most important for each specific action, which helps distinguish similar medical actions. The DTW module minimizes temporal misalignment between instances, further improving model performance. In addition, the proposed privacy-preserving orientation-level features assist the position-level features in both stages to enhance medical action recognition. Extensive experiments on the widely used NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art methods under general dataset partitioning by 2.7%, 6.2%, and 4.1%, respectively.
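To illustrate what "temporal misalignment between instances" means for the DTW module, here is a minimal sketch of classic dynamic time warping. It assumes each instance is a sequence of per-frame feature vectors (for example, flattened joint coordinates), and that DTW serves as a distance for nearest-exemplar one-shot matching. The function names `dtw_distance` and `one_shot_classify` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic DTW between sequences a (T1, D) and b (T2, D).

    Frame-to-frame cost is Euclidean distance; steps allowed are
    vertical, horizontal, and diagonal (standard symmetric DTW).
    """
    t1, t2 = len(a), len(b)
    # Pairwise frame costs, shape (T1, T2).
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Accumulated cost with a padded border of +inf.
    acc = np.full((t1 + 1, t2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # repeat a frame of b
                acc[i, j - 1],      # repeat a frame of a
                acc[i - 1, j - 1],  # advance both sequences
            )
    return float(acc[t1, t2])


def one_shot_classify(query: np.ndarray, support: dict) -> str:
    """Hypothetical one-shot rule: support maps label -> one exemplar sequence.

    Returns the label whose exemplar has the smallest DTW distance to the query.
    """
    return min(support, key=lambda label: dtw_distance(query, support[label]))


# Usage: b is a time-stretched version of a, so DTW aligns them at zero cost.
a = np.array([[0.0], [1.0], [2.0], [3.0]])
b = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]])
print(dtw_distance(a, b))  # 0.0
print(one_shot_classify(b, {"walk": a, "jump": np.array([[5.0], [6.0], [7.0]])}))  # walk
```

Because one frame may be matched to several frames of the other sequence, a slower or faster performance of the same action still yields a small distance. This tolerance to timing differences is the effect a DTW-based alignment step provides.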