Musculoskeletal diseases and cognitive impairments cause movement difficulties and negatively affect patients' psychological health. Clinical gait analysis, a vital tool for early diagnosis and treatment, traditionally relies on expensive optical motion capture systems. Recent advances in computer vision and deep learning have opened the door to more accessible and cost-effective alternatives. This paper introduces a novel spatio-temporal Transformer network that estimates critical gait parameters from RGB videos captured by a single-view camera. Empirical evaluations on a public dataset of cerebral palsy patients indicate that the proposed framework surpasses current state-of-the-art approaches, with significant improvements in predicting general gait parameters (including walking speed, Gait Deviation Index (GDI), and knee flexion angle at maximum extension), while using fewer parameters and alleviating the need for manual feature extraction.