Musculoskeletal diseases and cognitive impairments cause movement difficulties in patients and also harm their psychological health. Clinical gait analysis, a vital tool for early diagnosis and treatment, has traditionally relied on expensive optical motion capture systems. Recent advances in computer vision and deep learning have opened the door to more accessible and cost-effective alternatives. This paper introduces a novel spatio-temporal Transformer network that estimates critical gait parameters from RGB videos captured by a single-view camera. Empirical evaluations on a public dataset of cerebral palsy patients indicate that the proposed framework surpasses current state-of-the-art approaches and shows significant improvements in predicting general gait parameters, including Walking Speed, the Gait Deviation Index (GDI), and Knee Flexion Angle at Maximum Extension, while using fewer parameters and eliminating the need for manual feature extraction.
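The following minimal NumPy sketch illustrates the general idea of factorized spatio-temporal self-attention used to regress gait parameters. It is not the authors' network. The abstract does not specify the input representation, so the sketch assumes per-frame 2D body keypoints (for example, 25 joints) have already been extracted from the video. All names (`init_params`, `spatio_temporal_regressor`, the parameter dictionary keys) are hypothetical. Weights are random, so the outputs are meaningless until trained. Layer normalization, feed-forward sublayers, and multi-head attention are omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention over the second-to-last axis.
    # x: (..., N, D) -> (..., N, D)
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def init_params(seed, num_frames, num_joints, dim, num_outputs=3, in_dim=2):
    # Hypothetical random initialization; a real model would learn these weights.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)

    def w(*shape):
        return rng.normal(scale=scale, size=shape)

    return {
        "embed": w(in_dim, dim),               # per-joint coordinate embedding
        "joint_pos": w(1, num_joints, dim),    # which joint a token belongs to
        "time_pos": w(num_frames, 1, dim),     # which frame a token belongs to
        "spatial": (w(dim, dim), w(dim, dim), w(dim, dim)),
        "temporal": (w(dim, dim), w(dim, dim), w(dim, dim)),
        "head": w(dim, num_outputs),           # e.g. speed, GDI, knee flexion
    }


def spatio_temporal_regressor(keypoints, params):
    # keypoints: (T, J, 2) assumed 2D joint coordinates for T frames, J joints.
    x = keypoints @ params["embed"]                      # (T, J, D)
    x = x + params["joint_pos"] + params["time_pos"]     # add position encodings
    # Spatial attention: joints attend to each other within each frame.
    x = x + self_attention(x, *params["spatial"])        # (T, J, D)
    # Temporal attention: each joint attends across frames.
    xt = np.swapaxes(x, 0, 1)                            # (J, T, D)
    xt = xt + self_attention(xt, *params["temporal"])    # (J, T, D)
    pooled = xt.mean(axis=(0, 1))                        # (D,)
    return pooled @ params["head"]                       # (num_outputs,)


if __name__ == "__main__":
    T, J, D = 60, 25, 32
    params = init_params(seed=0, num_frames=T, num_joints=J, dim=D)
    clip = np.random.default_rng(1).normal(size=(T, J, 2))
    print(spatio_temporal_regressor(clip, params))  # three untrained outputs
```

Attending over joints within each frame and then over frames for each joint keeps the attention matrices at J×J and T×T rather than (TJ)×(TJ), which is one common reason to factorize spatial and temporal attention.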
翻译:肌肉骨骼疾病和认知障碍患者不仅存在运动困难,还面临心理健康方面的负面影响。临床步态分析作为早期诊断和治疗的关键工具,传统上依赖昂贵的光学动作捕捉系统。近年来计算机视觉与深度学习的进展为更易获取、更具成本效益的替代方案打开了大门。本文提出了一种新颖的时空Transformer网络,用于从单视角相机拍摄的RGB视频中估计关键步态参数。在脑瘫患者公开数据集上的实证评估表明,所提出的框架超越了当前最先进的方法,在预测通用步态参数(包括步行速度、步态偏差指数GDI和最大伸展时的膝关节屈曲角度)方面展现出显著改进,同时使用更少的参数并免除了手动特征提取的需求。