Video-based ambient monitoring of gait in older adults with dementia has the potential to detect declines in health, allowing clinicians and caregivers to intervene early and prevent falls or hospitalizations. Computer vision-based pose tracking models can process video data automatically and extract joint locations; however, publicly available models are not optimized for gait analysis of older adults or clinical populations. In this work, we train a deep neural network to map a two-dimensional pose sequence, extracted from a video of an individual walking down a hallway toward a wall-mounted camera, to a set of three-dimensional spatiotemporal gait features averaged over the walking sequence. Video and depth data from individuals with dementia were captured at two sites using a wall-mounted system and were used to train and evaluate our model. Our Pose2Gait model extracts velocity and step length values from the video that correlate with the corresponding features from the depth camera, with Spearman's correlation coefficients of 0.83 and 0.60, respectively, showing that three-dimensional spatiotemporal features can be predicted from monocular video. Future work remains to improve the accuracy of other features, such as step time and step width, and to test the utility of the predicted values for detecting meaningful changes in gait during longitudinal ambient monitoring.
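Agreement between the video-based predictions and the depth-camera features is reported as Spearman's rank correlation. As a minimal sketch (not the authors' evaluation code), the coefficient can be computed as the Pearson correlation of the two rank vectors, with tied values assigned their average rank; in practice `scipy.stats.spearmanr` performs the same calculation. The function names below are hypothetical, and the example values are toy numbers, not data from this study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(reference))


if __name__ == "__main__":
    # Toy per-walk velocities (hypothetical): model prediction vs. depth camera.
    predicted_velocity = [0.62, 0.81, 0.55, 0.94, 0.70]
    depth_velocity = [0.60, 0.85, 0.58, 0.90, 0.66]
    print(round(spearman(predicted_velocity, depth_velocity), 3))
```

Because Spearman's coefficient depends only on rank order, it rewards predictions that order walks correctly by speed or step length even when the absolute values are offset, which suits the goal of tracking relative changes in gait over time.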