Tracking facial features is essential for accurate heart rate estimation in imaging ballistocardiography, and tracking skin features more generally enables quantification of motor degradation in Parkinson's disease. Deep convolutional neural networks have shown remarkable accuracy in tracking tasks, but they typically require extensive labeled data for supervised training. Our proposed pipeline uses a convolutional stacked autoencoder to match image crops against a reference crop containing the target feature. The autoencoder learns deep feature encodings specific to the object category in an unsupervised manner, which reduces data requirements. To overcome edge effects that make performance depend on crop size, we apply a Gaussian weight to the per-pixel residual errors when computing the loss function. We trained the autoencoder on facial images and validated it on manually labeled face and hand videos. Our Deep Feature Encodings (DFE) method achieved a mean tracking error of 0.6 to 3.3 pixels, outperforming traditional methods such as SIFT, SURF, and Lucas-Kanade as well as recent transformer-based trackers such as PIPs++ and CoTracker. Overall, our unsupervised approach tracks a variety of skin features under significant motion and provides better feature descriptors for tracking, matching, and image registration than both traditional and state-of-the-art supervised learning methods.
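The Gaussian weighting of per-pixel residuals can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `gaussian_weight_map` and `gaussian_weighted_mse`, the 31-pixel crop size, the value of `sigma`, and the normalisation of the weights to sum to one are all assumptions made for illustration. The idea it shows is only that reconstruction errors near the crop centre contribute more to the loss than errors near the crop border.

```python
import numpy as np


def gaussian_weight_map(size: int, sigma: float) -> np.ndarray:
    """Return a (size, size) Gaussian weight map centred on the crop.

    The weights are normalised to sum to 1 (an assumption of this sketch).
    """
    # Pixel coordinates measured from the geometric centre of the crop.
    coords = np.arange(size) - (size - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    weights = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return weights / weights.sum()


def gaussian_weighted_mse(
    crop: np.ndarray, reconstruction: np.ndarray, weights: np.ndarray
) -> float:
    """Weighted sum of squared per-pixel residuals between crop and reconstruction."""
    residual = crop - reconstruction
    return float(np.sum(weights * residual**2))


if __name__ == "__main__":
    size, sigma = 31, 5.0  # hypothetical crop size and Gaussian width
    weights = gaussian_weight_map(size, sigma)

    crop = np.zeros((size, size))
    centre_error = crop.copy()
    centre_error[size // 2, size // 2] = 1.0  # unit error at the crop centre
    border_error = crop.copy()
    border_error[0, 0] = 1.0  # same unit error at a corner

    # The same error magnitude is penalised more at the centre than at the corner.
    print(gaussian_weighted_mse(crop, centre_error, weights))
    print(gaussian_weighted_mse(crop, border_error, weights))
```

In a training setup, the same weight map would multiply the squared residuals inside the autoencoder's reconstruction loss, so that pixels near the crop boundary, where edge effects occur, have little influence on the learned encoding.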