Many viruses and diseases spread from person to person through the respiratory system. COVID-19 showed how crucial it is to track and reduce contacts in order to stop the spread. There is a clear gap in automatic methods that can detect hand-to-face contact in complex urban scenes or indoors. In this paper, we introduce FaceTouch, a computer vision framework based on deep learning. It comprises deep sub-models that detect humans and analyse their actions. FaceTouch seeks to detect hand-to-face touches in the wild, such as in video chats, bus footage, or CCTV feeds. Even when faces are partially occluded, the system learns to detect face touches from the RGB representation of a given scene by utilising representations of body gestures such as arm movement. This has proven useful in complex urban scenarios, beyond simply identifying hand movement and its closeness to faces. In the absence of other benchmark datasets, the model is trained on our own collected dataset using Supervised Contrastive Learning. The framework shows strong validation results on unseen datasets, which opens the door for potential deployment.