Falls are a major cause of injuries and deaths among older adults worldwide. Accurate fall detection can help reduce potential injuries and additional health complications. Several video modalities can be used in a home setting to detect falls, including RGB, infrared, and thermal cameras. Because falls are rare and diverse, fall data are heavily imbalanced, which makes anomaly detection frameworks based on autoencoders and their variants well suited to fall detection. However, relying on reconstruction error in autoencoders can limit the use of network structures that propagate information. In this paper, we propose a new multi-objective loss function called Temporal Shift, which aims to predict both future and reconstructed frames within a window of sequential frames. The proposed loss function was evaluated on a semi-naturalistic fall detection dataset containing multiple camera modalities. The autoencoders were trained on normal activities of daily living (ADLs) performed by older adults and tested on ADLs and falls performed by young adults. Temporal Shift significantly improves on a baseline 3D convolutional autoencoder, an attention U-Net convolutional autoencoder (CAE), and a multi-modal neural network. The greatest improvement was observed for the attention U-Net model, whose AUC ROC for a single camera increased by 0.20 compared with reconstruction alone. Given the significant improvement across different models, this approach has the potential to be widely adopted and to improve anomaly detection in settings beyond fall detection.
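The idea of a loss that targets both reconstructed and future frames can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the target window is taken to be the input window shifted forward by `shift` frames, the overlapping frames are treated as the reconstruction objective, the frames beyond the input window as the prediction objective, and the two mean-squared-error terms are combined with a hypothetical `future_weight`. The function names and the pure-Python frame representation (each frame a flat list of pixel values) are likewise hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized sequences of frames."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for pixel_a, pixel_b in zip(frame_a, frame_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    return total / count


def temporal_shift_targets(frames, start, window, shift):
    """Split the shifted target window into its two parts (hypothetical helper).

    The input window is frames[start : start + window]. The assumed target is
    the same window shifted forward by `shift` frames, so it contains frames
    the model has seen (to reconstruct) and frames it has not (to predict).
    """
    if not 0 < shift < window:
        raise ValueError("shift must satisfy 0 < shift < window")
    if start + window + shift > len(frames):
        raise ValueError("not enough frames for the shifted target window")
    reconstructed = frames[start + shift : start + window]
    future = frames[start + window : start + window + shift]
    return reconstructed, future


def temporal_shift_loss(output, frames, start, window, shift, future_weight=1.0):
    """Assumed two-term objective: reconstruction MSE plus weighted future MSE.

    `output` holds `window` frames produced by the model; its first
    window - shift frames are compared with the reconstructed targets and
    its last `shift` frames with the future targets.
    """
    reconstructed, future = temporal_shift_targets(frames, start, window, shift)
    n_rec = len(reconstructed)
    rec_term = mse(output[:n_rec], reconstructed)
    fut_term = mse(output[n_rec:], future)
    return rec_term + future_weight * fut_term


# Toy sequence: 10 frames of 4 pixels, where frame i has every pixel equal to i.
frames = [[float(i)] * 4 for i in range(10)]

# A model that outputs the shifted window exactly incurs zero loss.
perfect = temporal_shift_loss(frames[2:6], frames, start=0, window=4, shift=2)

# A model that merely copies its input window is penalised on both terms.
copied = temporal_shift_loss(frames[0:4], frames, start=0, window=4, shift=2)
```

Under these assumptions, a network that only copies its input is penalised even on the overlapping frames, because each output position is compared with a later frame. At test time, the same per-window error could serve as an anomaly score, although how scores are computed in the evaluated models is not stated here.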