Deep learning has been successfully applied to human activity recognition. However, training deep neural networks requires explicitly labeled data, which is difficult to acquire. In this paper, we present a model with multiple siamese networks that are trained using only information about the similarity between pairs of data samples, without knowing their explicit labels. The trained model maps activity data samples to fixed-size representation vectors such that the distance between vectors in the representation space reflects the similarity of the corresponding samples in the input space. Thus, the trained model can serve as a distance metric for a wide range of clustering algorithms. Training minimizes a similarity loss function that forces the distance to be small for pairs of samples from the same kind of activity and large for pairs of samples from different kinds of activities. We evaluate the model on three datasets to verify its effectiveness in the segmentation and recognition of continuous human activity sequences.
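The abstract does not state the exact form of the similarity loss. As an illustration only, the sketch below uses a contrastive-style pairwise loss, a common choice for siamese networks, which we assume here rather than take from the paper. The function names, the Euclidean distance, and the `margin` hyperparameter are all hypothetical. The only supervision is a flag saying whether the two samples come from the same kind of activity.

```python
import math


def euclidean_distance(u, v):
    """Distance between two fixed-size representation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_activity, margin=1.0):
    """Assumed contrastive-style similarity loss for one pair of embeddings.

    same_activity: True if the pair comes from the same kind of activity.
    margin: assumed hyperparameter; dissimilar pairs farther apart than
            this contribute no loss.
    """
    d = euclidean_distance(u, v)
    if same_activity:
        # Similar pair: any distance is penalized, pulling the vectors together.
        return d ** 2
    # Dissimilar pair: penalized only while closer than the margin.
    return max(0.0, margin - d) ** 2


# Usage: two hypothetical 2-D embeddings at distance 5.
u, v = [0.0, 0.0], [3.0, 4.0]
loss_if_same = contrastive_loss(u, v, same_activity=True)
loss_if_different = contrastive_loss(u, v, same_activity=False, margin=6.0)
```

In this sketch, minimizing the loss over many pairs drives same-activity embeddings together and different-activity embeddings at least `margin` apart, which is the property that would let the learned distance be handed to a clustering algorithm.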