Tracking dynamic people in cluttered and crowded human-centered environments is a challenging robotics problem because of intraclass variations, including occlusions, pose deformations, and lighting variations. This paper introduces Latent Diffusion Track (LDTrack), a novel deep learning architecture that uses conditional latent diffusion models to track multiple dynamic people under intraclass variations. By using conditional latent diffusion models to capture temporal person embeddings, the architecture can adapt to changes in people's appearance over time. We incorporate a latent feature encoder network that enables the diffusion process to operate within a high-dimensional latent space, allowing the extraction and spatial-temporal refinement of rich features such as person appearance, motion, location, identity, and contextual information. Extensive experiments demonstrate the effectiveness of LDTrack relative to other state-of-the-art tracking methods in cluttered and crowded human-centered environments under intraclass variations. Specifically, the results show that our method outperforms existing deep learning robotic people tracking methods in both tracking accuracy and tracking precision, with statistical significance.