Social group detection is a crucial component of many robotic applications, including robot navigation and human-robot interaction. To date, a range of model-based techniques, such as the F-formation and trajectory similarity frameworks, have been employed to address this challenge. However, these approaches often fail to provide reliable results in crowded and dynamic scenarios. Recent work in this area has mainly focused on learning-based methods, such as deep neural networks that use visual content or human pose. Although visual content-based methods have demonstrated promising performance on large-scale datasets, their computational complexity poses a significant barrier to their use in real-time applications. To address these issues, we propose a simple and efficient framework for social group detection. Our approach explores the impact of motion trajectory on social grouping and uses a novel, reliable, and fast data-driven method. We formulate the individuals in a scene as a graph, where the nodes are represented by LSTM-encoded trajectories and the edges are defined by the distances between each pair of tracks. Our framework employs a modified graph transformer module and graph clustering losses to detect social groups. Experiments on the popular JRDBAct dataset show relative performance improvements ranging from 2% to 11%. Furthermore, under the same computational resources, our framework achieves up to 12x faster inference than state-of-the-art methods. These results demonstrate that our proposed method is suitable for real-time robotic applications.
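To make the graph formulation in the abstract concrete, the following is a minimal sketch of how a scene of tracked people might be turned into a graph whose edges carry pairwise track distances. It is not the proposed framework: the raw trajectories stand in for the LSTM-encoded node features, and a naive distance threshold with connected components stands in for the graph transformer and clustering losses. The function names, the `(N, T, 2)` input layout, and the `max_dist` threshold are all assumptions made for illustration.

```python
import numpy as np


def pairwise_track_distances(tracks: np.ndarray) -> np.ndarray:
    """Mean Euclidean distance between every pair of tracks.

    tracks: array of shape (N, T, 2), i.e. N people observed for T time
            steps in 2D ground-plane coordinates (an assumed layout).
    Returns a symmetric (N, N) matrix with zeros on the diagonal.
    """
    diff = tracks[:, None, :, :] - tracks[None, :, :, :]  # (N, N, T, 2)
    return np.linalg.norm(diff, axis=-1).mean(axis=-1)    # (N, N)


def group_by_threshold(dist: np.ndarray, max_dist: float) -> list:
    """Label the connected components of the graph that links tracks whose
    mean distance is below max_dist.

    This is a hand-written baseline used only to illustrate the output
    format (one group label per person), not a learned grouping model.
    """
    n = dist.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal from an unlabeled person.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i, j] < max_dist:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy scene: two people walking side by side, and a third far away.
t = np.linspace(0.0, 1.0, 5)
tracks = np.stack([
    np.stack([t, np.zeros_like(t)], axis=-1),
    np.stack([t, np.full_like(t, 0.5)], axis=-1),
    np.stack([-t, np.full_like(t, 10.0)], axis=-1),
])

dist = pairwise_track_distances(tracks)
labels = group_by_threshold(dist, max_dist=1.5)
print(labels)  # [0, 0, 1]
```

In this toy scene the first two people stay 0.5 units apart and receive the same group label, while the third is assigned to its own group. A learned model would replace both the raw node features and the fixed threshold.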