Social robot navigation is an open and challenging problem. Existing work uses separate modules to capture spatial and temporal features. This separation makes it harder to exploit spatio-temporal features fully and to reduce the conservatism of the navigation policy. In light of this, we present a spatio-temporal Transformer-based policy optimization algorithm that improves the utilization of spatio-temporal features and thereby facilitates the capture of human-robot interactions. Specifically, we introduce a gated embedding mechanism that aligns the spatial and temporal representations by integrating both modalities at the feature level. A Transformer is then used to encode the spatio-temporal semantic information, with the aim of finding the optimal navigation policy. Finally, combining the spatio-temporal Transformer with self-adjusting policy entropy significantly reduces the conservatism of the navigation policy. Experimental results demonstrate the effectiveness of the proposed framework, with our method showing superior performance.
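To make the idea of feature-level gating concrete, the following is a minimal sketch of one common gated fusion form, in which a sigmoid gate computed from both inputs mixes a spatial and a temporal feature vector element by element. The abstract does not specify the gate's exact form, so the function `gated_fuse`, its parameters `weights` and `bias`, and the convex-combination rule are all assumptions made for illustration, not the paper's mechanism.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fuse(spatial, temporal, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Hypothetical formulation (an assumption, not taken from the paper):
        gate_i  = sigmoid(weights[i] . [spatial; temporal] + bias[i])
        fused_i = gate_i * spatial_i + (1 - gate_i) * temporal_i
    """
    if len(spatial) != len(temporal):
        raise ValueError("spatial and temporal features must have equal length")
    concat = list(spatial) + list(temporal)
    fused = []
    for i, (s, t) in enumerate(zip(spatial, temporal)):
        # Each gate sees both modalities, so the mix depends on the joint input.
        pre_activation = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(pre_activation)
        fused.append(gate * s + (1.0 - gate) * t)
    return fused


# Toy usage with randomly initialised (untrained) gate parameters.
rng = random.Random(0)
dim = 4
spatial = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
temporal = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
weights = [[rng.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim
fused = gated_fuse(spatial, temporal, weights, bias)
```

In this sketch each fused component lies between the corresponding spatial and temporal components. In a learned model the gate parameters would be trained jointly with the policy, and the fused vectors would then be passed to the Transformer encoder as input tokens.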