Deep Reinforcement Learning (DRL) has shown promise for social navigation, yet its real-world deployment remains hindered by a persistent sim-to-real gap arising from simplified first-order dynamics and context-specific human state estimation pipelines. This work presents a unified framework that addresses these limitations to produce dynamically feasible navigation policies suitable for real-world deployment. First, theoretical analysis reveals that the tracking error between the simulated and actual robot positions decays exponentially as the control order increases, motivating the use of higher-order control inputs as the DRL action space. A second-order control formulation tailored to differential drive robots is developed, complemented by a stochastic iterative Linear Quadratic Regulator (iLQR) that pretrains the policy via a divergence minimization objective. Second, to avoid the added system complexity of camera-LiDAR fusion, a cluster-based human tracking pipeline using only 2D LiDAR is introduced. Human detections are associated according to both spatial proximity and velocity similarity, enabling reliable differentiation of nearby pedestrians and yielding stable velocity estimates through temporal aggregation. Third, an unbiased residual gating block is introduced to balance reaction-based and memory-based behaviors while handling time-varying crowd sizes, both of which are critical for social navigation. The resulting policy, KinematicRL, consistently improves kinematic performance and adapts to varying numbers of detected humans. Experiments in real-world environments demonstrate that, when combined with the proposed tracking pipeline, KinematicRL can be deployed on a real differential drive robot with minimal modifications.
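To make the idea of a second-order action space concrete, the following is a minimal sketch, not the formulation used in this work. It assumes a standard unicycle model for a differential drive robot in which the policy outputs linear and angular accelerations, and these are integrated once into velocities and again into the pose. The names `State` and `step`, the time step, and the velocity limits are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical robot state: planar pose plus body velocities."""
    x: float = 0.0      # position [m]
    y: float = 0.0      # position [m]
    theta: float = 0.0  # heading [rad]
    v: float = 0.0      # linear velocity [m/s]
    w: float = 0.0      # angular velocity [rad/s]


def _clip(value: float, limit: float) -> float:
    """Clamp value to the symmetric interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def step(state: State, a_v: float, a_w: float, dt: float = 0.1,
         v_max: float = 1.0, w_max: float = 1.0) -> State:
    """Advance the assumed second-order unicycle model by one time step.

    The action (a_v, a_w) is a pair of accelerations, so the velocity can
    change by at most |a| * dt per step instead of jumping arbitrarily,
    as it could with a first-order (velocity) action.
    """
    # First integration: acceleration -> velocity (assumed limits applied).
    v = _clip(state.v + a_v * dt, v_max)
    w = _clip(state.w + a_w * dt, w_max)
    # Second integration: velocity -> pose (Euler step with updated velocities).
    x = state.x + v * math.cos(state.theta) * dt
    y = state.y + v * math.sin(state.theta) * dt
    # Wrap the heading to [-pi, pi).
    theta = (state.theta + w * dt + math.pi) % (2.0 * math.pi) - math.pi
    return State(x, y, theta, v, w)


if __name__ == "__main__":
    s = State()
    for _ in range(5):
        s = step(s, a_v=0.5, a_w=0.0)
    print(s)
```

Under these assumptions the commanded velocity is continuous by construction, which is the property that a first-order velocity action does not guarantee.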