Optimal decision-making for trajectory tracking in partially observable, stochastic environments, where the number of active localization updates -- the process by which the agent obtains its true state from its sensors -- is limited, presents a significant challenge. Traditional methods often struggle to balance resource conservation, accurate state estimation, and precise tracking, resulting in suboptimal performance. The problem is particularly pronounced in environments with large action spaces, where frequent, accurate state data is essential yet the number of active localization updates is capped by external constraints. This paper introduces ComTraQ-MPC, a novel framework that combines Deep Q-Networks (DQN) and Model Predictive Control (MPC) to optimize trajectory tracking under constrained active localization updates. The meta-trained DQN learns an adaptive schedule for active localization, while the MPC exploits the available state information to improve tracking. The central contribution of this work is their reciprocal interaction: the DQN's update decisions inform the MPC's control strategy, and the MPC's tracking outcomes refine the DQN's learning, yielding a cohesive, adaptive system. Empirical evaluations in simulated and real-world settings demonstrate that ComTraQ-MPC significantly improves operational efficiency and accuracy, providing a generalizable, approximately optimal solution for trajectory tracking in complex partially observable environments.
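The reciprocal interaction described above can be illustrated with a deliberately minimal sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the DQN is replaced by a hand-coded `q_values` heuristic over a scalar uncertainty, the MPC by a one-step proportional controller, and the dynamics by a noisy 1-D integrator. The sketch only shows the loop structure: the scheduler's localize/skip decision feeds the controller's state estimate, and the resulting tracking error is the reward signal that would train the scheduler.

```python
import numpy as np

def q_values(uncertainty, budget_left):
    # Toy stand-in for the meta-trained DQN: values of [skip, localize].
    # Localizing is forbidden (value -inf) once the update budget is spent.
    return np.array([-uncertainty, -0.5 if budget_left > 0 else -np.inf])

def mpc_control(est, target, gain=0.8):
    # Toy stand-in for MPC: proportional tracking of the reference point.
    return gain * (target - est)

def run_episode(trajectory, budget, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    true_state, est, uncertainty = 0.0, 0.0, 0.0
    errors, rewards = [], []
    for target in trajectory:
        # Scheduler decision: spend a localization update or keep dead-reckoning.
        act = int(np.argmax(q_values(uncertainty, budget)))
        if act == 1:                              # active localization update
            est, uncertainty, budget = true_state, 0.0, budget - 1
        # Controller acts on whatever state information is available.
        u = mpc_control(est, target)
        true_state += u + rng.normal(0.0, noise)  # stochastic true dynamics
        est += u                                  # dead-reckoned estimate drifts
        uncertainty += noise
        err = abs(true_state - target)
        errors.append(err)
        rewards.append(-err)  # tracking outcome fed back to the scheduler
    return errors, budget

errors, budget_left = run_episode(trajectory=[1.0] * 20, budget=3)
```

In the full framework the heuristic `q_values` is a learned network updated from the reward stream, and `mpc_control` solves a finite-horizon optimization; the sketch keeps only the data flow between the two components.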