Optimal decision-making for trajectory tracking in partially observable, stochastic environments is a significant challenge when the number of active localization updates (the process by which the agent obtains its true state information from its sensors) is limited. Traditional methods often struggle to balance resource conservation, accurate state estimation, and precise tracking, resulting in suboptimal performance. The problem is particularly pronounced in environments with large action spaces, where frequent, accurate state data is paramount yet the capacity for active localization updates is restricted by external limitations. This paper introduces ComTraQ-MPC, a novel framework that combines Deep Q-Networks (DQN) and Model Predictive Control (MPC) to optimize trajectory tracking under constrained active localization updates. The meta-trained DQN schedules active localization updates adaptively, while the MPC leverages the available state information to improve tracking. The central contribution of this work is the reciprocal interaction between the two components: the DQN's update decisions inform the MPC's control strategy, and the MPC's outcomes refine the DQN's learning, creating a cohesive, adaptive system. Empirical evaluations in simulated and real-world settings demonstrate that ComTraQ-MPC significantly enhances operational efficiency and accuracy, providing a generalizable and approximately optimal solution for trajectory tracking in complex partially observable environments.
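The interplay between update scheduling and control described above can be illustrated with a toy loop. The following is only a minimal sketch under strong assumptions, not the ComTraQ-MPC implementation: the dynamics are a hypothetical 1D model `x' = x + u + noise`, the MPC is a brute-force search over a small discrete action set, and the scheduler `should_localize` is a fixed threshold rule standing in for the meta-trained DQN. All function names, parameters, and values are illustrative.

```python
import itertools
import random

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical discrete action set


def mpc_action(est_state, reference, t, horizon=3):
    """Return the first action of the sequence that minimizes squared
    tracking error under the assumed noise-free model x' = x + u."""
    h = min(horizon, len(reference) - 1 - t)
    best_cost, best_first = float("inf"), 0.0
    for seq in itertools.product(ACTIONS, repeat=h):
        x, cost = est_state, 0.0
        for k, u in enumerate(seq):
            x += u
            cost += (x - reference[t + 1 + k]) ** 2
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def should_localize(steps_since_update, budget_left, threshold=4):
    """Stand-in for the learned scheduler (NOT a DQN): request an update
    once the estimate has drifted for `threshold` steps, if budget remains."""
    return budget_left > 0 and steps_since_update >= threshold


def run_episode(reference, budget, noise_std=0.3, seed=0):
    """Track `reference` with at most `budget` active localization updates.
    Returns (updates_used, mean_absolute_tracking_error)."""
    rng = random.Random(seed)
    true_x = est_x = reference[0]
    used, since = 0, 0
    errors = []
    for t in range(len(reference) - 1):
        if should_localize(since, budget - used):
            est_x = true_x  # active localization reveals the true state
            used, since = used + 1, 0
        u = mpc_action(est_x, reference, t)    # control uses the estimate only
        true_x += u + rng.gauss(0.0, noise_std)  # stochastic true dynamics
        est_x += u                               # dead-reckoned estimate
        since += 1
        errors.append(abs(true_x - reference[t + 1]))
    return used, sum(errors) / len(errors)


if __name__ == "__main__":
    ref = [0.5 * i for i in range(21)]
    for b in (0, 3):
        used, err = run_episode(ref, budget=b)
        print(f"budget={b} used={used} mean_abs_error={err:.3f}")
```

In this sketch the scheduler's decision changes the state estimate that the controller plans from, which is the direction of influence from scheduling to control. The reverse direction, where tracking outcomes shape the scheduler through learning, is not modeled here.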