We first define a suitable state representation and action space, and then design an adjustment mechanism driven by the actions the agent selects. The adjustment mechanism outputs the agent's next state and reward, and also computes the error between the adjusted state and the unadjusted state. The agent stores the resulting experience samples, which contain states and rewards, in a buffer and replays them during each iteration to learn the dynamic characteristics of the environment. We call the improved algorithm DQM. Experimental results show that an agent using the proposed algorithm effectively reduces the accumulated error of inertial navigation in dynamic environments. Although this work provides a basis for autonomous navigation of unmanned aerial vehicles (UAVs), significant room for optimization remains. Future work could include testing UAVs in simulated and real-world environments, optimizing the design of the reward function, improving the algorithm workflow to speed up convergence and improve performance, and strengthening the algorithm's generalization ability.
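To make the interaction between the adjustment mechanism and the experience buffer concrete, the following is a minimal Python sketch under stated assumptions, not the DQM implementation itself. The scalar state (an accumulated position error), the three-valued action set `ACTIONS`, the `drift` parameter, the reward `-abs(adjusted)`, and the names `adjust`, `Experience`, and `ReplayBuffer` are all hypothetical choices made for illustration; the text does not specify them.

```python
import random
from collections import deque
from typing import NamedTuple

# Hypothetical discrete action space: a correction added to the state.
ACTIONS = (-1.0, 0.0, 1.0)


class Experience(NamedTuple):
    """One stored sample (field layout is an assumption)."""
    state: float
    action: int
    reward: float
    next_state: float


def adjust(state: float, action: int, drift: float = 0.0):
    """Hypothetical adjustment mechanism.

    `state` is assumed to be a scalar accumulated position error and
    `drift` the error added during this step. Returns the next state,
    the reward, and the error between adjusted and unadjusted states.
    """
    unadjusted = state + drift               # state if no correction is applied
    adjusted = unadjusted + ACTIONS[action]  # state after the selected correction
    error = adjusted - unadjusted            # adjusted vs. unadjusted state
    reward = -abs(adjusted)                  # assumed: smaller residual error is better
    return adjusted, reward, error


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest samples are dropped first."""

    def __init__(self, capacity: int):
        self._items = deque(maxlen=capacity)

    def push(self, experience: Experience) -> None:
        self._items.append(experience)

    def sample(self, batch_size: int):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self._items))
        return random.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)
```

In a training loop, each step would call `adjust`, push the resulting `Experience` into the buffer, and then draw a batch with `sample` to update the value estimates. How DQM performs that update is not described in this passage, so it is left out of the sketch.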