This report presents our reinforcement learning-based approach to the swing-up and stabilisation tasks for the acrobot and pendubot, tailored specifically to the updated guidelines of the 3rd AI Olympics at ICRA 2025. Building on our previously developed Average-Reward Entropy Advantage Policy Optimization (AR-EAPO) algorithm, we refined our solution to address the new competition scenarios and evaluation metrics. Extensive simulations show that the resulting controller handles the revised tasks robustly, demonstrating its adaptability and effectiveness within the updated framework.