Recently, learning-based controllers have been shown to push mobile robotic systems to their limits and provide the robustness needed for many real-world applications. However, only classical optimization-based control frameworks offer the inherent flexibility to be dynamically adjusted during execution by, for example, setting target speeds or actuator limits. We present a framework to overcome this shortcoming of neural controllers by conditioning them on an auxiliary input. This advance is enabled by including a feature-wise linear modulation (FiLM) layer. We use model-free reinforcement learning to train quadrotor control policies for the task of navigating through a sequence of waypoints in minimum time. By conditioning the policy on the maximum available thrust or the viewing direction relative to the next waypoint, a user can regulate the aggressiveness of the quadrotor's flight during deployment. We demonstrate in simulation and in real-world experiments that a single control policy can achieve close to time-optimal flight performance across the entire performance envelope of the robot, reaching speeds of up to 60 km/h and accelerations of up to 4.5 g. The ability to guide a learned controller during task execution has implications beyond agile quadrotor flight, as conditioning the control policy on human intent helps to safely bring learning-based systems out of the well-defined laboratory environment and into the wild.
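To make the conditioning mechanism concrete, the following is a minimal NumPy sketch of a FiLM layer inside a small policy network. It is an illustration under stated assumptions, not the architecture used in this work: the network size, the placement of the FiLM layer, the observation and action dimensions, and all names (`film`, `init_params`, `policy`) are hypothetical. The auxiliary input is assumed to be a single scalar such as a normalized maximum-thrust setting.

```python
import numpy as np


def film(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: scale and shift each feature
    with coefficients predicted from the conditioning input."""
    gamma = condition @ w_gamma + b_gamma  # per-feature scale
    beta = condition @ w_beta + b_beta     # per-feature shift
    return gamma * features + beta


def init_params(rng, obs_dim, cond_dim, hidden_dim, act_dim):
    """Random parameters for a one-hidden-layer policy (hypothetical sizes)."""
    return {
        "w1": rng.normal(0.0, 0.1, (obs_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w_gamma": rng.normal(0.0, 0.1, (cond_dim, hidden_dim)),
        "b_gamma": np.ones(hidden_dim),   # start close to the identity scale
        "w_beta": rng.normal(0.0, 0.1, (cond_dim, hidden_dim)),
        "b_beta": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.1, (hidden_dim, act_dim)),
        "b2": np.zeros(act_dim),
    }


def policy(params, observation, condition):
    """Map an observation and an auxiliary input to a bounded action."""
    hidden = observation @ params["w1"] + params["b1"]
    # Assumption: the hidden features are modulated before the nonlinearity.
    hidden = film(hidden, condition,
                  params["w_gamma"], params["b_gamma"],
                  params["w_beta"], params["b_beta"])
    hidden = np.tanh(hidden)
    return np.tanh(hidden @ params["w2"] + params["b2"])


rng = np.random.default_rng(0)
params = init_params(rng, obs_dim=12, cond_dim=1, hidden_dim=32, act_dim=4)
observation = rng.normal(size=12)

# Same observation, two different user-selected settings (hypothetical values).
action_gentle = policy(params, observation, np.array([0.3]))
action_aggressive = policy(params, observation, np.array([1.0]))
```

In this sketch the same weights produce different actions for the same observation once the auxiliary input changes, which is the property that would let a user adjust a single trained policy at deployment time.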