Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations. We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that a policy trained with our method supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.
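To make the two geometric ideas above concrete, the following is a minimal sketch of how progress along a guideline and a tracking margin could be computed. It is not the authors' implementation: the polyline representation of the guideline, the nearest-point projection used as a stand-in for traveled distance, the linear penalty outside the margin, and all function names (`cumulative_arclength`, `project_to_guideline`, `margin_penalty`) are assumptions made for illustration.

```python
import numpy as np


def cumulative_arclength(points):
    """Arc length from the first guideline vertex to each vertex."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return np.concatenate(([0.0], np.cumsum(seg)))


def project_to_guideline(points, position):
    """Project a position onto a polyline guideline.

    Returns (s, d): the arc length s of the closest guideline point
    (used here as a timing-free progress measure, an assumption) and
    the distance d from the position to that point.
    """
    s_cum = cumulative_arclength(points)
    best_s, best_d = 0.0, np.inf
    for i in range(len(points) - 1):
        a, b = points[i], points[i + 1]
        ab = b - a
        denom = float(ab @ ab)
        # Clamp the projection parameter so the closest point stays on the segment.
        t = 0.0 if denom == 0.0 else float(np.clip((position - a) @ ab / denom, 0.0, 1.0))
        closest = a + t * ab
        d = float(np.linalg.norm(position - closest))
        if d < best_d:
            best_d = d
            best_s = float(s_cum[i] + t * np.sqrt(denom))
    return best_s, best_d


def margin_penalty(distance, margin):
    """Zero inside the tracking margin, linear growth outside (assumed shape)."""
    return max(0.0, distance - margin)


# Hypothetical L-shaped guideline and robot position, for illustration only.
guideline = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
robot_position = np.array([0.5, 0.2, 0.0])
progress, deviation = project_to_guideline(guideline, robot_position)
penalty = margin_penalty(deviation, margin=0.3)
```

Because progress is indexed by arc length rather than by time, the same guideline places no constraint on how fast the maneuver is executed, and deviations smaller than the margin incur no penalty, which leaves room for a policy to depart from a guideline that is not physically feasible.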