Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations. We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that a policy trained with our method supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.
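To make the tracking margin and distance-based progress concrete, the following is a minimal sketch, not the LineRides reward itself. It assumes a 2D polyline guideline, measures progress as the arc length of the closest point on that polyline, and penalizes only the deviation that exceeds the margin. The function names, the `margin` and `penalty_scale` parameters, and the linear penalty are all illustrative assumptions; key-orientations are omitted.

```python
import math

def project_onto_guideline(guideline, point):
    """Return (arc_length, distance) of the closest point on a 2D polyline.

    Hypothetical helper: `guideline` is a list of (x, y) vertices.
    """
    best_s, best_d = 0.0, math.inf
    s_start = 0.0  # arc length at the start of the current segment
    for (ax, ay), (bx, by) in zip(guideline, guideline[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip degenerate (zero-length) segments
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((point[0] - ax) * dx + (point[1] - ay) * dy) / seg_len**2
        t = max(0.0, min(1.0, t))
        d = math.hypot(point[0] - (ax + t * dx), point[1] - (ay + t * dy))
        if d < best_d:
            best_s, best_d = s_start + t * seg_len, d
        s_start += seg_len
    return best_s, best_d

def line_reward(guideline, prev_point, point, margin=0.2, penalty_scale=1.0):
    """Assumed reward shape: progress along the line minus out-of-margin deviation."""
    s_prev, _ = project_onto_guideline(guideline, prev_point)
    s_now, dist = project_onto_guideline(guideline, point)
    progress = s_now - s_prev           # distance traveled along the guideline
    excess = max(0.0, dist - margin)    # deviation inside the margin is free
    return progress - penalty_scale * excess
```

Because the reward depends on arc length rather than elapsed time, the sketch imposes no timing on the motion. Nearest-point projection becomes ambiguous where a guideline crosses or closely approaches itself, so a real implementation would likely need to track progress incrementally instead.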