Most modern reinforcement learning algorithms optimize a cumulative single-step cost along a trajectory. The resulting motions are often 'unnatural', exhibiting, for example, sudden accelerations that waste energy and lack predictability. In this work, we present a novel paradigm for controlling nonlinear systems via the minimization of the Koopman spectrum cost: a cost over the Koopman operator of the controlled dynamics. This induces a broader class of dynamical behaviors that evolve over stable manifolds, such as nonlinear oscillators, closed loops, and smooth movements. We demonstrate that some characterizations of dynamics that are not possible with a cumulative cost are feasible in this paradigm, which generalizes classical eigenstructure and pole assignment to nonlinear decision making. Moreover, we present a sample-efficient online learning algorithm for our problem that enjoys a sublinear regret bound under some structural assumptions.
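To make "a cost over the Koopman operator" concrete, here is a minimal sketch under loudly stated assumptions; it is not the cost or estimator used in this work. It restricts the observables to the identity map on a 2-D state, so the least-squares Koopman estimate reduces to dynamic mode decomposition (DMD). It then scores the estimated operator with a hypothetical spectrum cost, `spectrum_cost`, that penalizes eigenvalues whose modulus departs from 1, i.e. it favors sustained oscillation over decay or growth. The helpers `rotation`, `rollout`, `fit_koopman_2d`, and `eigenvalues_2x2` are illustrative names, not part of any published API.

```python
import cmath
import math


def rotation(theta, radius=1.0):
    """Hypothetical test dynamics: rotation by theta scaled by radius."""
    c, s = math.cos(theta), math.sin(theta)
    return [[radius * c, -radius * s], [radius * s, radius * c]]


def rollout(k, x, steps):
    """Simulate x_{t+1} = K x_t for `steps` steps; returns steps + 1 states."""
    traj = [x]
    for _ in range(steps):
        x0, x1 = traj[-1]
        traj.append((k[0][0] * x0 + k[0][1] * x1, k[1][0] * x0 + k[1][1] * x1))
    return traj


def fit_koopman_2d(states):
    """Least-squares K minimizing sum ||x_{t+1} - K x_t||^2 (identity observables, i.e. DMD)."""
    g00 = g01 = g11 = 0.0          # G = sum x_t x_t^T (symmetric)
    a00 = a01 = a10 = a11 = 0.0    # A = sum x_{t+1} x_t^T
    for (x0, x1), (y0, y1) in zip(states[:-1], states[1:]):
        g00 += x0 * x0
        g01 += x0 * x1
        g11 += x1 * x1
        a00 += y0 * x0
        a01 += y0 * x1
        a10 += y1 * x0
        a11 += y1 * x1
    det = g00 * g11 - g01 * g01
    if abs(det) < 1e-12:
        raise ValueError("snapshots do not span R^2")
    i00, i01, i11 = g11 / det, -g01 / det, g00 / det  # entries of G^{-1}
    # Normal-equation solution K = A G^{-1}
    return [[a00 * i00 + a01 * i01, a00 * i01 + a01 * i11],
            [a10 * i00 + a11 * i01, a10 * i01 + a11 * i11]]


def eigenvalues_2x2(k):
    """Roots of lambda^2 - tr(K) lambda + det(K) = 0 (possibly complex)."""
    tr = k[0][0] + k[1][1]
    det = k[0][0] * k[1][1] - k[0][1] * k[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def spectrum_cost(k):
    """Hypothetical spectrum cost: squared distance of each |lambda| from the unit circle."""
    return sum((abs(lam) - 1.0) ** 2 for lam in eigenvalues_2x2(k))


if __name__ == "__main__":
    for radius in (1.0, 0.9):
        k_hat = fit_koopman_2d(rollout(rotation(0.3, radius), (1.0, 0.0), 50))
        print(radius, spectrum_cost(k_hat))
```

With noiseless data the least-squares fit recovers the generating matrix, so the undamped rotation scores approximately 0, while the rotation damped by 0.9 scores 2(1 - 0.9)^2 = 0.02. The design point is that the cost is evaluated on the estimated operator as a whole rather than summed over individual states, which is the distinction the abstract draws against a cumulative single-step cost.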