We present an algorithm for efficient domain-adaptive policy learning via kernel representations. Learning domain-adaptive policies is challenging because it requires an environment representation that is both expressive enough to model complex sim-to-real gaps during offline training and computationally efficient enough to support rapid online adaptation during deployment. For instance, a quadrotor may encounter time-varying, non-stationary disturbances, such as sudden gusts of wind, payload shifts, or transitions between flight regimes with and without ground effects. To address these challenges, we model unknown disturbances using a differentiable kernel approximation based on random Fourier features. During the offline training phase, we randomly sample kernel coefficients and bandwidth parameters to generate a rich diversity of disturbance profiles. We then optimize the control policy via differentiable simulation with analytical gradients, which takes only 50 seconds of training time on an RTX 4090 GPU. During hardware deployment, the policy adapts to non-stationary environments in real time by updating both the kernel coefficients and the bandwidth through online least-squares estimation. We evaluate our method on quadrotor trajectory tracking tasks in high-fidelity numerical simulations and in hardware experiments on a Crazyflie quadrotor, under various disturbances including complex aerodynamic effects, wind, ground effects, and payload fluctuations.
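The disturbance model and online adaptation described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Gaussian-sampled frequencies (which approximate an RBF kernel), keeps the bandwidth fixed rather than adapting it, and uses standard recursive least squares as one possible online least-squares estimator. All names (`rff_features`, `RecursiveLeastSquares`, `forgetting`, `p0`) and all numeric values are hypothetical choices made for illustration.

```python
import numpy as np

def rff_features(x, W, b, bandwidth):
    """Random Fourier features phi(x) = sqrt(2/D) * cos(W x / bandwidth + b).

    Assumption: rows of W are drawn from a standard normal, so the features
    approximate an RBF kernel whose length scale is `bandwidth`.
    """
    n_features = W.shape[0]
    return np.sqrt(2.0 / n_features) * np.cos(W @ x / bandwidth + b)

class RecursiveLeastSquares:
    """Online least-squares estimate of coefficients A in d(x) ~ A^T phi(x)."""

    def __init__(self, n_features, n_outputs, forgetting=1.0, p0=1e3):
        self.A = np.zeros((n_features, n_outputs))  # kernel coefficients
        self.P = p0 * np.eye(n_features)            # inverse-Gram estimate
        self.lam = forgetting                       # < 1 discounts old data

    def update(self, phi, y):
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        err = y - self.A.T @ phi                    # prediction residual
        self.A += np.outer(gain, err)
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return err

# Hypothetical sizes: 3-D state input, 3-D disturbance output, 64 features.
rng = np.random.default_rng(0)
state_dim, out_dim, n_features, bandwidth = 3, 3, 64, 1.0
W = rng.standard_normal((n_features, state_dim))
b = rng.uniform(0.0, 2.0 * np.pi, n_features)

# Synthetic "true" disturbance that lies in the span of the features,
# standing in for measured residual forces.
A_true = rng.standard_normal((n_features, out_dim))

rls = RecursiveLeastSquares(n_features, out_dim)
for _ in range(500):
    x = rng.uniform(-1.0, 1.0, state_dim)
    phi = rff_features(x, W, b, bandwidth)
    rls.update(phi, A_true.T @ phi)

# Mean prediction error on fresh states drawn from the same region.
errors = []
for _ in range(100):
    x = rng.uniform(-1.0, 1.0, state_dim)
    phi = rff_features(x, W, b, bandwidth)
    errors.append(np.linalg.norm(rls.A.T @ phi - A_true.T @ phi))
mean_err = float(np.mean(errors))
```

Because the model is linear in the coefficients, each update costs only a few matrix-vector products, which is what makes this kind of estimator plausible for real-time use. Setting `forgetting` slightly below 1 would weight recent samples more heavily, a common choice for non-stationary disturbances; adapting the bandwidth online, as the abstract describes, is left out of this sketch.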