This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler (KL) regularization. We propose a central problem that separates the KL penalties on policies and transitions with independent weights, thus generalizing the standard trajectory-level KL regularization used in probabilistic optimal control. This umbrella formulation recovers various control problems: the classical Stochastic Optimal Control (SOC), Risk-Sensitive Stochastic Optimal Control (RSOC), and their policy-based KL-regularized counterparts, termed soft-policy SOC and RSOC, which yield tractable surrogates. Beyond being regularized variants, these soft-policy formulations majorize the original SOC and RSOC, so that iterating their solutions recovers the original objectives. We further identify a synchronized case of soft-policy RSOC where the policy and transition KL weights coincide, yielding a linear Bellman operator, a path-integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
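The majorization claim can be illustrated on a one-step toy problem. The sketch below is not the paper's algorithm. It assumes a single state with a finite set of action costs `q`, a prior policy `prior`, and a policy KL weight `lam`; all three names are hypothetical. Under these assumptions the soft-policy objective min over pi of E_pi[q] + lam * KL(pi || prior) has the standard closed-form value -lam * log(sum_a prior(a) * exp(-q(a) / lam)), which is never below min(q). Repeatedly replacing the prior with the soft-optimal policy drives this value down toward min(q).

```python
import math


def soft_backup(q, prior, lam):
    """Solve min_pi E_pi[q] + lam * KL(pi || prior) for one state.

    Returns the soft value and the soft-optimal policy.
    Assumption: q and prior are equal-length lists, prior sums to 1, lam > 0.
    """
    m = min(q)  # shift costs so the exponentials stay in [0, 1]
    weights = [p * math.exp(-(qa - m) / lam) for qa, p in zip(q, prior)]
    z = sum(weights)
    value = m - lam * math.log(z)  # equals -lam * log(sum_a prior(a) * exp(-q(a)/lam))
    policy = [w / z for w in weights]
    return value, policy


def iterate_prior(q, prior, lam, steps):
    """Re-solve the soft problem, using each solution as the next prior."""
    values = []
    for _ in range(steps):
        value, prior = soft_backup(q, prior, lam)
        values.append(value)
    return values, prior


# Hypothetical toy data: three actions with fixed costs and a uniform prior.
q = [1.0, 0.5, 2.0]
uniform = [1.0 / 3.0] * 3
values, policy = iterate_prior(q, uniform, lam=1.0, steps=50)

print("first soft value:", values[0])
print("last soft value:", values[-1])
print("hard minimum:", min(q))
```

In this toy setting every soft value upper-bounds the unregularized minimum, and the sequence is non-increasing, which mirrors the majorize-then-iterate structure described above. Extending the sketch to multi-step problems or to the transition KL penalty would require the paper's actual operators.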