Previous work has separately addressed different forms of action, state, and action-state entropy regularization, pure exploration, and space occupation. These problems have become highly relevant for regularization, generalization, speeding up learning, and providing robust solutions. However, solution methods for these problems are heterogeneous, ranging from convex to non-convex optimization and from unconstrained to constrained optimization. Here we provide a general dual function formalism that transforms the constrained optimization problem into an unconstrained convex one for any mixture of action and state entropies. The cases of pure action entropy and pure state entropy are understood as limits of the mixture.