Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning

Gaussian policies have dominated continuous control in deep reinforcement learning (RL), yet they suffer from a fundamental mismatch: their unbounded support requires ad-hoc squashing functions that distort the geometry of bounded action spaces. While von Mises-Fisher (vMF) distributions offer a theoretically grounded alternative on the sphere, their reliance on Bessel functions and rejection sampling hinders practical adoption. We propose \textbf{Geometric Action Control (GAC)}, a novel action generation paradigm that preserves the geometric benefits of spherical distributions while \textit{simplifying computation}. GAC decomposes action generation into a direction vector and a learnable concentration parameter, enabling efficient interpolation between deterministic actions and uniform spherical noise. This design reduces the number of action-distribution parameters from \(2d\) to \(d+1\) and replaces the \(O(dk)\) cost of vMF rejection sampling with simple \(O(d)\) operations. Empirically, GAC consistently matches or exceeds state-of-the-art methods across six MuJoCo benchmarks, with a 37.6\% improvement over SAC on Ant-v4 and gains of up to 112\% on complex DMControl tasks. Our ablation studies reveal that both \textbf{spherical normalization} and \textbf{adaptive concentration control} are essential to GAC's success. These findings suggest that robust and efficient continuous control requires not complex distributions but a principled respect for the geometry of action spaces.
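
The direction-plus-concentration decomposition described above can be illustrated with a minimal sketch. The abstract does not state GAC's exact interpolation rule, so this assumes one plausible form: the raw direction output is spherically normalized to \(\mu\), a scalar logit is mapped to a concentration \(\kappa \in (0, 1)\) by a sigmoid, and the action is the renormalized mix \(\kappa \mu + (1-\kappa) n\) of \(\mu\) and a uniform spherical sample \(n\). The function names (`gac_action`, `uniform_sphere`, `head_sizes`) and this mixing rule are hypothetical illustrations rather than the paper's method, and the resulting distribution is not a vMF density.

```python
import math
import random


def normalize(v, eps=1e-8):
    """Spherical normalization: project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]  # eps guards the measure-zero zero vector


def uniform_sphere(d, rng):
    """Uniform sample on S^{d-1}: normalize an isotropic Gaussian draw."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(d)])


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gac_action(direction_logits, concentration_logit, rng):
    """Hypothetical GAC-style sampler: O(d) work, no Bessel functions, no rejection loop.

    direction_logits: d raw policy outputs (assumed interface).
    concentration_logit: 1 raw policy output mapped to kappa in (0, 1) (assumed).
    """
    mu = normalize(direction_logits)       # deterministic direction on the sphere
    kappa = sigmoid(concentration_logit)   # kappa -> 1: deterministic; kappa -> 0: uniform noise
    noise = uniform_sphere(len(mu), rng)
    # Assumed interpolation rule: linear mix, then project back onto the sphere.
    mixed = [kappa * m + (1.0 - kappa) * n for m, n in zip(mu, noise)]
    return normalize(mixed)                # unit vector, so every component lies in [-1, 1]


def head_sizes(d):
    """Distribution parameters per action: Gaussian (mean + std) vs. GAC (direction + kappa)."""
    return {"gaussian": 2 * d, "gac": d + 1}


if __name__ == "__main__":
    rng = random.Random(0)
    a = gac_action([0.5, -1.0, 2.0], 3.0, rng)
    print(len(a), round(sum(x * x for x in a), 6))  # prints: 3 1.0
    print(head_sizes(6))  # prints: {'gaussian': 12, 'gac': 7}
```

Each step is a single pass over the \(d\) coordinates, which is the \(O(d)\) cost the abstract contrasts with vMF rejection sampling. Drawing uniform points on the sphere by normalizing a standard Gaussian vector is a standard construction; only the mixing rule above is an assumption.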
