Many robotic control tasks require policies to act on orientations, yet the geometry of SO(3) makes this nontrivial. Because SO(3) admits no global, smooth, minimal parameterization, common representations such as Euler angles, quaternions, rotation matrices, and Lie algebra coordinates introduce distinct constraints and failure modes. While these trade-offs are well studied for supervised learning, their implications for actions in reinforcement learning remain unclear. We systematically evaluate SO(3) action representations across three standard continuous control algorithms (PPO, SAC, and TD3) under dense and sparse rewards. Through empirical studies, we compare how representations shape exploration, interact with entropy regularization, and affect training stability, and we analyze the implications of different projections for obtaining valid rotations from Euclidean network outputs. Across a suite of robotics benchmarks, we quantify the practical impact of these choices and distill simple, implementation-ready guidelines for selecting and using rotation actions. Our results highlight that representation-induced geometry strongly influences exploration and optimization, and show that representing actions as tangent vectors in the local frame yields the most reliable results across algorithms. The project webpage and code are available at amacati.github.io/so3 primer.
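The two mechanisms the abstract refers to, projecting an unconstrained network output onto a valid rotation and applying an action as a tangent vector in the local frame, can be illustrated with a short sketch. The function names and the choice of NumPy here are illustrative, not taken from the paper's codebase; the projections shown (quaternion normalization, Gram-Schmidt orthonormalization of a 6D output, and the exponential map via Rodrigues' formula) are standard constructions under those assumptions.

```python
import numpy as np

def project_quaternion(q: np.ndarray) -> np.ndarray:
    """Project a raw 4-vector network output onto the unit-quaternion manifold."""
    return q / np.linalg.norm(q)

def project_6d(x: np.ndarray) -> np.ndarray:
    """Gram-Schmidt projection of a raw 6-vector onto SO(3).

    The first three entries fix the first column; the remaining three are
    orthonormalized against it, and the third column is their cross product,
    which guarantees det(R) = +1.
    """
    a, b = x[:3], x[3:]
    r1 = a / np.linalg.norm(a)
    b = b - np.dot(r1, b) * r1
    r2 = b / np.linalg.norm(b)
    r3 = np.cross(r1, r2)
    return np.stack([r1, r2, r3], axis=-1)

def exp_so3(w: np.ndarray) -> np.ndarray:
    """Rodrigues' formula: map a tangent vector w in R^3 to a rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# A local-frame tangent action: compose the current orientation with the
# exponential of the policy's unconstrained 3-vector output.
R_cur = np.eye(3)
delta = np.array([0.1, -0.2, 0.05])  # hypothetical policy output
R_new = R_cur @ exp_so3(delta)
```

Because every branch returns an exactly orthonormal matrix with unit determinant, no post-hoc renormalization of the environment's orientation state is needed.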