D-Optimality-Guided Reinforcement Learning for Efficient Open-Loop Calibration of a 3-DOF Ankle Rehabilitation Robot

Accurate alignment of multi-degree-of-freedom rehabilitation robots is essential for safe and effective patient training. This paper proposes a two-stage calibration framework for a custom-designed three-degree-of-freedom (3-DOF) ankle rehabilitation robot. First, a Kronecker-product-based open-loop calibration method is developed that casts input-output alignment as a linear parameter identification problem; the resulting information matrix in turn defines the experimental design objective. Building on this formulation, calibration posture selection is posed as a combinatorial design-of-experiments problem guided by a D-optimality criterion, i.e., selecting a small subset of postures that maximises the determinant of the information matrix. To make this selection practical under constraints, a Proximal Policy Optimization (PPO) agent is trained in simulation to choose four informative postures from a candidate set of 50. In both simulation and real-robot evaluations, the learned policy consistently yields substantially more informative posture combinations than random selection: the mean determinant of the information matrix achieved by PPO is more than two orders of magnitude higher, with reduced variance. In addition, real-world results indicate that a parameter vector identified from only four D-optimality-guided postures provides stronger cross-episode prediction consistency than estimates obtained from a larger but unstructured set of 50 postures. The proposed framework therefore improves calibration efficiency while maintaining robust parameter estimation, offering practical guidance for high-precision alignment of multi-DOF rehabilitation robots.
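The D-optimality objective described above can be illustrated with a minimal sketch. The abstract does not state the calibration model, so the sketch assumes an affine map y = A q + b between a commanded posture q and the measured output y, which the Kronecker identity vec(Θz) = (zᵀ ⊗ I₃) vec(Θ) turns into a 12-parameter linear regression with Θ = [A b] and z = [q; 1]. The function names, the candidate range, and the random-search baseline are all hypothetical; the random search merely stands in for a selection policy and is not the PPO agent.

```python
import numpy as np

def regressor(q):
    """Return the 3x12 block (z^T kron I_3) for one posture q, with z = [q; 1].

    Assumed model (not taken from the paper): y = A q + b, theta = vec([A b]).
    """
    z = np.append(np.asarray(q, dtype=float), 1.0)
    return np.kron(z[None, :], np.eye(3))

def log_det_information(postures):
    """log det of the information matrix M = Phi^T Phi for the given postures."""
    Phi = np.vstack([regressor(q) for q in postures])  # shape (3k, 12)
    sign, logdet = np.linalg.slogdet(Phi.T @ Phi)
    return logdet if sign > 0 else -np.inf             # singular design

rng = np.random.default_rng(0)
# Hypothetical candidate set: 50 postures, 3 joint angles each, in radians.
candidates = rng.uniform(-0.3, 0.3, size=(50, 3))

# Score 2000 random 4-posture subsets (random search, NOT the PPO agent).
subsets = [rng.choice(50, size=4, replace=False) for _ in range(2000)]
scores = np.array([log_det_information(candidates[s]) for s in subsets])
best = subsets[int(np.argmax(scores))]

print("mean log det over random subsets:", scores.mean())
print("best log det found:", scores.max(), "postures:", sorted(best.tolist()))
```

Under this assumed model the information matrix factors as M = (ZᵀZ) ⊗ I₃, where Z stacks the vectors zᵀ row by row, so det M = det(ZᵀZ)³ and four postures are the fewest that can make M non-singular. Working with the log-determinant avoids underflow when comparing subsets whose determinants differ by orders of magnitude.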
