The paper introduces an interactive machine learning mechanism that processes measurements of an uncertain, nonlinear dynamic process and recommends an actuation strategy in real time. As a proof of concept, a trajectory-following optimization problem for a Kinova robotic arm is solved using an integral reinforcement learning approach with guaranteed stability for slowly varying dynamics. The solution is implemented as a model-free value iteration process that solves the integral temporal difference equations of the problem. The proposed technique is benchmarked against another model-free high-order approach and validated under dynamic payloads and disturbances. Unlike the benchmark, the proposed adaptive strategy can handle extreme process variations. This is demonstrated experimentally by introducing static and time-varying payloads close to the rated maximum payload capacity of the manipulator arm. The comparison algorithm exhibited a percent overshoot up to seven times that of the proposed integral reinforcement learning solution. The robustness of the algorithm is further validated by perturbing the real-time adapted strategy gains with white noise whose standard deviation is as high as 5%.
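The two evaluation quantities named in the abstract, percent overshoot and noise-perturbed strategy gains, can be illustrated with a minimal sketch. The function names `percent_overshoot` and `perturb_gains` are hypothetical and do not come from the paper. The sketch also assumes that the 5% figure is a standard deviation relative to each gain's magnitude and that the noise is Gaussian; the abstract does not state either.

```python
import numpy as np


def percent_overshoot(response, setpoint, initial=0.0):
    """Peak excursion beyond the setpoint, as a percentage of the step size."""
    response = np.asarray(response, dtype=float)
    step = setpoint - initial
    if step == 0:
        raise ValueError("setpoint must differ from the initial value")
    # Measure the excursion in the direction of the step so that
    # negative-going steps are handled the same way as positive ones.
    excursion = np.max((response - setpoint) * np.sign(step))
    return max(0.0, excursion) / abs(step) * 100.0


def perturb_gains(gains, rel_std=0.05, rng=None):
    """Scale each gain by (1 + n), with n drawn from N(0, rel_std**2).

    Assumption: the noise is multiplicative and Gaussian, with the
    standard deviation expressed as a fraction of each gain.
    """
    rng = np.random.default_rng() if rng is None else rng
    gains = np.asarray(gains, dtype=float)
    return gains * (1.0 + rng.normal(0.0, rel_std, size=gains.shape))
```

Under this definition, a seven-fold difference means that the ratio of the two `percent_overshoot` values, computed for the same reference step, reaches 7.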