This work presents a comprehensive analysis to regularize the Soft Actor-Critic (SAC) algorithm with automatic temperature adjustment. The policy evaluation, policy improvement, and temperature adjustment steps are reformulated, introducing certain modifications to the original theory and stating it more clearly and explicitly.