OffCon$^3$: What is state of the art anyway?

Two popular approaches to model-free continuous control tasks are SAC and TD3. At first glance these approaches seem rather different. SAC aims to solve the entropy-augmented MDP by minimising the KL-divergence between a stochastic proposal policy and a hypothetical energy-based soft Q-function policy. TD3 is derived from DPG, which uses a deterministic policy to perform policy gradient ascent along the value function. In reality, both approaches are remarkably similar and belong to a family of approaches we call "Off-Policy Continuous Generalized Policy Iteration". This explains their similar performance on most continuous control benchmarks; indeed, when hyperparameters are matched, their performance can be statistically indistinguishable. To further remove any difference due to implementation, we provide OffCon$^3$ (Off-Policy Continuous Control: Consolidated), a code base featuring state-of-the-art versions of both algorithms.
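
One way to see the similarity is to compare the two actor objectives directly. The sketch below is a toy illustration, not code from OffCon$^3$. It assumes a hypothetical one-dimensional critic `q_value`, a TD3-style deterministic actor `a = tanh(mean)`, and a SAC-style tanh-squashed Gaussian actor trained with the reparameterisation trick (the squashing convention used in common SAC implementations). With the entropy temperature `alpha` set to 0 and the policy's standard deviation shrunk toward 0, the SAC actor loss collapses onto the TD3 actor loss.

```python
import math
import random


def q_value(state: float, action: float) -> float:
    """Hypothetical toy critic: a quadratic Q-function peaking at action = 0.5 * state."""
    return -(action - 0.5 * state) ** 2


def td3_actor_loss(state: float, mean: float) -> float:
    """TD3/DPG-style actor loss: deterministic action, ascend Q by minimising -Q."""
    action = math.tanh(mean)
    return -q_value(state, action)


def sac_actor_loss(state: float, mean: float, log_std: float, alpha: float,
                   n_samples: int = 1000, seed: int = 0) -> float:
    """SAC-style actor loss E[alpha * log pi(a|s) - Q(s, a)], estimated by Monte Carlo."""
    rng = random.Random(seed)
    std = math.exp(log_std)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        u = mean + std * eps          # reparameterised pre-squash Gaussian sample
        action = math.tanh(u)         # squash into (-1, 1)
        # Gaussian log-density of u, minus the tanh change-of-variables term
        # (1e-6 guards against log(0) when the action saturates).
        log_prob = (-0.5 * eps ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
                    - math.log(1.0 - action ** 2 + 1e-6))
        total += alpha * log_prob - q_value(state, action)
    return total / n_samples


if __name__ == "__main__":
    td3 = td3_actor_loss(state=1.0, mean=0.3)
    sac = sac_actor_loss(state=1.0, mean=0.3, log_std=-10.0, alpha=0.0)
    print(f"TD3 actor loss: {td3:.6f}")
    print(f"SAC actor loss (alpha=0, tiny std): {sac:.6f}")
```

In this limit the stochastic SAC update and the deterministic TD3 update push the policy in the same direction. With `alpha > 0` and a non-trivial standard deviation, SAC additionally trades off Q-value against policy entropy.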
