The reward hypothesis posits that "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. The result is not a simple affirmation or refutation; rather, we completely specify the implicit requirements on goals and purposes under which the hypothesis holds.