Christiano et al. (2022) define a *heuristic estimator* to be a hypothetical algorithm that estimates the values of mathematical expressions from arguments. In brief, a heuristic estimator $\mathbb{G}$ takes as input a mathematical expression $Y$ and a formal "heuristic argument" $\pi$, and outputs an estimate $\mathbb{G}(Y \mid \pi)$ of $Y$. In this work, we argue for the informal principle that a heuristic estimator ought not to be able to predict its own errors, and we explore approaches to formalizing this principle. Most simply, the principle suggests that $\mathbb{G}(Y - \mathbb{G}(Y \mid \pi) \mid \pi)$ ought to equal zero for all $Y$ and $\pi$. We argue that an ideal heuristic estimator ought to satisfy two stronger properties in this vein, which we term *iterated estimation* (by analogy to the law of iterated expectations) and *error orthogonality*. Although iterated estimation and error orthogonality are intuitively appealing, it can be difficult to determine whether a given heuristic estimator satisfies the properties. As an alternative approach, we explore *accuracy*: a property that (roughly) states that $\mathbb{G}$ has zero average error over a distribution of mathematical expressions. However, in the context of two estimation problems, we demonstrate barriers to creating an accurate heuristic estimator. We finish by discussing challenges and potential paths forward for finding a heuristic estimator that accords with our intuitive understanding of how such an estimator ought to behave, as well as the potential applications of heuristic estimators to understanding the behavior of neural networks.
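As an intuition pump (not part of the formal development above): if we model $\mathbb{G}(\cdot \mid \pi)$ as conditional expectation given a partition of outcomes, the quantity $\mathbb{G}(Y - \mathbb{G}(Y \mid \pi) \mid \pi)$ vanishes by the tower property, which is the special case the iterated-estimation property generalizes. A minimal numerical sketch, where the variable `Y`, the parity partition standing in for the argument $\pi$, and the helper `cond_exp` are all illustrative choices:

```python
# Model G(Y | pi) as conditional expectation of Y given a partition.
# Outcomes 0..5 are uniform; Y(w) = w^2; the "argument" pi is the
# parity partition {even outcomes} / {odd outcomes}.
outcomes = list(range(6))
Y = {w: w * w for w in outcomes}
blocks = {w: w % 2 for w in outcomes}  # partition cell of each outcome

def cond_exp(f, w):
    """E[f | cell of the partition containing w], uniform distribution."""
    cell = [u for u in outcomes if blocks[u] == blocks[w]]
    return sum(f[u] for u in cell) / len(cell)

# G(Y | pi): the conditional expectation of Y, evaluated at each outcome.
G_Y = {w: cond_exp(Y, w) for w in outcomes}

# The error variable Y - G(Y | pi) averages to zero within every cell,
# so estimating it with the same argument pi gives exactly zero:
err = {w: Y[w] - G_Y[w] for w in outcomes}
G_err = {w: cond_exp(err, w) for w in outcomes}
assert all(abs(v) < 1e-9 for v in G_err.values())  # G(Y - G(Y|pi) | pi) = 0
```

Conditional expectation is of course a much stronger object than a heuristic estimator (it already satisfies iterated estimation and error orthogonality exactly); the sketch only illustrates the target behavior that the paper asks $\mathbb{G}$ to emulate.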