Questions of "how best to acquire data" are essential to modeling and prediction in the natural and social sciences, engineering applications, and beyond. Optimal experimental design (OED) formalizes these questions and creates computational methods to answer them. This article presents a systematic survey of modern OED, from its foundations in classical design theory to current research involving OED for complex models. We begin by reviewing criteria used to formulate an OED problem and thus to encode the goal of performing an experiment. We emphasize the flexibility of the Bayesian and decision-theoretic approach, which encompasses information-based criteria that are well-suited to nonlinear and non-Gaussian statistical models. We then discuss methods for estimating or bounding the values of these design criteria; this endeavor can be quite challenging due to strong nonlinearities, high parameter dimension, large per-sample costs, or settings where the model is implicit. A complementary set of computational issues involves optimization methods used to find a design; we discuss such methods in the discrete (combinatorial) setting of observation selection and in settings where an exact design can be continuously parameterized. Finally, we present emerging methods for sequential OED that build non-myopic design policies, rather than explicit designs; these methods naturally adapt to the outcomes of past experiments in proposing new experiments, while seeking coordination among all experiments to be performed. Throughout, we highlight important open questions and challenges.