Dimension reduction techniques have long been an important topic in statistics, and active subspaces (AS) have received much attention over the past decade in the computer experiments literature. The most common approach to estimating the AS is Monte Carlo with numerical gradient evaluation. While sensible in some settings, this approach has obvious drawbacks. Recent research has demonstrated that active subspace calculations can be obtained in closed form, conditional on a Gaussian process (GP) surrogate, but the GP itself can be limiting in high-dimensional settings for computational reasons. In this paper, we derive the relevant calculations for the more general case in which the model of interest is a linear combination of tensor products. These general equations can be applied to the GP, recovering previous results as a special case, or to models constructed by other regression techniques, including multivariate adaptive regression splines (MARS). Using a MARS surrogate has many advantages, including improved scaling, better estimation of active subspaces in high dimensions, and the ability to handle a large number of prior distributions in closed form. In one real-world example, we obtain the active subspace of a radiation-transport code with 240 inputs and 9,372 model runs in under half an hour.
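The key structural point is that when f is a linear combination of tensor products, every entry of the AS matrix C = E[∇f ∇f^T] reduces to sums of products of one-dimensional integrals. This avoids a d-dimensional Monte Carlo average over sampled gradients. The Python sketch below illustrates that reduction and is not the paper's implementation. It rests on several assumptions: inputs are independent Uniform(0, 1), and the model is a hypothetical degree-1 MARS-style expansion f(x) = Σ_m a_m Π_j φ_mj(x_j), where each factor is either the constant 1 or a hinge max(0, s(x − t)). The 1-D integrals are computed by Gauss-Legendre quadrature on each piece between knots, which is exact for these piecewise polynomials. This quadrature stands in for whatever closed-form expressions the paper derives. All function names are illustrative. The closed-form C is then compared with the Monte Carlo estimate.

```python
import math
import numpy as np

# Hypothetical model representation (an assumption, not the paper's code):
#   f(x) = sum_m a_m * prod_j phi_{m,j}(x_j)
# phi_{m,j} is None (the constant 1) or a hinge (s, t) -> max(0, s*(x - t)), s in {+1, -1}.
# Inputs are assumed independent Uniform(0, 1).

def phi(spec, x):
    """Evaluate a univariate factor: constant 1 or hinge max(0, s*(x - t))."""
    if spec is None:
        return np.ones_like(x)
    s, t = spec
    return np.maximum(0.0, s * (x - t))

def dphi(spec, x):
    """Derivative of a univariate factor (defined almost everywhere)."""
    if spec is None:
        return np.zeros_like(x)
    s, t = spec
    return s * (s * (x - t) > 0)

def integrate01(g, knots, n=3):
    """Integrate a piecewise polynomial g over [0, 1], splitting at its knots.
    n-point Gauss-Legendre is exact on each piece up to degree 2n - 1."""
    pts = sorted({0.0, 1.0, *[k for k in knots if 0.0 < k < 1.0]})
    nodes, weights = np.polynomial.legendre.leggauss(n)
    total = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        x = 0.5 * (b - a) * nodes + 0.5 * (a + b)  # map [-1, 1] -> [a, b]
        total += 0.5 * (b - a) * np.sum(weights * g(x))
    return total

def moment(spec1, spec2, d1, d2):
    """E[u(X) v(X)], X ~ U(0,1); u, v are the factors or (if d1/d2) their derivatives."""
    f1 = dphi if d1 else phi
    f2 = dphi if d2 else phi
    knots = [sp[1] for sp in (spec1, spec2) if sp is not None]
    return integrate01(lambda x: f1(spec1, x) * f2(spec2, x), knots)

def closed_form_C(coefs, basis, d):
    """C = E[grad f grad f^T], assembled only from products of 1-D moments."""
    C = np.zeros((d, d))
    for m, a_m in enumerate(coefs):
        for mp, a_mp in enumerate(coefs):
            E00 = [moment(basis[m][j], basis[mp][j], False, False) for j in range(d)]
            E10 = [moment(basis[m][j], basis[mp][j], True, False) for j in range(d)]
            E01 = [moment(basis[m][j], basis[mp][j], False, True) for j in range(d)]
            E11 = [moment(basis[m][j], basis[mp][j], True, True) for j in range(d)]
            for i in range(d):
                for k in range(d):
                    if i == k:  # both derivatives act on variable i
                        rest = math.prod(E00[j] for j in range(d) if j != i)
                        val = E11[i] * rest
                    else:       # derivative of term m in x_i, of term mp in x_k
                        rest = math.prod(E00[j] for j in range(d) if j not in (i, k))
                        val = E10[i] * E01[k] * rest
                    C[i, k] += a_m * a_mp * val
    return C

def grad_f(coefs, basis, X):
    """Analytic gradient of f at each row of X (shape N x d), for the Monte Carlo check."""
    N, d = X.shape
    G = np.zeros((N, d))
    for a_m, spec in zip(coefs, basis):
        for i in range(d):
            term = a_m * dphi(spec[i], X[:, i])
            for j in range(d):
                if j != i:
                    term = term * phi(spec[j], X[:, j])
            G[:, i] += term
    return G

# Illustrative 3-input model (made up for this example):
coefs = [2.0, 0.5, -1.0]
basis = [[(1, 0.3), (-1, 0.7), None],   # 2 * max(0, x1 - 0.3) * max(0, 0.7 - x2)
         [None, None, (1, 0.5)],        # 0.5 * max(0, x3 - 0.5)
         [None, None, None]]            # constant -1 (zero gradient)
d = 3
C_closed = closed_form_C(coefs, basis, d)

# Monte Carlo estimate of the same matrix from sampled gradients
rng = np.random.default_rng(0)
X = rng.uniform(size=(200_000, d))
G = grad_f(coefs, basis, X)
C_mc = G.T @ G / X.shape[0]

# Active directions: eigenvectors of C, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(C_closed)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], eigvecs[:, order]
```

The closed-form path costs a number of 1-D integrals that grows with the number of basis-function pairs and input dimensions, independent of any sample size. The Monte Carlo estimate, by contrast, carries sampling error and, in practice, also the error from numerical gradient evaluation.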