Understanding how an exposure transmits its effect through high-dimensional intermediaries is a central problem in observational research. We study how to find a composite mediator that maximises the indirect effect of an exposure on an outcome in a linear structural equation model. Although the objective is non-convex in the weight vector, a geometric argument yields a closed-form global solution: the optimal weight bisects the angle between two computable path vectors in a weighted inner product space and is recovered via two linear solves. The resulting algorithm, MaxIE, runs at the same cost as ordinary least squares, orders of magnitude lower than numerical optimisation, and has a dual formulation for settings where mediators outnumber observations. The same path vectors yield a test of the global null hypothesis that no composite mediator exists; its test statistic follows a t(p-1) distribution in the classical regime and a t(n-2) distribution in the dual regime. Power is characterised analytically as a function of the population path angle, and simulations confirm both size control and the power characterisation. Applied to a UK Biobank proteomics dataset (n=38,383, p=2,916), the method rejects the global null (p-value = 6.4e-9) and identifies the optimal proteomic composite mediating the effect of age on dementia.
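The bisector geometry described in the abstract can be illustrated with a minimal sketch. It is not the authors' MaxIE implementation: it assumes, purely for illustration, that the objective has the generic form (w'a)(w'b) subject to w'Sw = 1, where S is a symmetric positive-definite weighting matrix and a, b are nonzero vectors. The names `bisector_weight`, `S`, `a` and `b` are hypothetical and do not come from the paper. Under that assumption the maximiser is the normalised sum of the two unit "path vectors" u = S^{-1}a and v = S^{-1}b, with lengths measured in the S-inner product, so only two linear solves are needed.

```python
import numpy as np

def bisector_weight(S, a, b):
    """Maximise (w @ a) * (w @ b) subject to w @ S @ w == 1.

    Hypothetical helper for illustration only. Assumes S is symmetric
    positive definite and a, b are nonzero.
    """
    u = np.linalg.solve(S, a)            # first linear solve
    v = np.linalg.solve(S, b)            # second linear solve
    norm_u = np.sqrt(a @ u)              # ||u||_S, since u'Su = a'u
    norm_v = np.sqrt(b @ v)              # ||v||_S, since v'Sv = b'v
    w = u / norm_u + v / norm_v          # bisector of the angle between u and v
    norm_w = np.sqrt(w @ S @ w)
    if norm_w < 1e-12:
        # u and v point in opposite directions: the bisector is undefined.
        raise ValueError("path vectors are antiparallel")
    w = w / norm_w                       # rescale onto the constraint set
    # Closed-form optimum: ||u|| ||v|| (1 + cos(angle)) / 2, with <u, v>_S = a'v.
    best = 0.5 * (norm_u * norm_v + a @ v)
    return w, best

# Small synthetic example (random inputs, not data from the paper).
rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
S = A @ A.T + p * np.eye(p)              # symmetric positive definite
a = rng.standard_normal(p)
b = rng.standard_normal(p)
w, best = bisector_weight(S, a, b)
```

In this toy setting the value of the objective at `w` matches the closed-form optimum `best`, and randomly drawn feasible vectors do not exceed it, which is the behaviour the geometric argument predicts for this assumed objective.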