When primary objectives are insensitive or delayed, experimenters may instead focus on proxy metrics derived from secondary outcomes. For example, technology companies often infer the long-term impacts of product interventions from their effects on short-term user engagement signals. We consider the meta-analysis of many historical experiments to learn the covariance of treatment effects on these outcomes, which can support the construction of such proxies. Even when experiments are plentiful, if treatment effects are weak, the covariance of estimated treatment effects across experiments can be highly biased. We overcome this with techniques inspired by weak instrumental variable analysis. We show that Limited Information Maximum Likelihood (LIML) learns a parameter equivalent to fitting total least squares to a transformation of the scatterplot of treatment effects, and that Jackknife Instrumental Variables Estimation (JIVE) learns another parameter computable from the average of jackknifed covariance matrices across experiments. We also present a total covariance estimator for the latter estimand under homoskedasticity, which is equivalent to a $k$-class estimator. We show how these parameters can be used to construct unbiased proxy metrics under various structural models. Lastly, we discuss the real-world application of our methods at Netflix.
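The bias described above can be illustrated with a small simulation. This is a minimal sketch under assumptions not taken from the source: true treatment effects on a short-term and a long-term outcome are bivariate Gaussian, every experiment shares the same known sampling covariance (homoskedasticity), and the function names are hypothetical. It is not an implementation of LIML or JIVE. It only shows that when effects are weak, the covariance of estimated effects is dominated by sampling noise, and that subtracting the noise covariance recovers the covariance of the true effects.

```python
import numpy as np


def simulate_estimated_effects(n_experiments, true_cov, noise_cov, rng):
    """Draw true effects per experiment and add sampling noise to each."""
    true_effects = rng.multivariate_normal(np.zeros(2), true_cov, size=n_experiments)
    noise = rng.multivariate_normal(np.zeros(2), noise_cov, size=n_experiments)
    return true_effects + noise


def naive_covariance(estimates):
    """Covariance of estimated effects across experiments (biased by noise)."""
    return np.cov(estimates, rowvar=False)


def noise_corrected_covariance(estimates, noise_cov):
    """Subtract the (assumed known, common) sampling covariance."""
    return naive_covariance(estimates) - noise_cov


def slope(cov):
    """Regression slope of the long-term effect on the short-term effect."""
    return cov[0, 1] / cov[0, 0]


rng = np.random.default_rng(0)

# Weak true effects: small variance, positively related (true slope 0.5).
true_cov = 0.01 * np.array([[1.0, 0.5], [0.5, 1.0]])
# Sampling noise is larger and negatively correlated across the two outcomes.
noise_cov = 0.04 * np.array([[1.0, -0.3], [-0.3, 1.0]])

estimates = simulate_estimated_effects(200_000, true_cov, noise_cov, rng)

naive_slope = slope(naive_covariance(estimates))
corrected_slope = slope(noise_corrected_covariance(estimates, noise_cov))

print(f"true slope:      {slope(true_cov):.3f}")
print(f"naive slope:     {naive_slope:.3f}")
print(f"corrected slope: {corrected_slope:.3f}")
```

In this setup the naive slope has the wrong sign even with a very large number of experiments, because more experiments do not shrink the per-experiment sampling noise. In practice the sampling covariance would have to be estimated from unit-level data within each experiment rather than assumed known.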