Using covariate information is a powerful way to improve the efficiency and accuracy of causal inference, which supports the massive number of randomized experiments run at data-driven enterprises. However, state-of-the-art approaches can become practically unreliable when the covariate dimension increases to just 50, whereas experiments on large platforms can observe covariates of even higher dimension. We propose a machine-learning-assisted covariate representation approach that makes effective use of historical experimental or observational data collected on the same platform to learn which lower-dimensional representations can effectively stand in for the higher-dimensional covariates. We then propose design and estimation methods based on the covariate representation. We prove statistical reliability and performance guarantees for the proposed methods, and we demonstrate their empirical performance in numerical experiments.