A common problem in modern experimental science is estimating the coefficients of a linear regression when the variables of interest can never be observed simultaneously. Assuming that the global experiment can be decomposed into sub-experiments with distinct first moments, we propose two estimators of the linear regression that take this additional information into account: one based on moments and one based on optimal transport theory. Both estimators are proven to be consistent and asymptotically Gaussian under weak hypotheses. Because the asymptotic variance has no explicit expression except in some particular cases, a stratified bootstrap approach is developed to build confidence intervals for the estimated parameters, and its consistency is also shown. A simulation study assessing and comparing the finite-sample performance of these estimators demonstrates the advantages of the bootstrap approach in multiple realistic scenarios. An application to in vivo experiments studying radiation-induced adverse effects in mice reveals important relationships between the biomarkers of interest that could not be identified with the naive approach considered.
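To make the setting concrete, the following is a minimal sketch of one way a moment-based estimate and a stratified bootstrap interval could be computed. It is not the estimator defined in the paper. It assumes a model Y = a + bX + noise that holds in every sub-experiment, so that the group means satisfy E[Y_k] = a + b E[X_k], and it simply regresses the group means of Y on the group means of X. The function names, the simulated data, and the percentile interval are all illustrative assumptions.

```python
import numpy as np


def moment_estimator(x_groups, y_groups):
    """Hypothetical moment-type fit of y = a + b*x from unpaired samples.

    x_groups[k] and y_groups[k] are 1-D arrays observed separately in
    sub-experiment k; only their group means are used.
    """
    x_means = np.array([np.mean(x) for x in x_groups])
    y_means = np.array([np.mean(y) for y in y_groups])
    # Least squares of the group means of y on the group means of x.
    design = np.column_stack([np.ones_like(x_means), x_means])
    coef, *_ = np.linalg.lstsq(design, y_means, rcond=None)
    return coef  # array([intercept, slope])


def stratified_bootstrap_ci(x_groups, y_groups, n_boot=500, level=0.95, seed=0):
    """Percentile intervals, resampling with replacement within each group."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        xb = [rng.choice(x, size=len(x), replace=True) for x in x_groups]
        yb = [rng.choice(y, size=len(y), replace=True) for y in y_groups]
        draws[b] = moment_estimator(xb, yb)
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return lower, upper


# Simulated illustration (assumed values): true intercept 1, true slope 2,
# four sub-experiments whose X means differ.
rng = np.random.default_rng(1)
group_means = [0.0, 2.0, 4.0, 6.0]
x_groups = [rng.normal(m, 1.0, 200) for m in group_means]
# Y is generated from independent draws of X, so X and Y are never paired.
y_groups = [
    1.0 + 2.0 * rng.normal(m, 1.0, 200) + rng.normal(0.0, 0.5, 200)
    for m in group_means
]

est = moment_estimator(x_groups, y_groups)
lower, upper = stratified_bootstrap_ci(x_groups, y_groups)
print("estimate (intercept, slope):", est)
print("bootstrap lower bounds:", lower)
print("bootstrap upper bounds:", upper)
```

Resampling is done separately inside each sub-experiment, which keeps the group sizes fixed across bootstrap replicates. The sketch needs at least two sub-experiments with different X means; otherwise the slope is not identified from first moments alone.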