In 1967, Frederick Lord posed a conundrum that has confused scientists for over half a century. Subsequently named Lord's 'paradox', the puzzle centres on the observation that two different approaches to estimating the effect of an exposure on the 'change' in an outcome can produce radically different results. Approach 1 compares the mean 'change score' between exposure groups; Approach 2 compares the follow-up outcome between exposure groups conditional on the baseline outcome. Resolving this puzzle starts with recognising the three reasons why a variable may change in value: (A) 'endogenous change', which represents autocorrelation from baseline; (B) 'random change', which represents change from transient random processes; and (C) 'exogenous change', which represents all non-endogenous, non-random change and contains all change that is potentially modifiable by other baseline variables. In observational data, neither Approach 1 nor Approach 2 can reliably estimate the causal effect of an exposure on 'exogenous change' in an outcome. Approach 1 is susceptible to diluted or opposite-sign estimates whenever the exposure causes, or is caused by, the baseline outcome. Approach 2 is susceptible to inflated estimates due to measurement error in the baseline outcome, and to time-varying confounding bias when the baseline outcome is a mediator. The measurement error can be reduced with multiple measures of the baseline outcome, and the time-varying confounding can be reduced using g-methods. Lord's 'paradox' offers several enduring lessons for observational data science, including the importance of a well-defined research question and the problems with analysing change scores in observational data.
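The contrast between the two approaches can be illustrated with a small simulation. The sketch below is a hypothetical scenario, not the authors' analysis: it assumes a standard-normal true baseline outcome, an exposure caused by the baseline outcome (exposed when baseline plus standard-normal noise exceeds zero), and a follow-up outcome made only of endogenous change (autocorrelation coefficient 0.5, an arbitrary choice) plus random change, so the true effect of the exposure on exogenous change is zero. The names `simulate`, `approach1_change_score` and `approach2_ancova`, and the parameters `noise_sd_baseline` and `n_baseline_measures`, are illustrative assumptions. It covers the measurement-error problem for Approach 2 but not the time-varying confounding that g-methods address.

```python
import random
import statistics


def simulate(n, seed, noise_sd_baseline=0.0, n_baseline_measures=1):
    """Hypothetical data: the baseline outcome causes the exposure, and the
    exposure has NO effect on the follow-up outcome (true effect = 0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y0_true = rng.gauss(0.0, 1.0)  # true baseline outcome
        # Assumed exposure mechanism: driven by the baseline outcome
        x = 1 if y0_true + rng.gauss(0.0, 1.0) > 0 else 0
        # Follow-up = endogenous change (autocorrelation 0.5) + random change
        y1 = 0.5 * y0_true + rng.gauss(0.0, 1.0)
        # Observed baseline = mean of repeated, error-prone measurements
        y0_obs = statistics.fmean(
            y0_true + rng.gauss(0.0, noise_sd_baseline)
            for _ in range(n_baseline_measures)
        )
        rows.append((x, y0_obs, y1))
    return rows


def approach1_change_score(rows):
    """Approach 1: difference in mean change score (Y1 - Y0) between groups."""
    exposed = [y1 - y0 for x, y0, y1 in rows if x == 1]
    unexposed = [y1 - y0 for x, y0, y1 in rows if x == 0]
    return statistics.fmean(exposed) - statistics.fmean(unexposed)


def approach2_ancova(rows):
    """Approach 2: OLS coefficient on the exposure from Y1 ~ X + Y0."""
    xs = [r[0] for r in rows]
    zs = [r[1] for r in rows]
    ys = [r[2] for r in rows]
    xbar, zbar, ybar = statistics.fmean(xs), statistics.fmean(zs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    szz = sum((z - zbar) ** 2 for z in zs)
    sxz = sum((x - xbar) * (z - zbar) for x, z in zip(xs, zs))
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    szy = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
    # Solve the 2x2 normal equations on centred data for the exposure coefficient
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


if __name__ == "__main__":
    clean = simulate(20_000, seed=1)
    noisy = simulate(20_000, seed=2, noise_sd_baseline=1.0)
    repeated = simulate(20_000, seed=3, noise_sd_baseline=1.0, n_baseline_measures=4)
    print("Approach 1, change score:", round(approach1_change_score(clean), 3))
    print("Approach 2, exact baseline:", round(approach2_ancova(clean), 3))
    print("Approach 2, one noisy baseline:", round(approach2_ancova(noisy), 3))
    print("Approach 2, mean of 4 noisy baselines:", round(approach2_ancova(repeated), 3))
```

Under these assumptions, the population values work out to roughly -0.56 for Approach 1 (an opposite-sign estimate even though the true effect is zero), about 0 for Approach 2 with an error-free baseline, about +0.34 for Approach 2 with one error-prone baseline measure (error SD 1), and about +0.15 when four such measures are averaged; each printed estimate should fall within a few hundredths of these values. The last two lines illustrate how repeated baseline measures reduce, but do not eliminate, the measurement-error bias.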