Derived variables are variables constructed from one or more source variables through established mathematical operations or algorithms. For example, body mass index (BMI) is a derived variable constructed from two source variables: weight and height. When a derived variable is used as the outcome in a statistical model, complications arise if some of the source variables have missing values. In this paper, we propose a way to define a single fully Bayesian model that simultaneously imputes the missing values and samples from the posterior. We compare the proposed method with alternative approaches that rely on multiple imputation, using examples that include an analysis estimating the risk of microcephaly (a derived variable based on sex, gestational age and head circumference at birth) in newborns exposed to the Zika virus.
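To make the BMI example concrete, the following minimal sketch shows how a missing source value propagates into the derived variable. It is only an illustration of the problem, not the proposed Bayesian model; the function name `bmi`, the field names, and the records are hypothetical, and `math.nan` is assumed here as the marker for a missing value.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


# Hypothetical records (not data from the paper); math.nan marks a missing source value.
records = [
    {"weight_kg": 70.0, "height_m": 1.75},      # both source variables observed
    {"weight_kg": math.nan, "height_m": 1.60},  # weight missing
    {"weight_kg": 82.5, "height_m": math.nan},  # height missing
]

# The derived variable is missing whenever any of its source variables is missing.
derived = [bmi(r["weight_kg"], r["height_m"]) for r in records]
n_missing = sum(math.isnan(v) for v in derived)
```

In this sketch, two of the three derived values are undefined even though each of those records still carries one observed source variable. That partial information is what imputing the source variables, rather than discarding the record or imputing the derived value directly, is meant to retain.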