Derived variables are variables constructed from one or more source variables through established mathematical operations or algorithms. For example, body mass index (BMI) is a derived variable constructed from two source variables: weight and height. When a derived variable is used as the outcome in a statistical model, complications arise if some of the source variables have missing values. In this paper, we show how to define a single fully Bayesian model that simultaneously imputes the missing values and samples from the posterior distribution. We compare the proposed method with alternative approaches that rely on multiple imputation and, using a simulated dataset, consider how best to estimate the risk of microcephaly in newborns exposed to the Zika virus.
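To make the missing-data problem concrete, here is a minimal sketch of how missingness in a source variable propagates to a derived variable. It is an illustration only and not the model proposed in the paper: the helper `bmi`, the variable names, and the data values are all hypothetical, and missing values are assumed to be encoded as `NaN`.

```python
import numpy as np

def bmi(weight_kg, height_m):
    """Hypothetical helper: BMI = weight (kg) / height (m)^2, elementwise."""
    weight_kg = np.asarray(weight_kg, dtype=float)
    height_m = np.asarray(height_m, dtype=float)
    return weight_kg / height_m ** 2

# Made-up source variables; np.nan marks a missing measurement.
weight = np.array([70.0, np.nan, 82.5, 60.0])
height = np.array([1.75, 1.62, np.nan, 1.60])

# The derived variable is missing whenever either source is missing,
# even though the other source was observed for that subject.
derived = bmi(weight, height)
n_missing = int(np.isnan(derived).sum())
print(derived, n_missing)
```

In this toy example, two subjects lose their BMI although each has one observed source value. That partially observed information is what a model operating on the source variables, rather than on the derived outcome alone, could in principle use.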