Joint Models for Handling Non-Ignorable Missing Data using Bayesian Additive Regression Trees: Application to Leaf Photosynthetic Traits Data

Missing data pose significant challenges in predictive analysis, and oversimplified assumptions about the missing data process often lead to biased conclusions. When the data are missing not at random (MNAR), jointly modeling the data and the missing data indicators is essential. Motivated by a real data application with partially missing multivariate outcomes related to leaf photosynthetic traits and several environmental covariates, we propose two methods under a selection model framework for handling missingness in the response variables, suitable for recovering various missingness mechanisms. Both approaches use a multivariate extension of Bayesian additive regression trees (BART) to flexibly model the outcomes. The first approach pairs this outcome model with a probit regression model for the missingness. For scenarios where the relationship between the missingness and the data is more complex or non-linear, we propose a second approach that uses a probit BART model to characterize the missing data process, thereby fitting two BART models simultaneously. Both models also handle ignorable covariate missingness. The efficacy of both models relative to existing missing data approaches is demonstrated through extensive simulations, in both univariate and multivariate settings, and through the aforementioned application to the leaf photosynthetic trait data.
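To make the MNAR setting concrete, the following is a minimal simulation sketch, not the authors' method: it fits no BART model and only illustrates a probit missingness mechanism of the kind a selection model describes. The data-generating choices (a sine-shaped outcome, the coefficients `alpha0` and `alpha_y`, and the function names) are hypothetical and chosen for illustration. The sketch compares the mean of the observed outcomes with the mean of all outcomes when missingness depends on the outcome itself (MNAR) and when it does not (MCAR).

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_missing(n=20000, alpha0=0.0, alpha_y=1.0, seed=0):
    """Draw (x, y, m) triples where m=True marks a missing outcome.

    Hypothetical setup for illustration only:
      y = sin(x) + standard normal noise
      P(m = 1 | y) = Phi(alpha0 + alpha_y * y)   (probit missingness model)
    alpha_y != 0 makes missingness depend on y itself (MNAR);
    alpha_y == 0 makes it completely at random (MCAR).
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF
    rows = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        y = math.sin(x) + rng.gauss(0.0, 1.0)
        m = rng.random() < phi(alpha0 + alpha_y * y)
        rows.append((x, y, m))
    return rows


def complete_case_bias(rows):
    """Mean of the observed outcomes minus the mean of all outcomes."""
    all_y = [y for _, y, _ in rows]
    observed_y = [y for _, y, m in rows if not m]
    return fmean(observed_y) - fmean(all_y)


mnar_bias = complete_case_bias(simulate_missing(alpha_y=1.0))
mcar_bias = complete_case_bias(simulate_missing(alpha_y=0.0))
print(f"MNAR bias: {mnar_bias:.3f}, MCAR bias: {mcar_bias:.3f}")
```

With a positive `alpha_y`, larger outcomes are more likely to be missing, so the complete-case mean is pulled downward, while under MCAR the two means stay close. This is the kind of bias that jointly modeling the outcomes and the missingness indicators is meant to address.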

