Data missingness is a practical challenge of sustained interest to the scientific community. In this paper, we present Shades-of-Null, an evaluation suite for responsible missing value imputation. Our work is novel in two ways: (i) we model realistic and socially salient missingness scenarios that go beyond Rubin's classic Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR) settings to include multi-mechanism missingness (when different missingness patterns co-exist in the data) and missingness shift (when the missingness mechanism changes between training and test); and (ii) we evaluate imputers holistically, based on imputation quality and imputation fairness, as well as on the predictive performance, fairness, and stability of the models that are trained and tested on the post-imputation data. We use Shades-of-Null to conduct a large-scale empirical study involving 29,736 experimental pipelines. We find that no single imputation approach performs best across all missingness types, and that trade-offs arise between predictive performance, fairness, and stability, depending on the combination of missingness scenario, imputer choice, and the architecture of the predictive model. We make Shades-of-Null publicly available to enable researchers to rigorously evaluate missing value imputation methods on a wide range of metrics in plausible and socially meaningful scenarios.
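To make the missingness scenarios named above concrete, the following is a minimal sketch of how MCAR, MAR, and MNAR missingness, and a missingness shift between training and test, might be simulated on synthetic data. It is an illustration under our own assumptions and not the Shades-of-Null implementation: the function `inject_missingness`, its parameters, and the 4:1 weighting of rows above the median are all hypothetical choices.

```python
import numpy as np

def inject_missingness(X, target_col, mechanism, rate=0.3, cond_col=None, rng=None):
    """Return a copy of X with NaNs injected into one column.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float).copy()
    n = X.shape[0]

    if mechanism == "MCAR":
        # Every row is equally likely to lose its value.
        weights = np.ones(n)
    elif mechanism == "MAR":
        # Missingness depends on a different, fully observed column.
        if cond_col is None:
            raise ValueError("MAR requires cond_col")
        driver = X[:, cond_col]
        weights = np.where(driver > np.median(driver), 4.0, 1.0)
    elif mechanism == "MNAR":
        # Missingness depends on the (unobserved) value itself.
        driver = X[:, target_col]
        weights = np.where(driver > np.median(driver), 4.0, 1.0)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")

    # Draw exactly round(rate * n) rows without replacement,
    # with selection probability proportional to the weights.
    k = int(round(rate * n))
    idx = rng.choice(n, size=k, replace=False, p=weights / weights.sum())
    X[idx, target_col] = np.nan
    return X

# Synthetic data: two numeric columns, 1000 rows.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(1000, 2))

# Missingness shift: the training split is MCAR, the test split is MNAR.
X_train = inject_missingness(X[:700], target_col=0, mechanism="MCAR", rng=1)
X_test = inject_missingness(X[700:], target_col=0, mechanism="MNAR", rng=2)
```

In this sketch, multi-mechanism missingness could be approximated by calling the helper several times with different mechanisms on different columns of the same dataset. An imputer fitted on `X_train` would then face a different missingness pattern in `X_test`, which is the kind of shift the abstract describes.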