An approach to amputation, the process of introducing missing values into a complete dataset, is presented. It constructs missingness indicators in a flexible and principled way from copulas with Bernoulli margins, which allows dependence in missingness patterns to be incorporated. Besides the classical missingness models (missing completely at random, missing at random, and missing not at random), the approach can model structured missingness such as block missingness and, via mixtures, monotone missingness, both of which occur frequently in real-life datasets. Properties such as joint missingness probabilities and missingness correlations are derived mathematically. The approach is demonstrated with mathematical examples and with empirical illustrations based on a well-known dataset.
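To make the idea of copula-based missingness indicators with Bernoulli margins concrete, the following is a minimal sketch and not the authors' implementation. It assumes an equicorrelated (one-factor) Gaussian copula, chosen here only for illustration, and generates indicators independently of the data, which corresponds to the missing completely at random case. The function names `sample_missingness_indicators` and `ampute`, as well as the parameters `probs` and `rho`, are hypothetical.

```python
import random
from statistics import NormalDist


def sample_missingness_indicators(n_rows, probs, rho, seed=0):
    """Draw an n_rows x len(probs) matrix of 0/1 missingness indicators.

    Column j has a Bernoulli(probs[j]) margin (each prob strictly in (0, 1)).
    Dependence across columns comes from an equicorrelated Gaussian copula
    with latent correlation rho in [0, 1); this copula is an assumption
    made for illustration only.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Thresholds t_j with P(Z_j <= t_j) = probs[j] for standard normal Z_j.
    thresholds = [std_normal.inv_cdf(p) for p in probs]
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    rows = []
    for _ in range(n_rows):
        shared = rng.gauss(0.0, 1.0)  # common factor inducing dependence
        # Each latent variable is standard normal; pairs have correlation rho.
        latent = [a * shared + b * rng.gauss(0.0, 1.0) for _ in probs]
        rows.append([int(z <= t) for z, t in zip(latent, thresholds)])
    return rows


def ampute(data, indicators):
    """Return a copy of data with None wherever the indicator equals 1."""
    return [
        [None if m else x for x, m in zip(row, mask)]
        for row, mask in zip(data, indicators)
    ]
```

For example, `ampute(data, sample_missingness_indicators(len(data), [0.2, 0.5], rho=0.8))` would remove roughly 20% of the first column and 50% of the second, with the two columns tending to be missing together more often than under independence. Setting `rho=0` recovers independent Bernoulli indicators.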