Missing data is a common problem in practice, and many imputation methods have been developed to handle it. However, although labels are usually available in the training data, common imputation practice relies only on the input features and ignores the labels. In this work, we show that stacking the label into the input can significantly improve imputation of the input. We also propose a classification strategy that initializes the test labels as missing values and stacks them with the input before imputation, so that the labels and the input are imputed at the same time. The technique can also handle training data with missing labels without any prior imputation, and it applies to continuous, categorical, or mixed-type data. Experiments show promising results in terms of accuracy.
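The stacking idea can be illustrated with a minimal sketch. The abstract does not name a specific imputer, so the code below assumes a simple k-nearest-neighbour imputer written from scratch as a stand-in; the function names `knn_impute` and `stack_and_impute`, the toy data, and the choice `k=2` are all hypothetical. The label is appended as an extra column, test labels start as missing (`None`), and a single imputation pass fills both the missing features and the test labels.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows that observe that column.

    Stand-in imputer (an assumption, not the paper's method). Distances use only
    the columns observed in both rows and are not rescaled per column.
    """
    def dist(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed column: never a neighbour
        return math.sqrt(sum(shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which column j is observed.
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def stack_and_impute(X_train, y_train, X_test, k=2):
    """Stack labels as the last column, mark test labels missing, impute jointly."""
    stacked = [list(x) + [y] for x, y in zip(X_train, y_train)]
    stacked += [list(x) + [None] for x in X_test]
    filled = knn_impute(stacked, k=k)
    X_filled = [r[:-1] for r in filled]
    # Round the imputed label to the nearest integer class (toy binary case).
    y_pred = [None if r[-1] is None else int(round(r[-1]))
              for r in filled[len(X_train):]]
    return X_filled, y_pred


# Toy data: two well-separated classes with a few missing feature values.
X_train = [[1.0, 1.1], [0.9, None], [1.2, 0.8],
           [5.0, 5.2], [None, 4.9], [5.1, 5.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, None], [4.8, 5.1]]

X_filled, y_pred = stack_and_impute(X_train, y_train, X_test)
print(y_pred)  # [0, 1]
```

Because the label column takes part in the distance computation, training rows with the same label are treated as closer to each other, which is one way the label can inform the imputation of the input. A training row with a missing label could be passed with `None` in `y_train` and would be filled in the same pass.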