Missing data is a common problem in practical settings, and various imputation methods have been developed to deal with it. However, even though labels are usually available in the training data, common imputation practice relies only on the input features and ignores the labels. In this work, we illustrate how stacking the label into the input can significantly improve the imputation of the input. In addition, we propose a classification strategy that initializes the predicted test labels as missing values and stacks the labels with the input for imputation, so that the labels and the input are imputed at the same time. The technique can also train on data with missing labels without any prior imputation, and it is applicable to continuous, categorical, or mixed-type data. Experiments show promising results in terms of accuracy.
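The stacking idea can be illustrated with a minimal sketch. It appends the label as an extra column, marks test labels as missing, and runs one imputer over the stacked matrix. The imputer used here is a simple nearest-donor rule chosen only for illustration; it is an assumption, not the imputation method evaluated in this work, and all function names and the toy data are hypothetical.

```python
import math

def distance(a, b):
    """Mean squared difference over the columns observed in both rows."""
    shared = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not shared:
        return math.inf
    return sum((u - v) ** 2 for u, v in shared) / len(shared)

def impute_nearest_donor(rows):
    """Fill each None with the value from the closest row that observes that column.

    Illustrative stand-in imputer (assumption); distances always use the
    original, un-imputed rows. Columns are not rescaled, so a real pipeline
    would likely standardize features first.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for t, r in enumerate(rows) if t != i and r[j] is not None]
            if donors:
                best = min(donors, key=lambda r: distance(row, r))
                filled[i][j] = best[j]
    return filled

def classify_by_stacked_imputation(train_X, train_y, test_X):
    """Stack labels onto the inputs, treat test labels as missing, impute jointly."""
    # Training rows carry their label (which may itself be None) as a last column.
    stacked = [list(x) + [y] for x, y in zip(train_X, train_y)]
    # Test rows get a missing label, so the imputer fills it in.
    stacked += [list(x) + [None] for x in test_X]
    filled = impute_nearest_donor(stacked)
    n_train = len(train_X)
    imputed_train_X = [r[:-1] for r in filled[:n_train]]
    imputed_test_X = [r[:-1] for r in filled[n_train:]]
    predicted_labels = [r[-1] for r in filled[n_train:]]
    return imputed_train_X, imputed_test_X, predicted_labels

# Toy data (hypothetical): two well-separated classes, None marks a missing entry.
train_X = [[0.0, 0.2], [0.1, None], [10.0, 9.8], [9.9, 10.1]]
train_y = [0, 0, 1, 1]
test_X = [[9.7, None]]

imputed_train_X, imputed_test_X, predicted_labels = classify_by_stacked_imputation(
    train_X, train_y, test_X
)
print(imputed_train_X[1])   # training row with its missing feature filled in
print(imputed_test_X[0])    # test row with its missing feature filled in
print(predicted_labels)     # imputed test labels serve as the predictions
```

Because the label is just another column, the same call also accepts `None` entries in `train_y`, which mirrors the claim that missing training labels need no separate imputation step. Categorical labels are assumed here to be integer-encoded.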