Missing data is a common problem in practice, and many imputation methods have been developed to handle it. However, although labels are usually available in the training data, common imputation practice relies only on the input features and ignores the label. In this work, we show that stacking the label onto the input can significantly improve imputation of the input. We also propose a classification strategy that initializes the test labels as missing values and stacks them with the input for imputation, so that the labels and the input are imputed at the same time. The technique can also handle training data with missing labels without any prior imputation, and it applies to continuous, categorical, or mixed-type data. Experiments show promising results in terms of accuracy.
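
The stacking idea can be illustrated with a minimal sketch. It assumes one-hot encoded labels and uses a simple single-pass k-nearest-neighbour imputer as a stand-in, since the abstract does not name a specific imputation method. The function names (`knn_impute`, `one_hot`, `impute_and_classify`) and the toy data are hypothetical and chosen only for illustration.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows.

    Distance is the root mean squared difference over columns observed
    in both rows. An entry with no usable neighbour stays None.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

def one_hot(label, n_classes):
    """Encode a class index as a one-hot list of floats."""
    return [1.0 if c == label else 0.0 for c in range(n_classes)]

def impute_and_classify(X_train, y_train, X_test, n_classes, k=2):
    """Stack labels onto inputs, impute everything jointly, read off test labels."""
    missing_label = [None] * n_classes
    # Training rows: features followed by the one-hot label (or None if unlabeled).
    train = [list(x) + (one_hot(y, n_classes) if y is not None else missing_label)
             for x, y in zip(X_train, y_train)]
    # Test rows: the label block is initialized as missing.
    test = [list(x) + missing_label for x in X_test]
    filled = knn_impute(train + test, k)
    d = len(X_train[0])
    test_filled = filled[len(train):]
    # Predicted class is the largest imputed label score.
    y_pred = [max(range(n_classes), key=lambda c: r[d + c]) for r in test_filled]
    return [r[:d] for r in test_filled], y_pred

# Toy data (made up): two well-separated clusters, with missing
# features and one missing training label.
X_train = [[0.0, 0.1], [0.2, None], [0.1, 0.0],
           [5.0, 5.1], [None, 4.9], [5.2, 5.0]]
y_train = [0, 0, None, 1, 1, 1]
X_test = [[0.1, None], [5.1, 5.0]]

X_test_filled, y_pred = impute_and_classify(X_train, y_train, X_test, n_classes=2)
print(X_test_filled)
print(y_pred)
```

In this sketch the label columns act as extra features for the training rows, so they contribute to the neighbour distances wherever they are observed, while the test rows receive both their missing features and their label scores from the same imputation pass. Any imputer that accepts a matrix with missing entries could be substituted for `knn_impute`.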