Real-world data is often noisy, which affects not only the quality of features but also the accuracy of labels. Current research on mitigating label errors stems primarily from advances in deep learning, while interpretable models, particularly those rooted in decision trees, remain largely unexplored. In this study, we investigate whether ideas from deep learning loss design can be applied to improve the robustness of decision trees. In particular, we show that loss correction and symmetric losses, both standard approaches in deep learning, are not effective for decision trees. We argue that other directions need to be explored to improve the robustness of decision trees to label noise.