Random Forests are powerful ensemble learning algorithms that are widely used across machine learning tasks. However, they tend to overfit noisy or irrelevant features, which can degrade generalization performance. Post-hoc regularization techniques aim to mitigate this issue by modifying the learned ensemble after training. Here, we propose Bayesian post-hoc regularization, which leverages the reliable patterns captured by leaf nodes closer to the root while reducing the impact of more specific, potentially noisy leaf nodes deeper in the tree. This approach acts as a form of pruning that does not alter the general structure of the trees but instead adjusts the influence of leaf nodes based on their proximity to the root node. We evaluated our method on various machine learning data sets. It performs competitively with state-of-the-art methods and, in certain cases, surpasses them in predictive accuracy and generalization.
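To make the idea of depth-dependent influence concrete, the following is a minimal sketch of post-hoc shrinkage applied to an already-trained regression tree. It is not the Bayesian procedure proposed here, whose details are not given in this abstract. It uses a generic shrinkage rule, similar in spirit to hierarchical shrinkage, chosen only for illustration. The `Node` structure, the function names, and the parameter `lam` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node of an already-trained regression tree (hypothetical structure)."""
    mean: float                      # mean training target of samples reaching this node
    n: int                           # number of training samples reaching this node
    feature: Optional[int] = None    # split feature index; None marks a leaf
    threshold: float = 0.0           # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict_shrunk(root: Node, x, lam: float) -> float:
    """Walk from root to leaf, damping each step away from the parent's mean.

    lam = 0 reproduces the ordinary leaf prediction; larger lam pulls the
    prediction toward the means of nodes nearer the root. The splits
    themselves are never changed.
    """
    node = root
    pred = root.mean
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # The parent's sample count sets how much of the refinement is trusted.
        pred += (child.mean - node.mean) / (1.0 + lam / node.n)
        node = child
    return pred


def predict_forest(roots, x, lam: float) -> float:
    """Average the shrunk predictions of all trees in the ensemble."""
    return sum(predict_shrunk(r, x, lam) for r in roots) / len(roots)


# Toy tree (assumed values): the right branch ends in a single-sample leaf.
tree = Node(mean=2.0, n=10, feature=0, threshold=0.5,
            left=Node(mean=1.0, n=6),
            right=Node(mean=3.5, n=4, feature=1, threshold=0.0,
                       left=Node(mean=3.0, n=3),
                       right=Node(mean=5.0, n=1)))

x = [1.0, 1.0]                                 # routed to the single-sample leaf
plain = predict_shrunk(tree, x, lam=0.0)       # ordinary leaf prediction
shrunk = predict_shrunk(tree, x, lam=10.0)     # pulled back toward the root mean
```

In this toy case the unregularized prediction equals the deep leaf's mean, while the shrunk prediction lies between the root mean and that leaf value. The tree structure is identical in both cases, and only the weight given to deeper refinements differs.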