This study examines the effect of different feature selection methods on models built with XGBoost, a popular machine learning algorithm with strong built-in regularization. It shows that three different approaches to reducing feature dimensionality produce no statistically significant change in the model's prediction accuracy. This suggests that the traditional practice of removing noisy training features to prevent overfitting may not apply to XGBoost, although feature selection may still be worthwhile for reducing computational cost.
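The comparison described above can be sketched as follows. This is a minimal illustration, not the paper's exact setup: scikit-learn's GradientBoostingClassifier stands in for XGBoost, SelectKBest with mutual information stands in for one of the selection methods (the paper's three methods are not named here), and a synthetic dataset replaces the study's data.

```python
# Sketch: compare prediction accuracy of a gradient-boosted model trained on
# all features vs. on a reduced feature set chosen by a selection method.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data: 20 features, only 5 informative, the rest noise.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Baseline: train on all 20 features.
full = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc_full = accuracy_score(y_te, full.predict(X_te))

# Reduced: keep the 5 features ranked highest by mutual information.
selector = SelectKBest(mutual_info_classif, k=5).fit(X_tr, y_tr)
reduced = GradientBoostingClassifier(random_state=0).fit(
    selector.transform(X_tr), y_tr)
acc_reduced = accuracy_score(y_te, reduced.predict(selector.transform(X_te)))

print(f"all features: {acc_full:.3f}, selected features: {acc_reduced:.3f}")
```

In the study's framing, a statistical test (e.g. over repeated cross-validation folds) would then check whether the difference between the two accuracies is significant; the reduced model's main advantage is the smaller feature matrix it must process.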