Recent advances in large language models (LLMs) have opened new opportunities for decision-making, particularly for automated feature selection. In this paper, we first comprehensively evaluate LLM-based feature selection methods, covering the state-of-the-art DeepSeek-R1, GPT-o3-mini, and GPT-4.5. We then propose a novel hybrid strategy, LLM4FS, that integrates LLMs with traditional data-driven methods. Specifically, we input data samples into LLMs and directly call traditional data-driven techniques such as random forest and forward sequential selection. Notably, our analysis reveals that the hybrid strategy combines the contextual understanding of LLMs with the high statistical reliability of traditional data-driven methods to achieve excellent feature selection performance, even surpassing both LLMs and traditional data-driven methods used alone. Finally, we point out the limitations of applying this strategy in decision-making.
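To make the two ingredients of the hybrid strategy concrete, the following is a minimal sketch, not the LLM4FS implementation. It shows (a) serializing data samples into a prompt and (b) forward sequential selection as one traditional data-driven technique. The function names, the prompt wording, the toy data, and the leave-one-out 1-nearest-neighbour scorer are all assumptions made for illustration; no LLM is actually called.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given feature indices (assumed scorer)."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def forward_sequential_selection(X, y, k, score=loo_1nn_accuracy):
    """Greedily add the feature that most improves the score until k are chosen."""
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def build_prompt(feature_names, X, y, k):
    """Serialize samples into a text prompt (hypothetical wording)."""
    header = ", ".join(feature_names + ["label"])
    rows = "\n".join(
        ", ".join(str(v) for v in row + [label]) for row, label in zip(X, y)
    )
    return (
        f"Here are {len(X)} data samples.\n{header}\n{rows}\n"
        f"Apply forward sequential selection and return the top {k} features."
    )


# Toy data (assumed): feature 0 separates the classes, features 1 and 2 do not.
names = ["signal", "bucket", "constant"]
y = [i % 2 for i in range(20)]
X = [[y[i] + 0.01 * i, float((i // 2) % 5), 0.5] for i in range(20)]

selected = forward_sequential_selection(X, y, k=1)
print([names[f] for f in selected])
prompt = build_prompt(names, X, y, k=1)
```

In practice, a library routine such as scikit-learn's sequential feature selector or a random forest's feature importances would replace the hand-written scorer; the pure-Python version is used here only to keep the sketch dependency-free.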