LLM4FS: Leveraging Large Language Models for Feature Selection and How to Improve It

Recent advances in large language models (LLMs) have opened new opportunities for decision-making, particularly for automated feature selection. In this paper, we first comprehensively evaluate LLM-based feature selection methods, covering the state-of-the-art DeepSeek-R1, GPT-o3-mini, and GPT-4.5. We then propose a novel hybrid strategy called LLM4FS that integrates LLMs with traditional data-driven methods. Specifically, we input data samples into LLMs and directly call traditional data-driven techniques such as random forest and forward sequential selection. Notably, our analysis reveals that the hybrid strategy combines the contextual understanding of LLMs with the high statistical reliability of traditional data-driven methods to achieve excellent feature selection performance, even surpassing both LLMs and traditional data-driven methods used alone. Finally, we point out the limitations of its application in decision-making.
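The abstract names forward sequential selection as one of the traditional data-driven techniques the hybrid strategy relies on. The following is a minimal, standard-library-only sketch of that greedy procedure, not the authors' implementation. The function names, the leave-one-out nearest-centroid scoring function, and the toy dataset are all assumptions made for illustration; the LLM side of LLM4FS is not modeled here.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature columns (assumed scoring function)."""
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except the held-out one.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[i][f] for f in features]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def forward_sequential_selection(X, y, n_select, score=nearest_centroid_accuracy):
    """Greedily add, one at a time, the feature that most improves the score."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def make_toy_data(n=60, seed=0):
    """Hypothetical dataset: column 1 carries the class signal, columns 0 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = label * 4.0 + rng.gauss(0, 0.5)
        X.append([rng.gauss(0, 1), informative, rng.gauss(0, 1)])
        y.append(label)
    return X, y


X, y = make_toy_data()
chosen = forward_sequential_selection(X, y, n_select=1)
print("selected feature indices:", chosen)
```

Any scoring function that maps a candidate feature subset to a number can be passed as `score`; a cross-validated model score is a common choice in practice.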

Feature selection

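Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N features from an existing set of M features so that a specific metric of the system is optimized. It is the process of picking the most effective features from the original ones to reduce the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.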
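The definition above can be written compactly as a subset optimization problem. The symbols $S$ (a candidate subset of feature indices) and $J$ (the metric being optimized) are notation assumed here for illustration and do not come from the source.

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{\substack{S \subseteq \{1,\dots,M\} \\ |S| = N}} J(S),
\qquad
\text{with } \binom{M}{N} = \frac{M!}{N!\,(M-N)!} \text{ candidate subsets.}
```

Because the number of candidate subsets grows combinatorially with M, exhaustive search is usually impractical, which is why greedy or model-based procedures are commonly used instead.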
