LLM-Driven Reasoning for Constraint-Aware Feature Selection in Industrial Systems

Yuhang Zhou, Zhuokai Zhao, Ke Li, Spilios Evmorfos, Gökalp Demirci, Mingyi Wang, Qiao Liu, Qifei Wang, Serena Li, Weiwei Li, Tingting Wang, Mingze Gao, Gedi Zhou, Abhishek Kumar, Xiangjun Fan, Lizhu Zhang, Jiayi Liu

From arXiv, 11 pages, 2 tables

Feature selection is a crucial step in large-scale industrial machine learning systems, directly affecting model accuracy, efficiency, and maintainability. Traditional feature selection methods rely on labeled data and statistical heuristics, making them difficult to apply in production environments where labeled data are limited and multiple operational constraints must be satisfied. To address this, we propose Model Feature Agent (MoFA), a model-driven framework that performs sequential, reasoning-based feature selection using both semantic and quantitative feature information. MoFA incorporates feature definitions, importance scores, correlations, and metadata (e.g., feature groups or types) into structured prompts and selects features through interpretable, constraint-aware reasoning. We evaluate MoFA in three real-world industrial applications: (1) True Interest and Time-Worthiness Prediction, where it improves accuracy while reducing feature group complexity, (2) Value Model Enhancement, where it discovers high-order interaction terms that yield substantial engagement gains in online experiments, and (3) Notification Behavior Prediction, where it selects compact, high-value feature subsets that improve both model accuracy and inference efficiency. Together, these results demonstrate the practicality and effectiveness of LLM-based reasoning for feature selection in real production systems.
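
The abstract describes packing feature definitions, importance scores, correlations, and group metadata into a structured prompt, then selecting features under operational constraints. A minimal sketch of that idea follows. It is not the authors' implementation: the `Feature` record, the prompt wording, the example features, and the two constraints (a cap on feature count and on feature groups) are all assumptions made for illustration, and no LLM is actually called.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Feature:
    # Hypothetical record; the field names are illustrative, not from the paper.
    name: str
    definition: str
    importance: float
    group: str


def build_prompt(features: List[Feature],
                 correlations: Dict[Tuple[str, str], float],
                 max_features: int,
                 max_groups: int) -> str:
    """Render semantic and quantitative feature information as one text prompt."""
    lines = [
        "Task: select features for the model described below.",
        f"Constraints: at most {max_features} features "
        f"from at most {max_groups} feature groups.",
        "Candidate features:",
    ]
    # List candidates from most to least important.
    for f in sorted(features, key=lambda f: -f.importance):
        lines.append(
            f"- {f.name} (group={f.group}, importance={f.importance:.2f}): "
            f"{f.definition}"
        )
    lines.append("Highly correlated pairs:")
    for (a, b), rho in sorted(correlations.items()):
        lines.append(f"- {a} ~ {b}: {rho:.2f}")
    lines.append("Return the selected feature names with a one-line reason each.")
    return "\n".join(lines)


def satisfies_constraints(selected: List[str],
                          features: List[Feature],
                          max_features: int,
                          max_groups: int) -> bool:
    """Check a proposed selection against the assumed constraints."""
    by_name = {f.name: f for f in features}
    if any(name not in by_name for name in selected):
        return False  # the selection names a feature that is not a candidate
    if len(set(selected)) != len(selected):
        return False  # duplicate names
    groups = {by_name[name].group for name in selected}
    return len(selected) <= max_features and len(groups) <= max_groups


# Made-up example data.
features = [
    Feature("click_rate_7d", "Clicks per impression, last 7 days", 0.42, "engagement"),
    Feature("click_rate_30d", "Clicks per impression, last 30 days", 0.40, "engagement"),
    Feature("device_type", "Client device category", 0.10, "context"),
]
correlations = {("click_rate_30d", "click_rate_7d"): 0.93}

prompt = build_prompt(features, correlations, max_features=2, max_groups=2)
print(prompt)
```

In this sketch the prompt would be sent to a language model, and `satisfies_constraints` would validate the returned selection before it is accepted. Keeping the constraint check in plain code means a selection that violates a limit can be rejected regardless of how the model reasons.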
