Feature selection is a crucial step in large-scale industrial machine learning systems, directly affecting model accuracy, efficiency, and maintainability. Traditional feature selection methods rely on labeled data and statistical heuristics, which makes them difficult to apply in production environments where labeled data are limited and multiple operational constraints must be satisfied. To address these limitations, we propose Model Feature Agent (MoFA), a model-driven framework that performs sequential, reasoning-based feature selection using both semantic and quantitative feature information. MoFA incorporates feature definitions, importance scores, correlations, and metadata (e.g., feature groups or types) into structured prompts and selects features through interpretable, constraint-aware reasoning. We evaluate MoFA in three real-world industrial applications: (1) True Interest and Time-Worthiness Prediction, where it improves accuracy while reducing feature group complexity; (2) Value Model Enhancement, where it discovers high-order interaction terms that yield substantial engagement gains in online experiments; and (3) Notification Behavior Prediction, where it selects compact, high-value feature subsets that improve both model accuracy and inference efficiency. Together, these results demonstrate the practicality and effectiveness of large language model (LLM)-based reasoning for feature selection in real production systems.
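The abstract describes the selection procedure only at a high level. The sketch below shows one way a sequential, constraint-aware loop of this kind could be wired up: each round packs feature definitions, importance scores, correlations, and group metadata into a structured prompt and asks a reasoner for the next feature. Everything here is an illustrative assumption rather than MoFA's actual implementation: the `Feature` record, the prompt wording, the group-budget constraint, the example features, and `heuristic_reasoner`, a hypothetical deterministic stand-in for the LLM call (it ignores the prompt text and applies an importance-times-non-redundancy rule) so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]

# Hypothetical feature record mirroring the inputs named in the abstract.
@dataclass(frozen=True)
class Feature:
    name: str
    definition: str
    importance: float
    group: str

def pair_key(a: str, b: str) -> Pair:
    """Order-independent key for the pairwise correlation table."""
    return (a, b) if a <= b else (b, a)

def max_corr(f: Feature, selected: List[Feature], corr: Dict[Pair, float]) -> float:
    """Largest absolute correlation between f and any already-selected feature."""
    return max((abs(corr.get(pair_key(f.name, s.name), 0.0)) for s in selected), default=0.0)

def build_prompt(cands: List[Feature], selected: List[Feature],
                 corr: Dict[Pair, float], max_groups: int) -> str:
    """Structured prompt combining semantic and quantitative information (assumed format)."""
    lines = [
        "You are selecting features for a production model.",
        f"Constraint: use at most {max_groups} feature groups.",
        "Already selected: " + (", ".join(s.name for s in selected) or "none"),
        "Candidates:",
    ]
    for f in cands:
        lines.append(f"- {f.name} [group={f.group}] importance={f.importance:.2f} "
                     f"max_corr_with_selected={max_corr(f, selected, corr):.2f}: {f.definition}")
    lines.append("Reply with the name of the single best next feature, or STOP.")
    return "\n".join(lines)

Reasoner = Callable[[str, List[Feature], List[Feature], Dict[Pair, float], int], str]

def heuristic_reasoner(prompt: str, cands: List[Feature], selected: List[Feature],
                       corr: Dict[Pair, float], max_groups: int, min_score: float = 0.1) -> str:
    """Hypothetical stand-in for an LLM: respects the group budget, penalizes redundancy."""
    used_groups = {s.group for s in selected}
    best_name, best_score = "STOP", min_score
    for f in cands:
        if f.group not in used_groups and len(used_groups) >= max_groups:
            continue  # opening a new group would exceed the budget
        score = f.importance * (1.0 - max_corr(f, selected, corr))
        if score > best_score:
            best_name, best_score = f.name, score
    return best_name

def select_features(features: List[Feature], corr: Dict[Pair, float], max_groups: int,
                    max_features: int, reasoner: Reasoner) -> List[Feature]:
    """Sequential selection: one prompt and one decision per round."""
    selected: List[Feature] = []
    while len(selected) < max_features:
        cands = [f for f in features if f not in selected]
        if not cands:
            break
        prompt = build_prompt(cands, selected, corr, max_groups)
        choice = reasoner(prompt, cands, selected, corr, max_groups)
        by_name = {f.name: f for f in cands}
        if choice not in by_name:  # "STOP" or an unparseable reply ends the loop
            break
        selected.append(by_name[choice])
    return selected

# Made-up example data for illustration only.
features = [
    Feature("watch_time_7d", "Minutes watched in the last 7 days", 0.9, "engagement"),
    Feature("watch_time_30d", "Minutes watched in the last 30 days", 0.8, "engagement"),
    Feature("follows_author", "Whether the user follows the author", 0.6, "social"),
    Feature("device_type", "Client device category", 0.3, "context"),
]
corr = {pair_key("watch_time_7d", "watch_time_30d"): 0.95}

chosen = select_features(features, corr, max_groups=2, max_features=3,
                         reasoner=heuristic_reasoner)
print([f.name for f in chosen])
```

With a two-group budget, the loop first takes `watch_time_7d`, then `follows_author` from a new group. In the third round `device_type` is blocked by the group budget and `watch_time_30d` scores below the threshold because it is nearly redundant, so the stand-in replies STOP. Replacing `heuristic_reasoner` with a function that sends `prompt` to an LLM and parses its reply would leave the rest of the loop unchanged.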