Feature selection (FS) remains essential for building accurate and interpretable detection models, particularly on high-dimensional malware datasets. Conventional FS methods such as Extra Trees, Variance Threshold, tree-based models, Chi-Squared tests, ANOVA, Random Selection, and Sequential Attention rely primarily on statistical heuristics or model-driven importance scores and often overlook the semantic context of features. Motivated by recent progress in LLM-driven FS, we investigate whether large language models (LLMs) can guide feature selection in a zero-shot setting, using only feature names and task descriptions, as a viable alternative to traditional approaches. We evaluate multiple LLMs (including GPT-5.0, GPT-4.0, and Gemini-2.5) on the EMBOD dataset (a fusion of the EMBER and BODMAS benchmark datasets) and compare them against established FS methods across several classifiers, including Random Forest, Extra Trees, MLP, and KNN. Performance is assessed using accuracy, precision, recall, F1, AUC, MCC, and runtime. Our results demonstrate that LLM-guided zero-shot feature selection achieves performance competitive with traditional FS methods while offering additional advantages in interpretability, stability, and reduced dependence on labeled data. These findings position zero-shot LLM-based FS as a promising alternative strategy for effective and interpretable malware detection, paving the way for knowledge-guided feature selection in security-critical applications.
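To make the zero-shot setting concrete, the following is a minimal sketch of how a selection request might be assembled from only a task description and feature names, and how a model reply might be validated. It is not the authors' pipeline: the feature names, the prompt wording, the JSON reply format, and the `fake_llm` stand-in (used in place of a real model call) are all assumptions made for illustration.

```python
import json


def build_prompt(task_description, feature_names, k):
    """Compose a zero-shot prompt from the task description and feature names only.

    No feature values or labels are included, matching the zero-shot setting.
    """
    listing = "\n".join(f"- {name}" for name in feature_names)
    return (
        f"Task: {task_description}\n"
        f"Candidate features:\n{listing}\n"
        f"Select the {k} features most relevant to the task. "
        "Answer with a JSON array of feature names and nothing else."
    )


def parse_selection(reply, feature_names, k):
    """Keep only names from the candidate list, de-duplicated, at most k of them."""
    try:
        proposed = json.loads(reply)
    except json.JSONDecodeError:
        return []  # unparseable reply: select nothing rather than guess
    if not isinstance(proposed, list):
        return []
    valid = set(feature_names)
    selected = []
    for name in proposed:
        if isinstance(name, str) and name in valid and name not in selected:
            selected.append(name)
    return selected[:k]


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM call; returns a canned reply."""
    return '["imports_count", "section_entropy_mean", "not_a_feature", "imports_count"]'


# Hypothetical feature names, chosen only for this example.
FEATURES = ["file_size", "imports_count", "section_entropy_mean", "timestamp"]

prompt = build_prompt("Classify Windows PE files as malicious or benign.", FEATURES, k=2)
selected = parse_selection(fake_llm(prompt), FEATURES, k=2)
print(selected)  # ['imports_count', 'section_entropy_mean']
```

Filtering the reply against the candidate list guards against hallucinated or repeated names before the selected subset is passed to a downstream classifier.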